The Health Data Science Sandbox is a national project coordinated by the Center for Health Data Science at the University of Copenhagen. Advisors and project data scientists are located at five Danish universities. We are building a data science sandbox for students and researchers that contains non-person-sensitive datasets spanning key health data domains – electronic health records, omics data such as genomics and transcriptomics, images, and wearable device data. Datasets are sourced from public databases or generated via privacy-preserving approaches to synthetic health data. We are building modules that pair topical datasets with recommended analysis tools, pipelines, and learning materials/tutorials in a portable, containerized format.

Our initial aim is to support university courses and programs in health data science and personalized medicine, with broader access to the environment for researchers and university students planned for the future. The sandbox will allow low-stakes, guided learning and development of health data science techniques, followed by a smooth transition to a secure environment where users’ knowledge and tools can be applied to sensitive data. The sandbox environment is hosted on Danish supercomputers, which provide the compute power, while the modules are publicly accessible on GitHub.

We thank the Novo Nordisk Foundation for funding support via the Data Science Research Infrastructure initiative.