

One of the main roles of the Sandbox is to develop tutorials and courses for those who want to build their skills in bioinformatics-related disciplines. We regularly host in-person courses and workshops for researchers and students, and below you can find tutorials whose material has been adapted for self-guided learning. Choose your topic of interest to find the available materials and a guide to their intended use. Feel free to adapt the materials for your own purposes (with credit to the National Health Data Science Sandbox project).



Genomics is the study of genomes, the complete set of an organism's DNA. Genomics research now encompasses functional and structural studies, epigenomics and metagenomics, and genomic medicine is being actively implemented and expanded in the health sector.

Introduction to Next Generation Sequencing data


Transcriptomics is the study of transcriptomes: it investigates the RNA transcripts within a cell or tissue to determine which genes are being expressed and in what proportion. These transcripts include mRNA, tRNA, rRNA and other non-coding RNAs present in a cell.

Bulk RNAseq (self-tutorial version of workshop held on 18-19 August 2022)

Single-Cell RNAseq (self-tutorial version of workshop with date TBA)


Proteomics is the study of the proteins produced by an organism. It allows us to analyse protein composition and structure, both of which are key to determining protein function.

Modules linked to proteomics - including AlphaFold and Clinical Proteomics - are currently under development.

Electronic Health Records

Electronic health records (EHRs) are digital records kept in the public health sector that document the medical histories of individuals, and access to them is normally highly restricted to preserve patient privacy. These data are sometimes also shared (partly or in full) with secondary patient registries that support research on a specific disease or condition (such as cystic fibrosis). Such datasets are extraordinarily valuable for developing the predictive models used in precision medicine.

Modules linked to EHR analysis are currently under development. An initial synthetic dataset was deployed for a course in the MSc in Personal Medicine program.