
Introduction to Next Generation Sequencing data

A course of the Danish Health Data Science Sandbox

This course is based on the material developed for the NGS summer school at Aarhus University. The material is organized into four separate Jupyter notebooks, written in Bash, Python and R, so that you can benefit from an interactive coding setup.

If you use any of this material for your research, please cite this course with the DOI below, and acknowledge the Health Data Science Sandbox project of the Novo Nordisk Foundation (grant number NNF20OC0063268). This greatly helps to support the project.

DOI

Course description

After the course, you will know the bioinformatics methods for analyzing genomes using NGS data. This includes the existing types of genome data, how the different types of data can be displayed and analyzed, and the current methods for genome assembly and analysis, together with their accuracy and their uses. The course will enable you to devise and run a project that makes use of NGS data.


This is an introductory course. It requires a basic understanding of the biology behind sequencing, but no prior programming experience.

Learning Outcomes

  • Describe key challenges in the analysis of NGS data
  • Explain the theoretical foundation for methods that use NGS for assembly and analysis of genomes
  • Discuss the bioinformatic methods for genome analysis and hypothesize what drives the outcome of the methods
  • Discuss original literature within the subjects and relate the discussed topics to analysis scenarios
  • Apply bioinformatics tools within the selected application areas and reflect on the results, formulating your own conclusions in the proposed tasks

Supporting material

  • Jupyter notebooks for interactive coding
  • lecture slides from the instructor

You can find the links to the material in the table at the bottom of this page.

Course duration

This course was originally one week long.

Course authors

Heads of the course: Mikkel H. Schierup, Stig U. Andersen.

Responsible for the exercises: Lavinia I. Fechete, Jilong Ma, Samuele Soraggi.

Contact: Samuele Soraggi (samuele at

Course material

Here you can find a table with the instructors' slides and links to the compiled notebooks, which you can also run on your own by following the instructions. Data alignment can also be performed on the Galaxy interactive web platform (see the Galaxy guide in the table).

| Topic | Slide | Notebook |
|---|---|---|
| Sequencing technologies | link | -- |
| Mapping to reference | link | Notebook or Galaxy guide |
| Data visualization | link | -- |
| SNPs and structural variants | link | Notebook |
| RNA sequencing | link | Notebook |
| De-novo assembly | link | -- |
| Microbiomes and metagenomics | link | -- |
| Single cell RNA sequencing | link | Notebook |