
Introduction to Next Generation Sequencing data

A course of the Summer School of Aarhus University

Computing and didactical support from the Danish Health Data Science Sandbox

The material for this course is organized in four separate Jupyter notebooks written in Bash, Python and R, where you will benefit from an interactive coding setup on JupyterLab.


After the course, you will have knowledge of bioinformatics methods for analyzing genomes using NGS data, including knowledge of the existing types of genome data, how the different types of data can be displayed and analyzed, the current methods for genome assembly and analysis, their accuracy and how they can be used. The course will enable you to devise and run a project that makes use of NGS data.

📚 Prerequisites: This is an introductory course that requires a basic understanding of the biology behind sequencing. Programming experience is not required (though it would help!).

💬 Syllabus:
1. Describe key challenges in the analysis of NGS data
2. Explain the theoretical foundation for methods that use NGS for assembly and analysis of genomes
3. Discuss the bioinformatic methods for genome analysis and hypothesize what drives the outcome of the methods
4. Review original literature within the subjects and relate the discussed topics to analysis scenarios
5. Apply bioinformatics tools within the selected application areas and reflect on the results, formulating your own conclusion in the proposed tasks

🕰 Total Time Estimation: 20 hours

📁 Supporting Materials:

  • Jupyter notebooks for interactive coding
  • lecture slides from the instructor

You can find the links to the material in the table at the bottom of this page.

🖍 Course authors and instructors:

Mikkel H Schierup, Stig U Andersen

Samuele Soraggi, Peter S Porsborg, Adrián G Repollés

📋 License: The tutorial content is licensed under the Creative Commons Attribution 4.0 International License.

📝 Citation: If you use any of this material for your research, please cite this course with the DOI below, and acknowledge the Health Data Science Sandbox project of the Novo Nordisk Foundation (grant number NNF20OC0063268). Doing so greatly helps to support the project. DOI

📧 Contact: Samuele Soraggi (samuele at

Course structure and Instructions

The course exercises are organized in four exercise modules.

The first one is executed with the Galaxy web interface (click on 1.Galaxy Exercise in the menu for instructions).

Afterwards, we will work in a computing environment that runs JupyterLab. See 2.Instructions in the menu for details.

The menu 3.Course exercises contains all the compiled exercises as a reference.

Course material 2022

Here you find a table with the instructor's slides from 2022.

Topic | Slide | Notebook
Sequencing technologies | link | --
Mapping to reference | link | Notebook
Data visualization | link | --
SNPs and structural variants | link | Notebook
RNA sequencing | link | Notebook
De-novo assembly | link | --
Microbiomes and metagenomics | link | --
Single cell RNA sequencing | link | Notebook

Course material 2023 (to be published after the course ends)

Here you find a table with the instructor's slides and a link to the compiled notebooks, which you can also run on your own by following the instructions on this webpage. Data alignment can also be performed on the Galaxy interactive webpage (see the Galaxy exercise on this webpage).