Introduction to Next Generation Sequencing data
A course of the Summer School of Aarhus University
Computing and didactic support from the Danish Health Data Science Sandbox
The material for this course is organized into four separate Jupyter notebooks in bash, Python and R, where you will benefit from an interactive coding setup on JupyterLab.
Overview
After the course, you will know the bioinformatics methods for analyzing genomes using NGS data. This includes the existing types of genome data, how the different types of data can be displayed and analyzed, and the current methods for genome assembly and analysis, including their accuracy and how they can be used. The course will enable you to devise and run a project that makes use of NGS data.
📚 Prerequisites: This is an introductory course that requires a basic understanding of the biology behind sequencing, but not necessarily programming experience (though it helps!).
💬 Syllabus:
1. Describe key challenges in the analysis of NGS data
2. Explain the theoretical foundation for methods that use NGS for assembly and analysis of genomes
3. Discuss the bioinformatic methods for genome analysis and hypothesize what drives the outcome of the methods
4. Review original literature within the subjects and relate the discussed topics to analysis scenarios
5. Apply bioinformatics tools within the selected application areas and reflect on the results, formulating your own conclusions in the proposed tasks
🕰 Total Time Estimation: 20 hours
📁 Supporting Materials:
- Jupyter notebooks for interactive coding
- lecture slides from the instructor
You can find the links to the material in the table at the bottom of this page.
🖍 Course authors and instructors:
Mikkel H Schierup, Stig U Andersen, Samuele Soraggi, Peter S Porsborg, Adrián G Repollés
📋 License: The tutorial content is licensed under the Creative Commons Attribution 4.0 International License
📝 Citation: If you use any of this material for your research, please cite this course with the DOI below, and acknowledge the Health Data Science Sandbox project of the Novo Nordisk Foundation (grant number NNF20OC0063268). This greatly helps to support the project.
📧 Contact: Samuele Soraggi (samuele at birc.au.dk).
Course structure and instructions
The course exercises are organized into four modules.
The first module is carried out on the web interface usegalaxy.org (click on 1.Galaxy Exercise in the menu for instructions).
Afterwards, we will work in a computing environment running JupyterLab. See 2.Instructions for instructions.
The menu 3.Course exercises contains all the compiled exercises for reference.
Course material 2022
Here you can find a table with the instructors' slides and notebooks from 2022.
Topic | Slide | Notebook |
---|---|---|
Sequencing technologies | link | -- |
Mapping to reference | link | Notebook |
Data visualization | link | -- |
SNPs and structural variants | link | Notebook |
RNA sequencing | link | Notebook |
De novo assembly | link | -- |
Microbiomes and metagenomics | link | -- |
Single cell RNA sequencing | link | Notebook |
Course material 2023 (available after the end of the course)
Here you can find a table with the instructors' slides and links to the compiled notebooks, which you can also run on your own by following the instructions on this webpage. Data alignment can also be performed on the interactive Galaxy webpage (see the Galaxy exercise on this webpage).