Introduction to Population Genomics
A course of the Danish Health Data Science Sandbox
This course is based on the material developed for the Population Genomics course at Aarhus University. The material is organized into four separate Jupyter notebooks in R, bash, and Python, so you can benefit from an interactive coding setup.
If you use any of this material for your research, please cite this course with the DOI below and acknowledge the Health Data Science Sandbox project of the Novo Nordisk Foundation (grant number NNF20OC0063268). This greatly helps support the project and the creation of new courses.
Course description
The course introduces key concepts in population genomics, from the generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on the generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background, covering the analysis of demography, population structure, recombination, and selection. The last part of the course focuses on applications of population genetic data sets in association studies related to human health.
Prerequisites
This is an introductory course that requires a basic understanding of genomics, but not necessarily programming experience (though that helps).
Learning Outcomes
After the course, you will have detailed knowledge of the methods and applications required to perform a typical population genomic study. You will be able to:
- Identify an experimental platform relevant to a population genomic analysis.
- Apply commonly used population genomic methods.
- Explain the theory behind common population genomic methods.
- Reflect on strengths and limitations of population genomic methods.
- Interpret and analyze results of population genomic inference.
- Formulate population genetic hypotheses based on data.
Supporting material
- Jupyter notebooks for interactive coding
- Structure of the course with lecture list
The curriculum for each week of the course is listed below. "Coop" refers to a set of lecture notes by Graham Coop that are used throughout the course.
Course duration and structure
This course is one semester long.
- Course intro and overview:
  - Lecture (Kasper): Coop chapters 1, 2, 3; Paper: Genome Diversity Project
  - Exercise: Cluster practicals
- Drift and the coalescent:
  - Lecture: Coop chapter 4; Paper: Platypus
  - Exercise: Read mapping and base calling
- Recombination:
- Population structure and incomplete lineage sorting:
  - Lecture: Coop chapter 6; Review: Incomplete lineage sorting
  - Exercise: Working with VCF files
- Hidden Markov models:
  - Lecture: Durbin chapter 3; Paper: Population structure
  - Exercise: Inference of population structure and admixture
- Ancestral recombination graphs:
  - Lecture: Paper: Approximating the ARG; Paper: Tree inference
  - Exercise: ARG dashboard exercises + Inference of trees along sequence
- Past population demography:
  - Lecture (Juraj): Coop chapter 4; Paper: PSMC; revisit Paper: Tree inference
  - Exercise: Inferring historical populations
- Direct and linked selection:
  - Lecture: Coop chapters 12, 13; revisit Paper: Tree inference
- Admixture:
  - Lecture: Review: Admixture; Paper: Admixture inference
  - Exercise: Detecting archaic ancestry in modern humans
- Genome-wide association study (GWAS):
  - Lecture: Coop lecture notes 99-120
  - Exercise: GWAS quality control
- Heritability:
  - Lecture: Coop lecture notes Sec. 2.2 (p23-36) + Chap. 7 (p119-142)
  - Exercise: Association testing
- Evolution and disease:
  - Lecture: Coop lecture notes Sec. 11.0.1 (p217-221)
  - Exercise: Estimating heritability
Course authors
Head of the course: Kasper Munch.
Contact: Samuele Soraggi (samuele at birc.au.dk).