
Genome-Wide Association Studies

Updated: 16 June 2023

This course is an introduction to genome-wide association studies (GWAS), a method that quantifies the statistical association between a genetic variant and a phenotype (often a disease trait). The course does not focus on any particular software; instead, it explains from a statistical and biological perspective why each analysis is done.


Samuele Soraggi, Conor O'Hare


💬 Syllabus:
1. Understand what a GWAS is and why we use it
2. Statistics of GWAS (regression coefficients, P-values, statistical power, Bayes factors)
3. Genetic relatedness and population structure
4. Confounding and covariates in GWAS
5. Haplotypes, linkage disequilibrium, imputation, fine-mapping
6. Linear mixed models and heritability
7. Summary statistics and meta-analysis
8. Advanced tools

🕰 Total Time Estimation: 6 hours

📁 Supporting Materials: Original course from the University of Helsinki | PLINK documentation | An Introduction to Statistical Learning (for further statistics explanations)

📋 License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License

Course Requirements

  • Knowledge of R. It is recommended that you have at least followed our workshop From Excel to R
  • Basic statistics and mathematics skills

We do GWAS because a statistical association between a particular physical region of the genome and the phenotype

  • can point to biological mechanisms affecting the phenotype,
  • can allow prediction of the phenotype from genomic information.

These results may further benefit

  • medicine by leading to molecular or environmental interventions against harmful phenotypes,
  • biotechnology by improving the ways we utilize microbes, plants or animals,
  • forensics by more accurate identification of an individual from a DNA sample,
  • biogeographic ancestry inference of individuals, populations and species,
  • our understanding of the role of natural selection and other evolutionary forces in the living world.

The genome of an individual remains (nearly) constant throughout the individual’s lifetime. This is a truly remarkable property compared to other molecular sources of information (such as metabolomics, metagenomics, transcriptomics, proteomics or epigenomics) or to environmental factors, which may vary widely over time. Therefore, the genome seems an ideal starting point for scientific research: it needs to be measured only once for an individual and there is no reverse causation from the phenotype to the genome (with cancer as an important exception).