GWAS

Authors

Conor O’Hare

Samuele Soraggi

Alba Refoyo Martinez

Modified

July 22, 2025

This course is an introduction to Genome-Wide Association Studies (GWAS), a method that quantifies the statistical association between genetic variants and a phenotype (often a disease trait). The course does not focus on any particular software; instead, it explains why each analysis is done, from a statistical and biological perspective.

Course Overview

📖 Syllabus:

Understand what a GWAS is and why we use it
Statistics of GWAS (regression coefficients, P-values, statistical power, Bayes factors)
Genetic relatedness and population structure
Confounding and covariates in GWAS
Haplotypes, linkage disequilibrium, imputation, fine-mapping
Linear mixed models and heritability
Summary statistics and meta-analysis
Advanced tools

⏰ Total Time Estimation: 8 hours
📁 Supporting Materials:
👨‍💻 Target Audience: Ph.D., MSc, etc.
👩‍🎓 Level: Beginner.
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License

Course Requirements

Knowledge of R. We recommend that you have at least followed our workshop From Excel to R.
Basic knowledge of bash.
Basic statistics and mathematics skills.

This workshop material includes a tutorial on how to run genome-wide association studies and the necessary preprocessing steps.

Why are GWAS important?

They identify statistical associations between specific regions of the genome and a given phenotype, which can:

help point to biological mechanisms affecting the phenotype,
allow prediction of the phenotype from genomic information.

These results may further benefit:

medicine by leading to molecular or environmental interventions against harmful phenotypes,
biotechnology by improving the ways we utilize microbes, plants or animals,
forensics by more accurate identification of an individual from a DNA sample,
biogeographic ancestry inference of individuals, populations and species,
our understanding of the role of natural selection and other evolutionary forces in the living world.

The genome of an individual remains (nearly) constant throughout the individual’s lifetime. This is a remarkable property compared to other molecular sources of information (such as metabolomics, metagenomics, transcriptomics, proteomics or epigenomics) or environmental factors, which may vary widely over time. The genome is therefore an ideal starting point for scientific research: it needs to be measured only once per individual, and there is no reverse causation from the phenotype to the genome (with cancer as an important exception).
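To make the idea of a statistical association between a variant and a phenotype concrete, here is a minimal sketch in Python using only the standard library. It is not the course's own analysis pipeline: the sample size, allele frequency, effect size and function names are all hypothetical choices made for illustration, and the P-value uses a large-sample normal approximation. The sketch simulates one variant that affects a quantitative phenotype and one that does not, then regresses the phenotype on the genotype dosage (0, 1 or 2 copies of the minor allele) for each.

```python
import math
import random


def simulate_genotypes(n, maf, rng):
    """Draw n genotype dosages (0, 1 or 2 minor alleles) under Hardy-Weinberg proportions."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


def association_test(genotypes, phenotypes):
    """Simple linear regression of phenotype on genotype dosage.

    Returns the estimated effect (slope), its standard error, and a
    two-sided P-value from a normal approximation (large samples only).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    # Residual sum of squares around the fitted line
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for standard normal Z
    return beta, se, p_value


# Hypothetical simulation settings, chosen only for illustration
rng = random.Random(1)
n_individuals = 2000
minor_allele_freq = 0.3
true_effect = 0.3  # phenotype shift per copy of the minor allele

causal_variant = simulate_genotypes(n_individuals, minor_allele_freq, rng)
null_variant = simulate_genotypes(n_individuals, minor_allele_freq, rng)
phenotype = [true_effect * g + rng.gauss(0, 1) for g in causal_variant]

results = {
    "causal": association_test(causal_variant, phenotype),
    "null": association_test(null_variant, phenotype),
}
for name, (beta, se, p_value) in results.items():
    print(f"{name}: beta={beta:.3f} se={se:.3f} p={p_value:.2e}")
```

A real GWAS repeats a test of this kind across millions of variants, which is why multiple testing, covariates and population structure need careful treatment.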

Course Goals

By the end of this workshop, you should be able to:

Explain fundamental population genetics concepts and apply them during data analysis.
Understand the principles of GWAS, including linkage disequilibrium and linear regression, and apply them in practice.
Develop skills to preprocess data and perform genotype imputation for missing values.
Explore, discuss, and replicate basic GWAS applications from the scientific literature.
Interpret GWAS results critically, recognizing their limitations.

Acknowledgements

Center for Health Data Science, University of Copenhagen
Matti Pirinen, PhD, University of Helsinki
Andries T. Marees, Vrije Universiteit Amsterdam

Course instructors

Alba Refoyo Martinez

Data Scientist, KU

Samuele Soraggi

Data Scientist, AU

Copyright

CC-BY-SA 4.0 license
