7 Other Tools & Conclusion
7.1 LDAK
Throughout this tutorial, we have used PLINK as the main method of our GWAS analysis. This is, in large part, due to its popularity throughout the modern literature. But it is by no means the only method of GWAS.
One tool that has been developed in recent years is LDAK, a method proposed in 2012. One of the most significant improvements of this programme is in the genetic prediction of complex traits from individual-level data or summary statistics. Most prediction tools assume the GCTA Model, whereby each SNP is expected to contribute equally to the phenotype (as is the case with PLINK). But when we replace the GCTA Model with the BLD-LDAK Model, the squared correlation between observed and predicted phenotypes ($R^2$) increases by 14% on average (s.d. 1%).
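To make the accuracy metric above concrete, the following is a minimal sketch of how the squared correlation between observed and predicted phenotypes could be computed. The function name `squared_correlation` and the toy values are illustrative assumptions; they are not part of LDAK or PLINK, and the numbers do not come from any real cohort.

```python
def squared_correlation(observed, predicted):
    """Return the squared Pearson correlation (R^2) of two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov ** 2 / (var_obs * var_pred)


# Toy, made-up phenotypes purely for illustration.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.3]
r2 = squared_correlation(observed, predicted)
print(f"R^2 = {r2:.3f}")
```

Because the metric is a squared correlation, it is unchanged if the predictions are rescaled or shifted, so it measures how well the predictions rank and track the phenotype rather than whether they are on the right scale.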
7.1.1 Overview of functionality
Using LDAK is very similar to using PLINK, though the commands in PLINK are a bit less regular. As with PLINK, you will require the .bed, .bim and .fam files of your cohort. Beyond this, you will need additional files:
- .info: information scores for SNPs
- .pheno: a phenotype
- .covar: covariates
- .ind.hers: estimates of per-SNP heritabilities
- .genefile: (real) RefSeq human gene annotations
If we call our prefix human, then we can compute summary statistics in the following way:
/work/Software/ldak --calc-stats Data/ldak_data/human --bfile Data/ldak_data/human
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
LDAK - Software for obtaining Linkage Disequilibrium Adjusted Kinships and Loads More
Version 5.2 - Help pages at http://www.ldak.org
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
There are 2 pairs of arguments:
--calc-stats Data/ldak_data/human
--bfile Data/ldak_data/human
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Calculating predictor and individual statistics
To run the parallel version of LDAK, use "--max-threads" (this will only reduce runtime for some commands)
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Reading IDs for 424 samples from Data/ldak_data/human.fam
Reading details for 3289 predictors from Data/ldak_data/human.bim
Data contain 424 samples and 3289 predictors
Calculating statistics for Chunk 1 of 1
Statistics saved in Data/ldak_data/human.stats and Data/ldak_data/human.missing
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Mission completed. All your basepair are belong to us :)
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
This command asks LDAK to read the data stored in binary PLINK format with prefix Data/ldak_data/human, calculate predictor and individual statistics, and save the results to files with the same prefix.
Option pairs can be provided in any order, so it is equivalent to use
/work/Software/ldak --bfile Data/ldak_data/human --calc-stats Data/ldak_data/human
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
LDAK - Software for obtaining Linkage Disequilibrium Adjusted Kinships and Loads More
Version 5.2 - Help pages at http://www.ldak.org
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
There are 2 pairs of arguments:
--bfile Data/ldak_data/human
--calc-stats Data/ldak_data/human
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Calculating predictor and individual statistics
To run the parallel version of LDAK, use "--max-threads" (this will only reduce runtime for some commands)
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Reading IDs for 424 samples from Data/ldak_data/human.fam
Reading details for 3289 predictors from Data/ldak_data/human.bim
Data contain 424 samples and 3289 predictors
Calculating statistics for Chunk 1 of 1
Statistics saved in Data/ldak_data/human.stats and Data/ldak_data/human.missing
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Mission completed. All your basepair are belong to us :)
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
The output files are called human.stats and human.missing.
Now, by accessing the columns of human.stats, we can create plots similar to those in Section 3 of the tutorial on QC:
Figure 7.1: Genomic properties are easily accessible by parsing through the outputted summary statistics
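As a rough illustration of what accessing the columns of human.stats can look like, here is a minimal sketch that parses a whitespace-delimited table with a header row and bins one numeric column, ready to be drawn as a histogram. The column names in the toy table (in particular `MAF`) are assumptions about the file layout, so check the header of your own human.stats before relying on them. The helpers `read_table` and `bin_counts` are hypothetical names introduced here, not LDAK functions.

```python
import io


def read_table(handle):
    """Parse a whitespace-delimited table with a header row into a list of dicts."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append(dict(zip(header, fields)))
    return rows


def bin_counts(values, n_bins, lo, hi):
    """Count how many values fall into n_bins equal-width bins spanning [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # A value equal to hi is placed in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


# Toy stand-in for human.stats; the header is an ASSUMED layout and the rows are made up.
toy_stats = io.StringIO(
    "Predictor A1 A2 A1_Mean MAF Call_Rate Info\n"
    "rs1 A G 0.24 0.12 1.00 1.00\n"
    "rs2 C T 0.50 0.25 0.99 1.00\n"
    "rs3 G A 0.90 0.45 1.00 1.00\n"
    "rs4 T C 0.10 0.05 0.98 1.00\n"
    "rs5 A C 0.62 0.31 1.00 1.00\n"
    "rs6 G T 0.14 0.07 1.00 1.00\n"
)
rows = read_table(toy_stats)
maf = [float(row["MAF"]) for row in rows]
counts = bin_counts(maf, n_bins=5, lo=0.0, hi=0.5)
print(counts)
```

To use this on real output, replace `toy_stats` with `open("Data/ldak_data/human.stats")` and pass the resulting counts (or the raw `maf` list) to the plotting library of your choice.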
To perform single-SNP association analysis using linear regression, we use the main argument --linear. Each main argument requires different options. Documentation is again provided at www.dougspeed.com, but typically the easiest way is to run LDAK using just the main argument; it will then tell you which options you require:
/work/Software/ldak --linear linear
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
LDAK - Software for obtaining Linkage Disequilibrium Adjusted Kinships and Loads More
Version 5.2 - Help pages at http://www.ldak.org
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
There is one pair of arguments:
--linear linear
Error, you must use "--pheno" to provide phenotypes
The output can again be used to obtain GWAS plots of interest, such as a Manhattan plot:
Figure 7.2: Manhattan plot outputted by LDAK
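A Manhattan plot needs two numbers per SNP: a position along the genome and -log10 of the association p-value. The sketch below shows one way these could be derived from an association results table. The column names (`Chromosome`, `Basepair`, `Wald_P`) are assumptions about the output layout and are passed in as parameters so they can be changed to match your file; `manhattan_points` is a hypothetical helper, not part of LDAK. The 5e-8 cut-off is the commonly used genome-wide significance convention rather than anything specific to this tutorial.

```python
import io
import math


def manhattan_points(rows, chrom_col, bp_col, p_col):
    """Return (x, y, chromosome) tuples: x is a cumulative position, y is -log10(p)."""
    parsed = []
    for row in rows:
        try:
            p = float(row[p_col])
        except ValueError:
            continue  # e.g. "NA" for SNPs that could not be tested
        if not 0.0 < p <= 1.0:
            continue  # skip NaN, zero or out-of-range p-values
        parsed.append((int(row[chrom_col]), int(row[bp_col]), p))
    parsed.sort()

    # Offset each chromosome by the largest position seen on the preceding ones,
    # so that chromosomes are laid out side by side on the x-axis.
    chrom_max = {}
    for chrom, bp, _ in parsed:
        chrom_max[chrom] = max(chrom_max.get(chrom, 0), bp)
    offsets, running = {}, 0
    for chrom in sorted(chrom_max):
        offsets[chrom] = running
        running += chrom_max[chrom]

    return [(offsets[chrom] + bp, -math.log10(p), chrom) for chrom, bp, p in parsed]


# Toy association results; the header is an ASSUMED layout and the rows are made up.
toy_assoc = io.StringIO(
    "Chromosome Predictor Basepair Wald_P\n"
    "1 rs1 1000 0.5\n"
    "1 rs2 5000 0.001\n"
    "2 rs3 2000 1e-8\n"
    "2 rs4 7000 NA\n"
)
header = toy_assoc.readline().split()
rows = [dict(zip(header, line.split())) for line in toy_assoc if line.strip()]

points = manhattan_points(rows, "Chromosome", "Basepair", "Wald_P")
threshold = -math.log10(5e-8)  # conventional genome-wide significance line
hits = [pt for pt in points if pt[1] > threshold]
print(points)
print(hits)
```

Plotting `x` against `y` as a scatter plot, coloured by chromosome and with a horizontal line at `threshold`, gives the familiar Manhattan layout.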
Beyond this, LDAK provides many other methods and features, including the explicit inclusion of covariates and PRS. We suggest that those who are interested follow the tutorials provided here. We have also provided the dataset accompanying this tutorial under the Data folder, in a file called extra_data.zip.
The key point is that, although more advanced tools are available, PLINK serves as a good introduction to performing individual, specific analyses for didactic purposes.
7.2 Further Reading
There is only so much one can discuss in a beginner's practical guide to GWAS. For those who want to expand their knowledge of GWAS, we have provided a list of resources to read or try out below:
Materials | Description |
---|---|
Other public courses | |
Statistics of GWAS | Semester-long course run by the University of Helsinki - leans more towards the mathematical theory behind GWAS (i.e. no practical) |
Introduction to GWAS | Similar to this course, but with more R implementation |
Post-GWAS | |
The Post-GWAS Era: From Association to Function | Good discussion that highlights key advances in the field of functional genomics that may facilitate the derivation of biological meaning post-GWAS |
Performing post-genome-wide association study analysis: overview, challenges and recommendations | More of a practical guide to the paper above |
Videos | |
Introduction to genomics theory | 30-minute discussion of using PLINK in the context of GWAS |
MPG Primer: GWAS design and interpretation | Medical and Population Genomics Primer from MIT |
Illumina Sequencing | Visually intuitive understanding of sequencing from Illumina |