7 Other Tools & Conclusion

7.1 LDAK

Throughout this tutorial, we have used PLINK as the main tool for our GWAS analysis, in large part because of its popularity in the modern literature. It is, however, by no means the only tool available.

One alternative is LDAK, a method first proposed in 2012. One of the most significant improvements this programme offers is in the genetic prediction of complex traits from individual-level data or summary statistics. Most prediction tools assume the GCTA Model, whereby each SNP is expected to contribute equally to the phenotype (as is the case with PLINK). When the GCTA Model is replaced with the BLD-LDAK Model, the squared correlation between observed and predicted phenotypes ($R^2$) increases by 14% on average (s.d. 1%).
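To make the metric concrete, the following minimal sketch computes the squared Pearson correlation between observed and predicted phenotypes. The function name `squared_correlation` and the toy numbers are purely illustrative; they are not part of LDAK and are not taken from any real dataset.

```python
def squared_correlation(observed, predicted):
    """Return the squared Pearson correlation (R^2) of two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sums of cross-products and squared deviations around the means
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined when either sequence is constant")
    return cov ** 2 / (var_obs * var_pred)


# Toy values, made up for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(squared_correlation(observed, predicted), 3))
```

A value close to 1 means the predictions track the observed phenotypes closely, while a value close to 0 means they carry little information about them.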

7.1.1 Overview of functionality

Using LDAK is very similar to using PLINK, though the commands in PLINK are a bit less regular. As with PLINK, you will need the .bed, .bim and .fam files of your cohort. Beyond these, you will need some additional files:

.info - information scores for SNPs
.pheno - a phenotype
.covar - covariates
.ind.hers - estimates of per-SNP heritabilities
.genefile - (real) RefSeq human gene annotations
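Before running anything, it can help to confirm which of these files are actually present. The sketch below assumes that every file shares the same prefix and uses exactly the suffixes listed above; the helper name `missing_inputs` is hypothetical, and not every LDAK command needs every file.

```python
from pathlib import Path

# Suffixes taken from the list above; the three PLINK files come first.
REQUIRED_SUFFIXES = [".bed", ".bim", ".fam", ".info", ".pheno",
                     ".covar", ".ind.hers", ".genefile"]


def missing_inputs(prefix):
    """Return the expected input files for `prefix` that do not exist on disk."""
    return [prefix + suffix
            for suffix in REQUIRED_SUFFIXES
            if not Path(prefix + suffix).is_file()]


# Assumed location of the data; adjust to wherever your files live.
for path in missing_inputs("Data/ldak_data/human"):
    print("missing:", path)
```

If nothing is printed, all eight files were found under the given prefix.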

If we call our prefix human, then very simply, we can compute summary statistics in the following way:

In [1]:

                
/work/Software/ldak --calc-stats Data/ldak_data/human --bfile Data/ldak_data/human

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
LDAK - Software for obtaining Linkage Disequilibrium Adjusted Kinships and Loads More
Version 5.2 - Help pages at http://www.ldak.org
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

There are 2 pairs of arguments:
--calc-stats Data/ldak_data/human
--bfile Data/ldak_data/human

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

Calculating predictor and individual statistics

To run the parallel version of LDAK, use "--max-threads" (this will only reduce runtime for some commands)

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

Reading IDs for 424 samples from Data/ldak_data/human.fam

Reading details for 3289 predictors from Data/ldak_data/human.bim

Data contain 424 samples and 3289 predictors

Calculating statistics for Chunk 1 of 1

Statistics saved in Data/ldak_data/human.stats and Data/ldak_data/human.missing

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Mission completed. All your basepair are belong to us :)
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

This command asks LDAK to read the data stored in binary PLINK format with prefix human, calculate predictor and individual statistics, and save the results to files with prefix human.

Option pairs can be provided in any order, so the following command is equivalent:

In [2]:

                
/work/Software/ldak --bfile Data/ldak_data/human --calc-stats Data/ldak_data/human

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
LDAK - Software for obtaining Linkage Disequilibrium Adjusted Kinships and Loads More
Version 5.2 - Help pages at http://www.ldak.org
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

There are 2 pairs of arguments:
--bfile Data/ldak_data/human
--calc-stats Data/ldak_data/human

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

Calculating predictor and individual statistics

To run the parallel version of LDAK, use "--max-threads" (this will only reduce runtime for some commands)

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

Reading IDs for 424 samples from Data/ldak_data/human.fam

Reading details for 3289 predictors from Data/ldak_data/human.bim

Data contain 424 samples and 3289 predictors

Calculating statistics for Chunk 1 of 1

Statistics saved in Data/ldak_data/human.stats and Data/ldak_data/human.missing

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Mission completed. All your basepair are belong to us :)
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

The output files are called human.stats and human.missing.

Now by accessing the columns of human.stats, we can create plots similar to those in Section 3 of the tutorial on QC:

Figure 7.1: Genomic properties are easily accessible by parsing through the outputted summary statistics
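As a rough illustration of how such a plot can be prepared, the sketch below reads one column of human.stats and bins it into a histogram. It assumes the file is whitespace-delimited with a header row that contains a column named `MAF`; that column name is an assumption, so check the header of your own file first. The helper names `read_stats_column` and `bin_counts` are hypothetical.

```python
from pathlib import Path


def read_stats_column(path, column="MAF"):
    """Read one numeric column from a whitespace-delimited file with a header row."""
    values = []
    with open(path) as handle:
        header = handle.readline().split()
        if column not in header:
            raise KeyError(f"column {column!r} not found; header is {header}")
        index = header.index(column)
        for line in handle:
            fields = line.split()
            if len(fields) <= index:
                continue  # skip blank or truncated lines
            try:
                values.append(float(fields[index]))
            except ValueError:
                continue  # skip non-numeric entries such as NA
    return values


def bin_counts(values, n_bins=10, lower=0.0, upper=0.5):
    """Count the values falling into n_bins equal-width bins on [lower, upper]."""
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for value in values:
        if lower <= value <= upper:
            # A value exactly on the upper edge is placed in the last bin
            counts[min(int((value - lower) / width), n_bins - 1)] += 1
    return counts


# Assumed output location from the --calc-stats run above
stats_path = Path("Data/ldak_data/human.stats")
if stats_path.is_file():
    maf = read_stats_column(stats_path)  # "MAF" is an assumed column name
    for i, count in enumerate(bin_counts(maf)):
        print(f"{i * 0.05:.2f}-{(i + 1) * 0.05:.2f}: {count}")
```

The resulting counts can be passed to any plotting library to draw the histogram; only the standard library is used here so that the parsing step stays easy to inspect.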

To perform single-SNP association analysis using linear regression, we use the main argument --linear. Each main argument requires different options. Documentation is provided at www.dougspeed.com, but typically the easiest approach is to run LDAK with just the main argument; it will then tell you which options are required:

In [5]:

                
/work/Software/ldak --linear linear

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
LDAK - Software for obtaining Linkage Disequilibrium Adjusted Kinships and Loads More
Version 5.2 - Help pages at http://www.ldak.org
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

There is one pair of arguments:
--linear linear

Error, you must use "--pheno" to provide phenotypes
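The error above shows that --linear needs a phenotype file in addition to the genotype data. The sketch below assembles a fuller command as an argument list. The `--bfile` and `--pheno` options appear in the output above; the phenotype path `Data/ldak_data/human.pheno` and the optional `--covar` argument are assumptions based on the file list earlier in this section, and the helper name `ldak_linear_command` is hypothetical. LDAK may ask for further options, in which case rerun and follow its messages.

```python
import shlex


def ldak_linear_command(ldak, out_prefix, bfile, pheno, covar=None):
    """Build the argument list for a single-SNP linear regression run."""
    command = [ldak, "--linear", out_prefix, "--bfile", bfile, "--pheno", pheno]
    if covar is not None:
        # Assumed option for supplying a covariate file
        command += ["--covar", covar]
    return command


cmd = ldak_linear_command(
    "/work/Software/ldak",
    "linear",
    "Data/ldak_data/human",
    "Data/ldak_data/human.pheno",  # assumed phenotype file name
)
# Print the command so it can be inspected or pasted into a terminal
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if a path contains spaces, and the same list can be handed to `subprocess.run(cmd, check=True)` to execute it from the notebook.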

Once the required options (such as --pheno) have been supplied, the output can again be used to obtain GWAS plots of interest, such as a Manhattan plot:

Figure 7.2: Manhattan plot outputted by LDAK
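Whichever library draws the figure, a Manhattan plot needs two derived quantities per SNP: a cumulative genomic position for the x-axis and $-\log_{10}(p)$ for the y-axis. The sketch below assumes the association results have already been loaded as (chromosome, base-pair position, p-value) tuples with integer chromosome codes; how to load them depends on the header of the LDAK output file, which is not assumed here. The function name `manhattan_coordinates` is hypothetical.

```python
import math


def manhattan_coordinates(records):
    """Convert (chromosome, basepair, p_value) tuples into (x, y, chromosome) points.

    x is the base-pair position shifted by the combined span of all preceding
    chromosomes, and y is -log10(p). Records with invalid p-values are dropped.
    """
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    # Largest base-pair position observed on each chromosome
    chrom_max = {}
    for chrom, bp, _ in ordered:
        chrom_max[chrom] = max(chrom_max.get(chrom, 0), bp)
    # Offset of a chromosome = sum of the spans of the chromosomes before it
    offsets, running = {}, 0
    for chrom in sorted(chrom_max):
        offsets[chrom] = running
        running += chrom_max[chrom]
    points = []
    for chrom, bp, p in ordered:
        if not (0.0 < p <= 1.0):
            continue  # skip missing, zero, or out-of-range p-values
        points.append((offsets[chrom] + bp, -math.log10(p), chrom))
    return points


# Toy records, made up for illustration only
toy = [(2, 100, 0.01), (1, 50, 0.1), (1, 200, 1e-8)]
for x, y, chrom in manhattan_coordinates(toy):
    print(chrom, x, round(y, 2))
```

The returned points can be passed to a scatter-plot function, coloured by chromosome, to reproduce the familiar layout.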

Beyond this, LDAK provides many other methods and features, including the explicit inclusion of covariates and PRS. We suggest that interested readers follow the tutorials provided here. We have also provided the dataset accompanying this tutorial in the Data folder, as extra_data.zip.

The key point is that, although more advanced tools are available, PLINK serves as a good introduction to performing individual, well-defined analyses for teaching purposes.

7.2 Further Reading

There is only so much one can discuss in a beginner's practical guide to GWAS. For those who want to expand their knowledge of GWAS, we have provided a list of resources to read or try out below:

Materials	Description
Other public courses
Statistics of GWAS	Semester-long course run by the University of Helsinki; leans more towards the mathematical theory behind GWAS (i.e. no practical)
Introduction to GWAS	Similar to this course, but with more R implementation
Post-GWAS
The Post-GWAS Era: From Association to Function	Good discussion that highlights key advances in the field of functional genomics that may facilitate the derivation of biological meaning post-GWAS
Performing post-genome-wide association study analysis: overview, challenges and recommendations	More of a practical guide to the paper above
Videos
Introduction to genomics theory	30 minute discussion of using PLINK in the context of GWAS
MPG Primer: GWAS design and interpretation	Medical and Population Genomics Primer from MIT
Illumina Sequencing	Visually intuitive understanding of sequencing from Illumina