Polygenic scores II
PRSice analysis II
We will work with a new simulated dataset that has already undergone quality control. The analysis uses summary statistics from a well-powered base GWAS (in this case, for height) and a target dataset of European individuals in PLINK format. In this tutorial, we will include covariates and principal components (PCs) in the polygenic score analysis.
Let’s create a folder for the output files.
mkdir -p Results/GWAS7
# Create two links to data and software
ln -sf ../Data
You have already run the PRSice software for binary traits. Now, it’s your turn to do the same for height. What type of phenotype is this?
The data:
- Height.QC.gz: post-QC base GWAS summary statistics
- EUR.QC: prefix of the PLINK files (.bed, .bim, .fam) for the target sample
- EUR.height: file containing the height measurements (phenotype) for the target sample
- EUR.covariate: file containing sex and the principal components, to be used as covariates. Since PRSice accepts only a single covariate file, you may need to merge the .cov and .eigenvec files if you used PLINK for quality control.
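If you need to build the single covariate file yourself, a minimal R sketch is shown below. It assumes a .cov file with a header (FID, IID, Sex) and a PLINK 1.9 .eigenvec file without a header (FID, IID, then one column per PC); the function name `merge_covariates` and the file paths in the example call are hypothetical.

```r
# Hypothetical helper: merge a PLINK .cov file (with header) and a PLINK 1.9
# .eigenvec file (assumed to have no header) into one PRSice covariate file.
merge_covariates <- function(cov_file, eigenvec_file, out_file) {
  covariate <- read.table(cov_file, header = TRUE, stringsAsFactors = FALSE)
  pcs <- read.table(eigenvec_file, header = FALSE, stringsAsFactors = FALSE)
  # First two columns are FID and IID; the remaining columns are the PCs
  colnames(pcs) <- c("FID", "IID", paste0("PC", seq_len(ncol(pcs) - 2)))
  # Inner join: keep only individuals present in both files
  merged <- merge(covariate, pcs, by = c("FID", "IID"))
  write.table(merged, out_file, quote = FALSE, row.names = FALSE)
  invisible(merged)
}

# Example call (file names are assumptions):
# merge_covariates("Data/EUR.cov", "Data/EUR.eigenvec", "Data/EUR.covariate")
```

An inner join is used so that every individual in the output has both sex and PC values; individuals missing from either file are dropped.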
Please apply the following filters to the base GWAS:
- Filter out SNPs with MAF < 0.01 in the GWAS summary statistics, using the information in the MAF column.
- Filter out SNPs with INFO < 0.8 in the GWAS summary statistics, using the information in the INFO column.
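PRSice can apply both filters itself through its `--base-maf MAF:0.01` and `--base-info INFO:0.8` options. If you prefer to pre-filter the file, a minimal R sketch follows; it assumes a whitespace-delimited file with a header that includes MAF and INFO columns, and the function name `filter_base_gwas` is hypothetical.

```r
# Hypothetical helper: keep SNPs with MAF >= min_maf and INFO >= min_info.
filter_base_gwas <- function(in_file, out_file, min_maf = 0.01, min_info = 0.8) {
  # read.table decompresses gzip-compressed input transparently
  gwas <- read.table(in_file, header = TRUE, stringsAsFactors = FALSE)
  stopifnot(all(c("MAF", "INFO") %in% names(gwas)))
  keep <- !is.na(gwas$MAF) & !is.na(gwas$INFO) &
    gwas$MAF >= min_maf & gwas$INFO >= min_info
  filtered <- gwas[keep, ]
  # Write the result back as a gzip-compressed, space-delimited file
  con <- gzfile(out_file, "w")
  write.table(filtered, con, quote = FALSE, row.names = FALSE)
  close(con)
  invisible(filtered)
}

# Example call (output name is an assumption):
# filter_base_gwas("Data/Height.QC.gz", "Results/GWAS7/Height.filtered.gz")
```

SNPs with a missing MAF or INFO value are dropped as well, which is a conservative choice.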
Adjust the code from the previous notebooks to run PRSice on the new dataset. If you need extra help, check the user manual: https://choishingwan.github.io/PRSice/.
We recommend using the qqman library in R to draw the Manhattan plot and QQ plot of the base GWAS results, to assess the distribution of association signals before computing the PRS.
# Write R code here
# Setup to avoid long messages and plot on screen
options(warn=-1)
options(jupyter.plot_mimetypes = 'image/png')
# Load GWAS package qqman
suppressMessages(library("qqman"))
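A minimal sketch of the plotting step with qqman's `manhattan()` and `qq()` functions is shown below. It assumes the base file has CHR, BP, SNP and P columns (check the header of your own file); the function name `plot_base_gwas` is hypothetical.

```r
# Hypothetical helper: Manhattan plot and QQ plot of the base GWAS with qqman.
# Assumes whitespace-delimited summary statistics with CHR, BP, SNP and P columns.
plot_base_gwas <- function(gwas_file) {
  if (!requireNamespace("qqman", quietly = TRUE)) {
    stop("Package 'qqman' is not installed")
  }
  # read.table decompresses .gz files transparently
  gwas <- read.table(gwas_file, header = TRUE, stringsAsFactors = FALSE)
  # Manhattan plot: -log10(P) along the genome, alternating colours per chromosome
  qqman::manhattan(gwas, chr = "CHR", bp = "BP", snp = "SNP", p = "P",
                   main = "Base GWAS: height")
  # QQ plot: observed versus expected -log10(P) under the null hypothesis
  qqman::qq(gwas$P, main = "QQ plot of base GWAS P-values")
  invisible(gwas)
}

# Example call (path assumes the Data link created earlier):
# plot_base_gwas("Data/Height.QC.gz")
```

For a base GWAS with hundreds of thousands of SNPs, plotting can take a while; the QQ plot's departure from the diagonal is what indicates polygenic signal (or inflation).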
# Write PRSice command here
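One possible shape for the PRSice call, written as an R wrapper around `Rscript` so it can run from the notebook, is sketched below. The `Software` folder holding PRSice.R and PRSice_linux, the output prefix, and the `--stat OR --or` choice (which assumes the effect-size column is an odds ratio named OR; use `--stat BETA --beta` if your file has a BETA column) are all assumptions. `--binary-target F` reflects that height is a quantitative trait. The function name `run_prsice` is hypothetical.

```r
# Hypothetical wrapper around the PRSice-2 command line.
# Paths and the effect-size column are assumptions; adapt them to your setup.
run_prsice <- function(prsice_dir = "Software", data_dir = "Data",
                       out_prefix = "Results/GWAS7/EUR", dry_run = FALSE) {
  args <- c(file.path(prsice_dir, "PRSice.R"),
            "--prsice", file.path(prsice_dir, "PRSice_linux"),
            "--base", file.path(data_dir, "Height.QC.gz"),
            "--target", file.path(data_dir, "EUR.QC"),
            "--binary-target", "F",                  # height is quantitative
            "--pheno", file.path(data_dir, "EUR.height"),
            "--cov", file.path(data_dir, "EUR.covariate"),
            "--base-maf", "MAF:0.01",                # drop SNPs with MAF < 0.01
            "--base-info", "INFO:0.8",               # drop SNPs with INFO < 0.8
            "--stat", "OR", "--or",                  # assumed effect-size column
            "--out", out_prefix)
  if (dry_run) return(args)                        # inspect without running
  status <- system2("Rscript", args)
  if (status != 0) stop("PRSice exited with status ", status)
  invisible(status)
}

# run_prsice()
```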
Once you have the PRS results, answer the following questions:
- Which P-value threshold generated the “best-fit” PRS?
- How much phenotypic variation does the “best-fit” PRS explain?
Hint: Check the <PREFIX>.summary file.
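A minimal sketch for pulling both answers out of the summary file follows. It assumes the PRSice-2 summary layout, with Threshold and PRS.R2 columns in a header line; check the header of your own file. The function name `read_best_fit` and the path in the example call are hypothetical.

```r
# Hypothetical helper: report the best-fit P-value threshold and the
# variance explained by the PRS from a PRSice <PREFIX>.summary file.
read_best_fit <- function(summary_file) {
  s <- read.table(summary_file, header = TRUE, stringsAsFactors = FALSE)
  s[, c("Threshold", "PRS.R2")]
}

# read_best_fit("Results/GWAS7/EUR.summary")
```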
# Write your answer here
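When covariates are included, a common way to express the variance explained by a PRS is the incremental R²: fit the phenotype on the covariates alone, then on the covariates plus the PRS, and take the difference in R². The sketch below computes this from a merged data frame, as a cross-check rather than a reproduction of PRSice's own calculation, so the two numbers need not match exactly. The function name `incremental_r2` and the default column names (Height, PRS, Sex, PC1 to PC6) are assumptions.

```r
# Hypothetical helper: incremental R2 of the PRS over a covariate-only model.
# Assumes `dat` has the phenotype, a PRS column and the covariate columns.
incremental_r2 <- function(dat, pheno_col = "Height",
                           covariates = c("Sex", paste0("PC", 1:6))) {
  null_formula <- reformulate(covariates, response = pheno_col)
  full_formula <- reformulate(c("PRS", covariates), response = pheno_col)
  null_r2 <- summary(lm(null_formula, data = dat))$r.squared
  full_r2 <- summary(lm(full_formula, data = dat))$r.squared
  # Variance explained by the PRS beyond the covariates
  full_r2 - null_r2
}
```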
Since height differs across sexes, let’s focus on visualizing the relationship between the “best-fit” PRS and the phenotype of interest, colored according to sex.
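A minimal base-R sketch of this plot is shown below. It assumes that <PREFIX>.best has FID, IID and PRS columns, that the phenotype file has FID, IID and a Height column, and that the covariate file has FID, IID and Sex columns; the function name `plot_prs_by_sex` and the file paths in the example call are hypothetical.

```r
# Hypothetical helper: scatter plot of the best-fit PRS against height,
# with points coloured by sex.
plot_prs_by_sex <- function(best_file, pheno_file, cov_file, pheno_col = "Height") {
  prs <- read.table(best_file, header = TRUE, stringsAsFactors = FALSE)
  pheno <- read.table(pheno_file, header = TRUE, stringsAsFactors = FALSE)
  covar <- read.table(cov_file, header = TRUE, stringsAsFactors = FALSE)
  # Join the three files on family and individual IDs (inner joins)
  dat <- merge(prs[, c("FID", "IID", "PRS")], pheno, by = c("FID", "IID"))
  dat <- merge(dat, covar[, c("FID", "IID", "Sex")], by = c("FID", "IID"))
  dat$Sex <- factor(dat$Sex)
  # One colour per sex code
  pal <- grDevices::hcl.colors(nlevels(dat$Sex), "Dark 3")
  plot(dat$PRS, dat[[pheno_col]], col = pal[as.integer(dat$Sex)], pch = 19,
       xlab = "Best-fit PRS", ylab = pheno_col)
  legend("topleft", legend = levels(dat$Sex), col = pal, pch = 19, title = "Sex")
  invisible(dat)
}

# plot_prs_by_sex("Results/GWAS7/EUR.best", "Data/EUR.height", "Data/EUR.covariate")
```

If the sexes separate into two parallel clouds, the PRS and sex contribute roughly additively to height in this sample.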
# Write your code for plotting here
Do you want to explore other post-GWAS analyses? Visit the eMAGMA GitHub repository for a step-by-step guide to eMAGMA, a framework that converts GWAS summary statistics into gene-level statistics by assigning risk variants to putative genes using tissue-specific eQTL information.