Functional Class Scoring

Author

You!

Published

December 20, 2024

Approximate time: 30 minutes

Learning Objectives:

  • Discuss functional class scoring
  • Construct a GSEA analysis using GO and KEGG gene sets
  • Examine results of a GSEA using pathview package

Gene set enrichment analysis using clusterProfiler and Pathview

Using the log2 fold changes obtained from the differential expression analysis for every gene, gene set enrichment analysis and pathway analysis can be performed using clusterProfiler and Pathview tools.

For a gene set or pathway analysis using clusterProfiler, coordinated differential expression over gene sets is tested instead of changes of individual genes. “Gene sets are pre-defined groups of genes, which are functionally related. Commonly used gene sets include those derived from KEGG pathways, Gene Ontology terms, MSigDB, Reactome, or gene groups that share some other functional annotations, etc. Consistent perturbations over such gene sets frequently suggest mechanistic changes”.

Preparation for GSEA

clusterProfiler offers several functions to perform GSEA using different genes sets, including but not limited to GO, KEGG, and MSigDb. We will use the KEGG gene sets, which identify genes using their Entrez IDs. Therefore, to perform the analysis, we will need to acquire the Entrez IDs. We will also need to remove the Entrez ID NA values and duplicates (due to gene ID conversion) prior to the analysis:

Code
## Remove any NA values (reduces the data by quite a bit) and duplicates

res_entrez <- dplyr::filter(res_ids, entrez != "NA" & duplicated(entrez)==F)

Finally, extract and name the fold changes:

Code
## Extract the foldchanges
foldchanges <- res_entrez$log2FoldChange

## Name each fold change with the corresponding Entrez ID
names(foldchanges) <- res_entrez$entrez

Next we need to order the fold changes in decreasing order. To do this we’ll use the sort() function, which takes a vector as input. This is in contrast to Tidyverse’s arrange(), which requires a data frame.

Code
## Sort fold changes in decreasing order
foldchanges <- sort(foldchanges, decreasing = TRUE)

head(foldchanges)
     1297      1501 100996732 107985200      3263      3934 
 4.182128  3.467430  3.280630  3.280630  3.001757  2.922119 

Performing GSEA

To perform the GSEA using KEGG gene sets with clusterProfiler, we can use the gseKEGG() function:

Code
## GSEA using gene sets from KEGG pathways
gseaKEGG <- gseKEGG(geneList = foldchanges, # ordered named vector of fold changes (Entrez IDs are the associated names)
              organism = "hsa", # supported organisms listed below
              pvalueCutoff = 0.05, # padj cutoff value
              verbose = FALSE)

## Extract the GSEA results
gseaKEGG_results <- gseaKEGG@result
head(gseaKEGG_results)
               ID                                       Description setSize
hsa03040 hsa03040                                       Spliceosome     140
hsa05322 hsa05322                      Systemic lupus erythematosus     122
hsa04613 hsa04613           Neutrophil extracellular trap formation     174
hsa05014 hsa05014                     Amyotrophic lateral sclerosis     307
hsa05022 hsa05022 Pathways of neurodegeneration - multiple diseases     420
hsa05016 hsa05016                                Huntington disease     267
         enrichmentScore       NES       pvalue     p.adjust       qvalue rank
hsa03040       0.7191922  2.258903 1.000000e-10 0.0000000345 3.042105e-08 1565
hsa05322      -0.6047195 -1.944187 8.910649e-06 0.0015370869 1.355357e-03 1326
hsa04613      -0.5314809 -1.784992 3.703480e-05 0.0042590018 3.755459e-03 1394
hsa05014       0.4505410  1.556592 1.004386e-04 0.0070771589 6.240424e-03 4922
hsa05022       0.4268181  1.516054 1.025675e-04 0.0070771589 6.240424e-03 4202
hsa05016       0.4430374  1.514514 3.211607e-04 0.0184667415 1.628341e-02 4918
                           leading_edge
hsa03040  tags=20%, list=8%, signal=19%
hsa05322  tags=21%, list=6%, signal=20%
hsa04613  tags=19%, list=7%, signal=18%
hsa05014 tags=43%, list=24%, signal=33%
hsa05022 tags=37%, list=20%, signal=30%
hsa05016 tags=46%, list=24%, signal=36%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                core_enrichment
hsa03040                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  124904135/124904136/124904137/124904138/124904139/124904140/124904141/124904142/124904143/124904144/124907963/124907964/124907965/10262/6634/6631/9939/220988/8683/9129/151903/3312/2521/84321/24148/4809/57819/84991
hsa05322                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  8348/8353/3586/8347/440689/8970/92815/7124/8358/8336/942/1991/718/8343/8357/723790/8341/8349/128312/8329/8344/940/8360/3017/8367/8370
hsa04613                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           3689/51284/8348/1183/8353/8347/7450/440689/8970/92815/6403/8358/1432/8336/2243/2266/1991/718/8343/8357/5330/723790/23569/8341/8349/128312/3683/8329/8344/8360/3017/8367/8370
hsa05014                                                                                                   2902/1769/598/4541/25981/55567/4539/4519/2878/60/5608/11047/4747/7415/9776/572/22863/5532/4726/84516/10010/220988/5709/4720/4722/581/2521/10482/4702/5903/5715/26100/311/5708/4514/5701/81027/203068/2475/5707/5692/81929/3178/4725/5702/79902/5705/10762/5704/79139/4707/55916/5071/4717/5687/6647/5691/4686/8408/4718/8678/55860/9861/836/4723/317/9377/5606/64837/126328/4728/11258/3799/10671/515/5714/4715/10121/516/4713/9217/514/5684/283/4696/29982/1329/4716/842/1349/5603/129401/5706/6396/100101267/10376/5689/9972/4710/201625/513/518/4694/8480/29979/55746/23435/5683/5682/1337/642659/10476/5718/596/9688/1537/55967/23064/51079/9818/3800/57122/23165/506/2876/7186/2891/1965/1351/10718/51807/25978
hsa05022 2902/1769/598/4541/25981/81029/55567/1536/4539/4519/5970/2878/11211/50507/2776/5608/11047/4747/7415/7345/2932/22943/9776/572/7332/5664/22863/4893/1020/5532/818/4726/356/84516/8312/10010/120892/10105/5709/4720/4722/7326/3845/3552/581/2521/355/5599/808/3569/3265/5578/4702/7473/23236/578/1857/5609/5715/26100/5708/488/4514/5701/8851/81027/203068/2475/5707/11315/5692/27035/4725/5702/8322/1855/5705/5602/7477/3553/10131/324/5704/1131/79139/1139/4707/5071/4717/5687/6647/5691/8408/4718/54361/8678/55860/9861/7311/8326/836/3064/4723/317/9377/805/5606/9927/64837/126328/4728/11258/3799/10671/515/5714/7419/4715/10121/51465/516/4713/9217/514/5684/1459/4696/29982/5335/292/369/1329/4716/842/1349/5603/5706/7327/10376/5689/27429/4710/201625/1460/5481/513/518/4694/10297/776/23435/5663/5683/5682/1337
hsa05016                                                                                                                                                                  2902/1769/4541/25981/55567/4539/4519/2878/2776/11047/9776/5439/22863/5434/4726/84516/10105/5709/4720/4722/5438/581/548644/5599/3065/4702/23236/5609/5715/26100/5708/10488/4514/5701/81027/203068/2475/5707/7019/5692/4725/5702/5705/5602/1175/5704/1211/4707/4717/5687/6647/5691/8408/4718/1173/8678/55860/9861/836/3064/4723/317/9377/64837/126328/4728/11258/3799/10671/515/5714/7419/4715/10121/516/4713/514/5684/4899/4696/29982/292/1329/4716/842/5432/1349/5706/10376/5689/6908/4710/201625/1212/5433/513/5431/518/4694/161/5683/160/1387/5682/6648/1337/5468/10476/5718/1537/5437/55967/51079/3800/506/2876/7186/6874/5440/2891/1351/51807/841

NOTE: The organisms with KEGG pathway information are listed here.

How many pathways are enriched?

View the enriched pathways:

Code
## Write GSEA results to file
write.csv(gseaKEGG_results, "../Results/gsea_Cont-Vamp_kegg.csv", quote=F)

NOTE: We will all get different results for the GSEA because the permutations performed use random reordering. If we would like to use the same permutations every time we run a function (i.e. we would like the same results every time we run the function), then we could use the set.seed(123456) function prior to running. The input to set.seed() could be any number, but if you would want the same results, then you would need to use the same number as input.

Explore the GSEA plot of enrichment of one of the pathways in the ranked list:

Code
## Plot the GSEA plot for a single enriched pathway:
gseaplot(gseaKEGG, geneSetID = gseaKEGG_results$ID[1], title = gseaKEGG_results$Description[1])

In this plot, the lines in plot represent the genes in the gene set, and where they occur among the log2 fold changes. The largest positive log2 fold changes are on the left-hand side of the plot, while the largest negative log2 fold changes are on the right. The top plot shows the magnitude of the log2 fold changes for each gene, while the bottom plot shows the running sum, with the enrichment score peaking at the red dotted line (which is among the negative log2 fold changes).

Use the Pathview R package to integrate the KEGG pathway data from clusterProfiler into pathway images:

Code
## Output images for a single significant KEGG pathway
pathview(gene.data = foldchanges,
              pathway.id = gseaKEGG_results$ID[1],
              species = "hsa",
              limit = list(gene = 2, # value gives the max/min limit for foldchanges
              cpd = 1))

NOTE: Printing out Pathview images for all significant pathways can be easily performed as follows:

Code
## Output images for all significant KEGG pathways
get_kegg_plots <- function(x) {
   pathview(gene.data = foldchanges, 
            pathway.id = gseaKEGG_results$ID[x], 
            species = "hsa",
            limit = list(gene = 2, cpd = 1))
}

purrr::map(1:length(gseaKEGG_results$ID), 
           get_kegg_plots)

Instead of exploring enrichment of KEGG gene sets, we can also explore the enrichment of BP Gene Ontology terms using gene set enrichment analysis:

Code
# GSEA using gene sets associated with BP Gene Ontology terms
gseaGO <- gseGO(geneList = foldchanges, 
              OrgDb = org.Hs.eg.db, 
              ont = 'BP', 
              minGSSize = 20, 
              pvalueCutoff = 0.05,
              verbose = FALSE) 

gseaGO_results <- gseaGO@result
head(gseaGO_results)
                   ID                          Description setSize
GO:0022613 GO:0022613 ribonucleoprotein complex biogenesis     394
GO:0042254 GO:0042254                  ribosome biogenesis     258
GO:0006334 GO:0006334                  nucleosome assembly     101
GO:0007005 GO:0007005           mitochondrion organization     391
GO:0002385 GO:0002385              mucosal immune response      35
GO:0006364 GO:0006364                      rRNA processing     182
           enrichmentScore       NES       pvalue     p.adjust       qvalue
GO:0022613       0.4821523  1.715050 7.090003e-08 0.0002854435 0.0002636735
GO:0042254       0.5016453  1.728272 7.556806e-06 0.0152118514 0.0140516828
GO:0006334      -0.6352465 -1.969317 1.447799e-05 0.0194294660 0.0179476308
GO:0007005       0.4365947  1.552053 3.145319e-05 0.0316576325 0.0292431865
GO:0002385      -0.7660692 -1.976019 5.813500e-05 0.0365246263 0.0337389873
GO:0006364       0.5061339  1.676144 7.257750e-05 0.0365246263 0.0337389873
           rank                   leading_edge
GO:0022613 4529 tags=43%, list=22%, signal=34%
GO:0042254 4529 tags=47%, list=22%, signal=37%
GO:0006334 1326  tags=21%, list=6%, signal=20%
GO:0007005 4346 tags=41%, list=21%, signal=33%
GO:0002385  933  tags=20%, list=5%, signal=19%
GO:0006364 5185 tags=51%, list=25%, signal=38%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                core_enrichment
GO:0022613 58155/27340/57647/10514/9136/84881/55702/115939/10262/84864/8161/388524/8850/6634/6631/708/25926/10078/7520/10978/26155/55153/8683/9129/25980/79979/23481/4839/5822/54555/8241/8886/51018/6895/26574/8661/8663/728689/24148/8451/4809/51504/29777/1653/57819/114049/54496/81875/387338/27341/5036/22803/51121/5394/55299/56915/51096/1736/3692/27043/23195/23246/51388/51010/2091/91893/5591/10600/11325/9169/84549/51491/7375/79631/26523/79707/4686/6637/10969/6635/84946/5303/55027/2068/3326/55272/81554/79066/55505/25879/79050/64282/4869/8450/9775/10412/84769/51550/54663/2197/121053/51119/6234/54433/10885/54913/92856/6209/25983/6636/79760/10856/130916/8669/90121/11218/6839/163859/8662/10658/134430/51001/26284/51631/51441/54955/57109/10992/51116/79033/6632/90459/55695/9669/83939/6203/6191/6838/54059/6426/55720/23451/8568/8666/51367/10728/7514/100101490/64794/10480/8665/56902/102157402/28985/1660/2971/54512/10200/9733/55178/11171/55003/23560/117246/50628/285855/54853/6232/79833/55226
GO:0042254                                                                                                                                                                                                                                                                                           27340/57647/10514/9136/84881/115939/84864/388524/8850/708/25926/7520/26155/55153/79979/23481/4839/5822/54555/8241/8886/51018/26574/8451/4809/51504/29777/114049/81875/387338/27341/5036/22803/51121/5394/55299/56915/51096/1736/3692/27043/23195/23246/51388/51010/2091/91893/5591/10600/84549/51491/79631/79707/10969/84946/5303/55027/2068/55272/81554/79066/55505/25879/79050/64282/4869/8450/9775/10412/84769/51550/54663/2197/51119/6234/54433/10885/54913/92856/6209/25983/130916/90121/6839/163859/134430/51001/26284/51441/54955/57109/51116/79033/90459/55695/9669/83939/6203/6191/6838/54059/55720/8568/51367/7514/64794/56902/102157402/2971/54512/10200/55178/55003/23560/117246/50628/285855/54853/6232/55226
GO:0006334                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            8348/8353/546/64754/8347/6046/8970/8358/8343/8357/23569/8341/8349/3008/3007/8344/8360/3006/3005/8367/8370
GO:0007005                                                                                          2852/598/4541/2931/80119/597/51499/84883/64756/55737/2932/116228/5566/7015/572/80224/51024/57190/10017/27166/120892/10105/5027/4720/1267/192111/6240/4722/51295/581/4318/4846/23682/4702/578/117145/1050/125170/25813/29108/2034/5339/84895/2395/28978/25994/11315/7019/10063/92609/4725/23369/7755/26515/23761/51287/51537/23277/9804/55210/84987/3313/84275/1634/56947/3479/64429/66008/4707/10059/5071/23400/4717/64423/84233/26517/4718/83982/8546/2189/790955/81892/8326/54832/55750/81554/28976/5830/2810/3082/23787/29928/55572/26521/79784/9927/79017/126328/7351/4728/401505/3799/8192/4082/142/4715/9026/139341/51204/4713/647087/55245/54332/219293/125988/3954/8834/91137/55186/131474/5819/3308/1352/4696/55187/57143/292/4716/1906/55744/6742/4170/57226/5516/22906/54974/137392/6548/27429/4710/80821/10989/513/207/137682/4694/284114/2729/84266/90624/8936/51571/6648/3329/54927/5468/79064/84461/5864/596
GO:0002385                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 8347/8970/11126/64127/8343/8349/8344
GO:0006364                                                                                                                                                                                                                                                                                                                                                                                                                                                     27340/57647/9136/84881/115939/8850/25926/79979/23481/4839/5822/54555/8886/51018/4809/51504/29777/114049/387338/27341/5036/22803/5394/55299/56915/51096/1736/3692/27043/23246/51010/2091/91893/5591/84549/79707/10969/5303/2068/55272/79066/55505/25879/79050/64282/9775/10412/54663/51119/6234/54433/10885/54913/92856/6209/25983/130916/90121/6839/163859/134430/51441/57109/79033/90459/55695/54059/55720/8568/51367/102157402/54512/10200/55178/55003/23560/117246/50628/285855/54853/6232/55226/27079/51013/55646/23404/84135/260294/10248/79922/91695/54680
Code
gseaplot(gseaGO, geneSetID = gseaGO_results$ID[1], title = gseaGO_results$Description[1])

There are other gene sets available for GSEA analysis in clusterProfiler (Disease Ontology, Reactome pathways, etc.) You can check out this link for more!


Exercise 1

Run a Disease Ontology (DO) GSEA analysis using the gseDO() function. NOTE the arguments are very similar to the previous examples.

  • Do you find anything interesting?

Exercise 2

Run an GSE on the results of the DEA for Garlicum vs Vampirium samples. Remember to use the annotated results!