Cross-data analysis
Motivation:
Azoospermia is a condition in which men do not produce any spermatozoa, or produce semen of a quality too low to allow pregnancy. There are several types of azoospermia, and they look different at the cellular level, as you can see below.
Examples of testicular histology and the composition of testicular cell types that can be observed among men with non-obstructive azoospermia. (a) A biopsy from a patient with Klinefelter syndrome (47, XXY) showing degenerated ghost tubules (#), tubules with Sertoli-cell-only (SCO) pattern (*) and large clusters of Leydig cells. (b) SCO (*) observed in a patient with a complete AZFc deletion. (c) Tubules with germ cell neoplasia in situ, GCNIS, which do not contain any normal germ cells (&). GCNIS cells are the precursor cells of testicular germ cell cancer and are found more frequently among men with azoospermia than among men with good semen quality (Hoei-Hansen et al. 2003). (d) Classical Sertoli-cell-only syndrome (SCOS) where no germ cells are present. Only Sertoli cells are found inside the seminiferous tubules marked with an asterisk (*). (e) SCO (*) with partial hyalinisation of tubules (#). (f) Spermatocytic arrest (SPA) (§) at the stage of spermatocytes. The bar represents 100 microns and all images are at the same magnification. From Soraggi et al. (2020).
Common to the various azoospermic conditions is the lack or disruption of gene expression patterns. It therefore makes sense to look for genes that are expressed more in the healthy dataset than in the azoospermic one. We can also query gene enrichment databases to get a clearer picture of what the genes of interest are involved in.
We try a simple analysis of the dataset with "healthy" cells against a dataset with azoospermic cells: we integrate the data, then apply differential expression and gene enrichment analysis. The azoospermic dataset has already been preprocessed and clustered. Notebooks covering the whole processing of the data are included under the section Extra of the course webpage, and can be found in the folder Notebooks/Python/Azoospermia. The original data is also provided, so you can preprocess and cluster it again on your own.
Learning objectives:
- Integrate datasets and detect DE genes in two different health conditions
- Evaluate visually the integration results
- Perform and interpret gene enrichment analysis
Execution time: 30 minutes
*Import packages*
import scanpy as sc
import pandas as pd
import scvelo as scv
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import sklearn
import anndata as ad
import gseapy as gp
plt.rcParams['figure.figsize']=(6,6) #rescale figures
import rpy2.robjects as ro
import rpy2.rinterface_lib.callbacks
import logging
from rpy2.robjects import pandas2ri
import anndata2ri
# Ignore R warning messages
#Note: this can be commented out to get more verbose R output
rpy2.rinterface_lib.callbacks.logger.setLevel(logging.ERROR)
# Automatically convert rpy2 outputs to pandas dataframes
pandas2ri.activate()
anndata2ri.activate()
%load_ext rpy2.ipython
%%R
.libPaths( c( "../../../../sandbox_scRNA_testAndFeedback/scrna-environment/lib/R/library/" ) )
Read the data for the healthy and the azoospermic patient
healthy = sc.read('../../Data/notebooks_data/sample_123.filt.norm.red.clst.2.times.h5ad')
azoospermic = sc.read('../../../../sandbox_scRNA_testAndFeedback/scRNASeq_course/Data/notebooks_data/crypto_123.filt.norm.red.clst.2.times.h5ad')
WARNING: Your filename has more than two extensions: ['.filt', '.norm', '.red', '.clst', '.2', '.times', '.h5ad']. Only considering the two last: ['.times', '.h5ad'].
Rename cluster variable to match the two datasets
azoospermic.obs['clusters_som']=azoospermic.obs['clusters_spc'].copy()
Just a reminder of the available clusters and the UMAP plot. In this case the clusters match, apart from SpermatogoniaB, whose markers were not observed in azoospermic patients. Note also the differences in pseudotimes.
sc.pl.umap(healthy, color=['clusters_spc','pseudotimes'],
legend_loc='on data', title='Healthy patient clustering')
WARNING: The title list is shorter than the number of panels. Using 'color' value instead for some plots.
sc.pl.umap(azoospermic, color=['clusters_spc','pseudotimes'],
legend_loc='on data', title='Azoospermic patient clustering')
WARNING: The title list is shorter than the number of panels. Using 'color' value instead for some plots.
healthy.shape
(6431, 22790)
azoospermic.shape
(2147, 14018)
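The statement about matching clusters can also be checked with plain set operations. The sketch below is only an illustration: the label sets are typed in by hand from the cluster names that appear in the value counts reported later in this notebook, whereas on the real objects you would build them from `healthy.obs['clusters_spc']` and `azoospermic.obs['clusters_spc']`.

```python
# Cluster labels typed in by hand from the value counts shown later in the notebook
healthy_clusters = {
    "SpermatogoniaA", "SpermatogoniaB", "Leptotene", "Zygotene", "Pachytene",
    "Diplotene", "RoundSpermatids", "ElongSpermatids", "Somatic",
}
azoospermic_clusters = {
    "SpermatogoniaA", "Leptotene", "Zygotene", "Pachytene", "Diplotene",
    "Dyplotene", "RoundSpermatids", "ElongSpermatids", "Myoid", "Endothelial",
}

shared = healthy_clusters & azoospermic_clusters            # labels found in both datasets
only_healthy = healthy_clusters - azoospermic_clusters      # labels missing in the azoospermic data
only_azoospermic = azoospermic_clusters - healthy_clusters  # labels missing in the healthy data

print("shared:", sorted(shared))
print("only healthy:", sorted(only_healthy))
print("only azoospermic:", sorted(only_azoospermic))
```

Besides SpermatogoniaB, the differences come from the somatic cell labels and from the misspelled `Dyplotene` label, which the notebook merges with `Diplotene` before plotting the cluster averages.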
Put data together
One possible comparison is to do a differential gene expression of each cluster found in both datasets. In this way we can find genes expressed in one sample and not the other. To do this we first concatenate the datasets and normalize them.
batch_names = ['healthy','azoospermic'] #choose names for samples
sample = ad.AnnData.concatenate(healthy, azoospermic, batch_key='condition') #concatenate
sample.rename_categories(key='condition', categories=batch_names) #apply sample names
scv.utils.cleanup(sample, clean='var') #remove duplicated gene quantities
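To our understanding, `AnnData.concatenate` joins the genes with an inner join by default, so only genes present in both datasets survive the concatenation. The toy example below shows the same behaviour with `pandas.concat`; the tables and their values are hypothetical and only serve to show why the number of genes drops to the size of the smaller gene set.

```python
import pandas as pd

# Toy count tables: rows are cells, columns are genes (hypothetical values)
healthy_toy = pd.DataFrame(
    [[5, 0, 2], [3, 1, 0]],
    index=["h_cell1", "h_cell2"],
    columns=["PRM1", "TNP1", "RPS27"],
)
azoo_toy = pd.DataFrame(
    [[0, 7], [1, 4]],
    index=["a_cell1", "a_cell2"],
    columns=["PRM1", "RPS27"],
)

# join="inner" keeps only the genes (columns) present in both tables
merged = pd.concat([healthy_toy, azoo_toy], join="inner")
print(merged)
```

TNP1 is dropped because it is absent from the second table, while all four cells are kept.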
We normalize the data, combining batch and condition into a single batch variable that distinguishes the samples
sample.obs['batch_condition'] = [f'{i}_{j}' for i,j in zip(sample.obs['batch'],sample.obs['condition'])]
rawMatrix = np.array( sample.layers['umi_raw'].T.copy())
genes_name = sample.var_names
cells_info = sample.obs[ ["batch_condition"] ].copy()
%%R -i cells_info -i rawMatrix -i genes_name
library(scater)
cell_df <- DataFrame(data = cells_info)
colnames(rawMatrix) <- rownames(cell_df) #cell names
rownames(rawMatrix) <- genes_name #gene names
%%R
library(sctransform)
library(future)
future::plan(strategy = 'multicore', workers = 32)
options(future.globals.maxSize = 50 * 1024 ^ 3)
%%R
vst_out=vst( as.matrix(rawMatrix), #data matrix
cell_attr=cell_df, #dataframe containing batch variable
n_genes=3000, #most variable genes in your data
batch_var='data.batch_condition', #name of the batch variable
method='qpoisson', #type of statistical model. use "poisson" for more precision but much slower execution
show_progress=TRUE, #show progress bars
return_corrected_umi=TRUE) #return corrected umi count matrix
%%R -o new_matrix -o sct_genes -o all_genes -o umi_matrix
new_matrix=vst_out$y #normalized matrix
sct_genes = rownames(vst_out$model_pars) #most variable genes
all_genes = rownames(new_matrix) #vector of all genes to check if any have been filtered out
umi_matrix=vst_out$umi_corrected #umi matrix
sct_genes = list(sct_genes)
sample.var['highly_variable'] = [i in sct_genes for i in sample.var_names]
sample = sample[:,list(all_genes)].copy()
sample.layers['norm_sct_condition'] = np.transpose( new_matrix )
sample.layers['umi_sct_condition'] = np.transpose( umi_matrix )
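To give an intuition for the normalized values stored in `norm_sct_condition`, here is a minimal sketch of negative binomial Pearson residuals. It is a simplified analytic variant and not what `vst` runs: `vst` fits a regularized regression per gene and handles the batch variable, while this sketch assumes a fixed overdispersion `theta` (a hypothetical value) and derives the expected counts from the row and column totals only.

```python
import numpy as np

def pearson_residuals(counts, theta=100.0):
    """Analytic negative binomial Pearson residuals for a cells x genes count matrix.

    theta is an assumed, fixed overdispersion parameter (not estimated from the data).
    """
    counts = np.asarray(counts, dtype=float)
    cell_totals = counts.sum(axis=1, keepdims=True)   # sequencing depth of each cell
    gene_totals = counts.sum(axis=0, keepdims=True)   # total counts of each gene
    mu = cell_totals @ gene_totals / counts.sum()     # expected count per cell and gene
    residuals = (counts - mu) / np.sqrt(mu + mu**2 / theta)
    clip = np.sqrt(counts.shape[0])                   # clip extreme residuals at sqrt(n_cells)
    return np.clip(residuals, -clip, clip)

# Hypothetical toy matrix: 3 cells x 3 genes
toy = np.array([[10, 0, 5], [20, 2, 8], [0, 1, 1]])
res = pearson_residuals(toy)
print(res.round(2))
```

A residual close to zero means the gene is expressed about as much as expected from the depth of the cell; large positive or negative values flag genes that deviate from that expectation.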
Now we have fewer genes, because the azoospermic dataset contains fewer genes than the healthy one
sample
AnnData object with n_obs × n_vars = 8578 × 14009
    obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'perc_mito', 'n_counts', 'n_genes', 'doublet_score', 'predicted_doublet', 'batch', 'leiden', 'clusters', 'clusters_spc', 'pseudotimes', 'clusters_som', 'condition', 'batch_condition'
    var: 'highly_variable'
    obsm: 'X_pca', 'X_umap'
    layers: 'norm_sct', 'umi_log', 'umi_raw', 'umi_sct', 'umi_tpm', 'norm_sct_condition', 'umi_sct_condition'
Differential expression
Here we do a simple differential expression analysis of all healthy cells versus all azoospermic cells. The healthy sample is dominated by genes related to the development of sperm, especially in the round and elongated spermatid stages. Many of the top genes for azoospermic cells are ribosomal genes.
sample.X = sample.layers['umi_sct_condition'].copy()
sc.pp.log1p(sample)
sc.tl.rank_genes_groups(sample, groupby='condition', key_added='DE_condition',
use_raw=False, n_genes=50, method='wilcoxon')
... storing 'batch' as categorical ... storing 'leiden' as categorical ... storing 'clusters' as categorical ... storing 'clusters_spc' as categorical ... storing 'batch_condition' as categorical
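The `method='wilcoxon'` option ranks the expression values of a gene across all cells and asks whether one group tends to hold the higher ranks. Below is a minimal pure-Python sketch of that idea for a single gene, using the normal approximation and no tie correction of the variance. It is meant as an illustration under these assumptions, not as a copy of the scanpy code, and the input values are hypothetical.

```python
import math

def rank_sum_z(group, rest):
    """Wilcoxon rank-sum z-score of `group` versus `rest` (normal approximation)."""
    values = [(v, 0) for v in group] + [(v, 1) for v in rest]
    values.sort(key=lambda t: t[0])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(values) and values[j + 1][0] == values[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank shared by the ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(group), len(rest)
    rank_sum = sum(r for r, (_, label) in zip(ranks, values) if label == 0)
    expected = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (rank_sum - expected) / std

# Hypothetical expression values of one gene in two groups of cells
z = rank_sum_z([5, 6, 7], [1, 2, 3])
print(round(z, 3))
```

A positive z-score means the gene tends to be higher in the first group; swapping the two groups flips the sign.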
pd.DataFrame(sample.uns['DE_condition']['names'])
 | healthy | azoospermic |
---|---|---|
0 | PRM2 | RPS27 |
1 | PRM1 | RPS29 |
2 | CRISP2 | RPS26 |
3 | TNP1 | RANBP1 |
4 | CCDC7 | FTH1 |
5 | TRIM36 | RPS23 |
6 | ZNF295-AS1 | RPL34 |
7 | LINC01921 | RPL38 |
8 | ODF2 | RPS28 |
9 | MLF1 | RPL37 |
10 | C2orf73 | RPS2 |
11 | SPATA4 | RPLP0 |
12 | DCUN1D1 | RPLP1 |
13 | ROPN1 | RPL36 |
14 | HMGB4 | TOMM7 |
15 | CCDC110 | RPL6 |
16 | PFKP | CST3 |
17 | ADAD1 | RPL4 |
18 | BRDT | PTMA |
19 | SSX2IP | ITM2B |
20 | BAG5 | RPL12 |
21 | MPC2 | SRSF2 |
22 | NUPR2 | NOP53 |
23 | FAM104A | MTDH |
24 | PGK2 | PCBP2 |
25 | PDHA2 | RPL10A |
26 | NME5 | RPS9 |
27 | CABYR | RPL18 |
28 | DKKL1 | RPS8 |
29 | C11orf71 | KDELR2 |
30 | ATAD1 | RPS3 |
31 | CAMLG | RPL37A |
32 | MORN2 | TMSB4X |
33 | CAPZA3 | SNHG7 |
34 | TPP2 | PRELID1 |
35 | IFT57 | UBL5 |
36 | DNAJC5B | MAP1LC3B |
37 | ROPN1B | SNHG5 |
38 | TMF1 | RPL21 |
39 | SMCP | RPS14 |
40 | PIH1D2 | RPLP2 |
41 | GSG1 | RPL14 |
42 | SPATA7 | RPL10 |
43 | FAM81B | MT-CO3 |
44 | MRPL42 | RPL18A |
45 | H2AFJ | MT-ND3 |
46 | CDKN3 | RPL36A |
47 | B4GALT1-AS1 | MIF |
48 | ACTRT2 | RPL41 |
49 | ARMC3 | SRRM2 |
You can again look at log-fold changes and p-values
result = sample.uns['DE_condition']
groups = result['names'].dtype.names
X = pd.DataFrame(
{group + '_' + key[:1].upper(): result[key][group]
for group in groups for key in ['names', 'pvals_adj','logfoldchanges']})
X
 | healthy_N | healthy_P | healthy_L | azoospermic_N | azoospermic_P | azoospermic_L |
---|---|---|---|---|---|---|
0 | PRM2 | 0.000000e+00 | 2.032233 | RPS27 | 0.000000e+00 | 1.026861 |
1 | PRM1 | 8.437521e-278 | 2.002583 | RPS29 | 0.000000e+00 | 1.019021 |
2 | CRISP2 | 1.365587e-275 | 2.028542 | RPS26 | 5.887639e-283 | 1.706868 |
3 | TNP1 | 3.314488e-248 | 1.960973 | RANBP1 | 2.006989e-258 | 1.354506 |
4 | CCDC7 | 3.481192e-218 | 1.627345 | FTH1 | 7.402147e-247 | 1.787864 |
5 | TRIM36 | 9.660027e-212 | 1.600318 | RPS23 | 1.312549e-243 | 1.120837 |
6 | ZNF295-AS1 | 8.585883e-207 | 2.665506 | RPL34 | 4.200615e-230 | 0.888604 |
7 | LINC01921 | 4.328831e-204 | 2.113969 | RPL38 | 3.827195e-223 | 0.688914 |
8 | ODF2 | 1.599192e-201 | 1.300002 | RPS28 | 1.337696e-220 | 1.546971 |
9 | MLF1 | 1.570552e-200 | 1.216649 | RPL37 | 1.047240e-218 | 0.832729 |
10 | C2orf73 | 1.061742e-194 | 1.577648 | RPS2 | 3.335509e-215 | 1.854126 |
11 | SPATA4 | 6.659028e-192 | 1.445028 | RPLP0 | 1.289418e-214 | 1.185492 |
12 | DCUN1D1 | 1.017568e-182 | 1.625203 | RPLP1 | 4.094501e-187 | 1.333767 |
13 | ROPN1 | 1.508635e-181 | 1.584022 | RPL36 | 3.640184e-184 | 0.944616 |
14 | HMGB4 | 3.223076e-180 | 1.864275 | TOMM7 | 1.270308e-183 | 0.970103 |
15 | CCDC110 | 5.220077e-180 | 1.732849 | RPL6 | 1.002790e-180 | 1.022597 |
16 | PFKP | 1.278087e-178 | 1.463896 | CST3 | 1.260241e-177 | 2.394424 |
17 | ADAD1 | 3.616420e-176 | 1.324294 | RPL4 | 1.460616e-176 | 0.796177 |
18 | BRDT | 8.874404e-174 | 1.290164 | PTMA | 8.979741e-175 | 1.705152 |
19 | SSX2IP | 5.096976e-171 | 1.476762 | ITM2B | 1.604503e-173 | 2.367598 |
20 | BAG5 | 2.808186e-168 | 1.357559 | RPL12 | 2.164859e-169 | 1.274136 |
21 | MPC2 | 1.214120e-167 | 0.905591 | SRSF2 | 4.355515e-166 | 1.596860 |
22 | NUPR2 | 2.634776e-162 | 1.640146 | NOP53 | 5.980976e-162 | 1.646122 |
23 | FAM104A | 3.466602e-160 | 1.044372 | MTDH | 1.129843e-158 | 1.224478 |
24 | PGK2 | 4.973615e-160 | 1.429321 | PCBP2 | 1.658544e-154 | 1.404459 |
25 | PDHA2 | 1.642773e-158 | 1.552514 | RPL10A | 4.730326e-153 | 1.158851 |
26 | NME5 | 4.617018e-158 | 1.289523 | RPS9 | 1.458009e-152 | 0.653831 |
27 | CABYR | 4.015829e-157 | 1.422755 | RPL18 | 1.200189e-151 | 1.581931 |
28 | DKKL1 | 7.798017e-156 | 1.504000 | RPS8 | 2.148500e-147 | 0.876272 |
29 | C11orf71 | 2.073641e-155 | 1.292396 | KDELR2 | 3.930590e-147 | 1.265155 |
30 | ATAD1 | 3.476670e-154 | 1.453236 | RPS3 | 3.491632e-144 | 0.919098 |
31 | CAMLG | 2.757895e-153 | 0.921729 | RPL37A | 6.894196e-141 | 0.469303 |
32 | MORN2 | 1.001687e-152 | 1.051656 | TMSB4X | 1.566276e-139 | 1.955806 |
33 | CAPZA3 | 5.032883e-152 | 1.846487 | SNHG7 | 2.205658e-134 | 2.393714 |
34 | TPP2 | 1.209302e-149 | 1.473941 | PRELID1 | 2.977950e-134 | 2.087113 |
35 | IFT57 | 1.874411e-148 | 1.070314 | UBL5 | 3.269293e-134 | 1.022076 |
36 | DNAJC5B | 1.844754e-145 | 1.805371 | MAP1LC3B | 4.815629e-134 | 1.013808 |
37 | ROPN1B | 1.930813e-145 | 1.505872 | SNHG5 | 6.897227e-134 | 1.493377 |
38 | TMF1 | 3.429067e-145 | 1.395980 | RPL21 | 1.176173e-133 | 1.789231 |
39 | SMCP | 1.581934e-144 | 1.590584 | RPS14 | 2.252929e-133 | 0.548712 |
40 | PIH1D2 | 3.286820e-144 | 1.328082 | RPLP2 | 4.735469e-131 | 0.595375 |
41 | GSG1 | 1.233251e-143 | 1.588035 | RPL14 | 4.360103e-129 | 0.965527 |
42 | SPATA7 | 4.604242e-143 | 1.577425 | RPL10 | 2.062930e-128 | 1.895483 |
43 | FAM81B | 8.028167e-142 | 1.791096 | MT-CO3 | 2.657770e-128 | 1.624999 |
44 | MRPL42 | 9.043097e-142 | 1.034540 | RPL18A | 1.201094e-126 | 1.703612 |
45 | H2AFJ | 1.441976e-140 | 1.373664 | MT-ND3 | 3.394581e-126 | 1.560312 |
46 | CDKN3 | 4.676738e-140 | 1.375274 | RPL36A | 1.429154e-125 | 1.860079 |
47 | B4GALT1-AS1 | 8.175465e-137 | 1.812452 | MIF | 8.080621e-125 | 1.383539 |
48 | ACTRT2 | 2.204747e-136 | 1.581528 | RPL41 | 7.655546e-124 | 0.854599 |
49 | ARMC3 | 4.142454e-136 | 1.658994 | SRRM2 | 1.194687e-123 | 1.427597 |
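The `_L` and `_P` columns hold log fold changes and adjusted p-values. The sketch below shows how such numbers can be obtained. Two assumptions are made here: that the fold change is a log2 ratio of mean expression after undoing the `log1p` transform (with a small constant `eps` to avoid dividing by zero), and that the p-values are adjusted with the Benjamini-Hochberg procedure, which to our understanding is the scanpy default. The input numbers are hypothetical.

```python
import math

def log2_fold_change(mean_log_group, mean_log_rest, eps=1e-9):
    """log2 ratio of mean expression; both means are given on the log1p scale."""
    return math.log2((math.expm1(mean_log_group) + eps) / (math.expm1(mean_log_rest) + eps))

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: mean expression 4 versus 1 on the original scale
lfc = log2_fold_change(math.log1p(4), math.log1p(1))
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(round(lfc, 3), [round(p, 3) for p in padj])
```

A log fold change of 2 corresponds to a four-fold higher mean expression, which helps when reading the `healthy_L` and `azoospermic_L` columns.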
X.to_csv('../../Data/results/diff_expression_condition.csv', header=True, index=False)
Integration plot. We use the standard PCA because it is faster, and rely on bbknn to correct for differences between samples. While we could not identify Somatic cells in the healthy data, they can now be distinguished into endothelial and somatic with the overlapping UMAP plot
sample.X = sample.layers['norm_sct_condition'].copy() #use normalized data in .X
sc.pp.scale(sample) #standardize
sc.preprocessing.pca(sample, svd_solver='arpack', random_state=12345) #do PCA
import bbknn
bbknn.bbknn(sample, batch_key='batch_condition')
sc.tools.umap(sample, random_state=54321)
sc.plotting.umap(sample, color=['condition','clusters_spc'], ncols=1)
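The core idea behind a batch-balanced neighbor graph is that every cell looks for its nearest neighbors separately within each batch, so that no batch dominates the neighborhood. The sketch below illustrates this with exact distances on a hypothetical toy embedding. It is not the bbknn implementation, which works on the PCA space with approximate neighbor search and also computes connectivities.

```python
import numpy as np

def batch_balanced_neighbors(embedding, batches, k=1):
    """For each cell, return the indices of its k nearest cells within every batch."""
    embedding = np.asarray(embedding, dtype=float)
    batches = np.asarray(batches)
    neighbors = []
    for i in range(embedding.shape[0]):
        dist = np.linalg.norm(embedding - embedding[i], axis=1)
        dist[i] = np.inf  # a cell is not its own neighbor
        found = []
        for b in np.unique(batches):
            idx = np.where(batches == b)[0]           # cells belonging to batch b
            nearest = idx[np.argsort(dist[idx])[:k]]  # k closest cells of that batch
            found.extend(nearest.tolist())
        neighbors.append(found)
    return neighbors

# Hypothetical 2D embedding: two batches separated by a large offset
toy_embedding = [(0, 0), (1, 0), (0, 2), (10, 10), (11, 10), (10, 12)]
toy_batches = ["A", "A", "A", "B", "B", "B"]
toy_neighbors = batch_balanced_neighbors(toy_embedding, toy_batches, k=1)
print(toy_neighbors)
```

Even though the two batches are far apart, each cell gets one neighbor from the other batch, which is what pulls the samples together in the resulting UMAP.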
Below, we average the UMAP coordinates of each cluster in the azoospermic (A) and healthy (H) dataset, and plot those averages, to see whether matching clusters lie close to each other or far apart. Notice that Spermatogonia B and Leptotene overlap. This is because in only one of the two datasets we have kept some spermatogonia B cells in which we could not observe leptotene markers. We could also have misidentified some Spermatogonia A cells. Somatic cells are offset from the myoid and endothelial cells, simply because the cell types were identified differently
new_names = np.array([ str(i[0]).upper() + '_' + str(j) for i,j in
zip(sample.obs['condition'], sample.obs['clusters_spc']) ])
np.unique(new_names)
markers = { 'azoospermic':'s', 'healthy':'p' }
idx = [i=='A_Dyplotene' for i in new_names]
new_names[idx] = 'A_Diplotene'
np.unique(new_names)
plt.rcParams['figure.figsize']=(10,6) #rescale figures
X_umap = sample.obsm['X_umap'].copy()
x = []
y = []
clst = []
condition = []
#need the same category names order to have the same color palette for the clusters
for i in np.unique(new_names):
    boolean = [j==i for j in new_names] #cells belonging to this condition/cluster pair
    x.append( np.mean(X_umap[boolean,0]) )
    y.append( np.mean(X_umap[boolean,1]) )
    clst.append( i.split('_')[1] )
    condition.append( sample.obs['condition'][boolean].iloc[0] ) #positional access to the first matching cell
sns.set_style("white", {'axes.grid' : False})
g=sns.scatterplot(x=x, y=y, style=condition, hue=clst, markers=markers, s=1000) #recent seaborn versions require x and y as keywords
g.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., fontsize= 20, markerscale = 3)
g.set_title('Overlapping of cluster coordinates on UMAP')
g.set(xlabel = 'UMAP_0', ylabel='UMAP_1')
[Text(0.5, 0, 'UMAP_0'), Text(0, 0.5, 'UMAP_1')]
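The averaging loop above can also be written as a `groupby`, which makes it easy to add a number to the visual impression: the distance between the healthy and the azoospermic centroid of each cluster. The table below is a hypothetical stand-in; on the real data you would build it from `sample.obsm['X_umap']`, `sample.obs['condition']` and `sample.obs['clusters_spc']`.

```python
import numpy as np
import pandas as pd

# Hypothetical UMAP coordinates for six cells
toy = pd.DataFrame({
    "condition": ["healthy", "healthy", "azoospermic", "azoospermic", "healthy", "azoospermic"],
    "cluster":   ["Zygotene", "Zygotene", "Zygotene", "Zygotene", "Pachytene", "Pachytene"],
    "UMAP_0":    [1.0, 3.0, 2.0, 4.0, 10.0, 13.0],
    "UMAP_1":    [0.0, 2.0, 1.0, 1.0, 5.0, 9.0],
})

# Average coordinates per cluster and condition
centroids = toy.groupby(["cluster", "condition"])[["UMAP_0", "UMAP_1"]].mean()

# Euclidean distance between the two centroids of each cluster
h = centroids.xs("healthy", level="condition")
a = centroids.xs("azoospermic", level="condition")
distance = np.sqrt(((h - a) ** 2).sum(axis=1))
print(distance)
```

Keep in mind that UMAP distances are only a rough guide: small values suggest that the two versions of a cluster overlap, but the scale is not directly comparable between different embeddings.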
We can look at the percentage of cells in each cluster in the two datasets. These numbers are not very reliable when clusters are assigned by hand from marker genes, because some subclusters may be assigned to the wrong type. For example, Spermatogonia B and Leptotene in the healthy data together roughly add up to the share of leptotene cells in the azoospermic data
healthy.obs['clusters_spc'].value_counts() / healthy.shape[0] * 100
RoundSpermatids    39.200746
Diplotene          19.406002
SpermatogoniaA     10.620432
ElongSpermatids     9.609703
Zygotene            7.510496
SpermatogoniaB      5.100295
Somatic             4.587156
Pachytene           2.612346
Leptotene           1.352822
Name: clusters_spc, dtype: float64
azoospermic.obs['clusters_spc'].value_counts() / azoospermic.shape[0] * 100
SpermatogoniaA     19.701910
RoundSpermatids    19.329297
Myoid              14.019562
Zygotene           12.529110
Diplotene           8.337215
ElongSpermatids     6.380997
Dyplotene           6.054960
Leptotene           5.728924
Pachytene           5.635771
Endothelial         2.282254
Name: clusters_spc, dtype: float64
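As a small worked example, the comparison between Spermatogonia B plus Leptotene in the healthy data and Leptotene in the azoospermic data can be done directly on the printed percentages. The numbers below are copied from the two outputs above. The azoospermic data also lists `Dyplotene` and `Diplotene` separately, so the two shares are added before comparing them with the healthy Diplotene share.

```python
# Percentages copied from the two value_counts outputs above
healthy_pct = {"SpermatogoniaB": 5.100295, "Leptotene": 1.352822, "Diplotene": 19.406002}
azoospermic_pct = {"Leptotene": 5.728924, "Diplotene": 8.337215, "Dyplotene": 6.054960}

# Early stages in the healthy data versus Leptotene in the azoospermic data
healthy_early = healthy_pct["SpermatogoniaB"] + healthy_pct["Leptotene"]
print(f"healthy SpermatogoniaB + Leptotene: {healthy_early:.2f}%")
print(f"azoospermic Leptotene:              {azoospermic_pct['Leptotene']:.2f}%")

# The misspelled label splits the azoospermic Diplotene cells in two entries
azoospermic_diplotene = azoospermic_pct["Diplotene"] + azoospermic_pct["Dyplotene"]
print(f"azoospermic Diplotene (merged):     {azoospermic_diplotene:.2f}%")
print(f"healthy Diplotene:                  {healthy_pct['Diplotene']:.2f}%")
```

The two early-stage shares are close (about 6.5% against 5.7%) but not identical, which fits the caveat that hand-made cluster assignments make these percentages only indicative.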
sample.write('../../Data/notebooks_data/condition.integrated.h5ad')
Gene enrichment
Let's do enrichment analysis to see how the differentially expressed genes from healthy patients can be interpreted. We use the package gseapy, which lets you choose among many gene enrichment libraries. Here the package acts as an interface to the website Enrichr, where you can copy-paste a list of genes and visualize the same results as in this Python code.
sample = sc.read('../../Data/notebooks_data/condition.integrated.h5ad')
Load the differential expression table and keep only the columns with gene names
DE_genes = pd.read_csv('../../Data/results/diff_expression_condition.csv')
DE_genes = DE_genes.loc[:, [i.split('_')[1]=='N' for i in DE_genes.columns] ]
Run gene enrichment analysis. Results are in the folders Data/results/enrichment_condition/healthy and Data/results/enrichment_condition/azoospermic of the course material.
enrich_results = dict()
for CONDITION in DE_genes.columns:
    condition_name = CONDITION.split('_')[0] #'healthy_N' becomes 'healthy'
    print('------Enrichment analysis for condition ' + condition_name + '------')
    enrich_results[condition_name] = gp.enrichr(gene_list=DE_genes[CONDITION],
                                                gene_sets=[ 'ARCHS4_TFs_Coexp',
                                                            'Chromosome_Location_hg19',
                                                            'WikiPathway_2021_Human',
                                                            'ARCHS4_Tissues',
                                                            'GO_Molecular_Function_2021',],
                                                organism='Human', # set this to the organism you are studying
                                                description=CONDITION,
                                                outdir=f'../../Data/results/enrichment_condition/{condition_name}', # one folder per condition
                                                cutoff=0.05 # p-value cutoff for the enrichment test
                                                )
------Enrichment analysis for condition healthy------
2022-02-23 12:46:10,635 Warning: No enrich terms using library Chromosome_Location_hg19 when cutoff = 0.05 2022-02-23 12:46:18,540 Warning: No enrich terms using library GO_Molecular_Function_2021 when cutoff = 0.05
------Enrichment analysis for condition azoospermic------
2022-02-23 12:46:24,940 Warning: No enrich terms using library Chromosome_Location_hg19 when cutoff = 0.05
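Behind each enrichment term there is an over-representation test: given a gene list, how surprising is its overlap with a gene set? A minimal sketch with the hypergeometric distribution is shown below. The background size of 20000 genes is an assumption made for illustration, and Enrichr uses its own background and a Fisher exact test, so the p-values it reports will differ from the ones computed here.

```python
from math import comb

def hypergeom_pvalue(overlap, list_size, set_size, background):
    """P(X >= overlap) when drawing list_size genes from a background
    that contains set_size members of the gene set."""
    upper = min(list_size, set_size)
    total = comb(background, list_size)
    favourable = sum(
        comb(set_size, k) * comb(background - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total

# 50 DE genes, a gene set of 299 genes, 16 genes in common;
# the background of 20000 genes is an assumed value
p = hypergeom_pvalue(overlap=16, list_size=50, set_size=299, background=20000)
print(p)
```

With fewer than one overlapping gene expected by chance (50 × 299 / 20000 is about 0.75), an overlap of 16 genes gives a very small p-value.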
Note that we have chosen five databases as an example (option gene_sets), but you can see a list of all available databases below, or by visiting the Enrichr webpage
gp.get_library_name()
['ARCHS4_Cell-lines', 'ARCHS4_IDG_Coexp', 'ARCHS4_Kinases_Coexp', 'ARCHS4_TFs_Coexp', 'ARCHS4_Tissues', 'Achilles_fitness_decrease', 'Achilles_fitness_increase', 'Aging_Perturbations_from_GEO_down', 'Aging_Perturbations_from_GEO_up', 'Allen_Brain_Atlas_10x_scRNA_2021', 'Allen_Brain_Atlas_down', 'Allen_Brain_Atlas_up', 'Azimuth_Cell_Types_2021', 'BioCarta_2013', 'BioCarta_2015', 'BioCarta_2016', 'BioPlanet_2019', 'BioPlex_2017', 'CCLE_Proteomics_2020', 'CORUM', 'COVID-19_Related_Gene_Sets', 'COVID-19_Related_Gene_Sets_2021', 'Cancer_Cell_Line_Encyclopedia', 'CellMarker_Augmented_2021', 'ChEA_2013', 'ChEA_2015', 'ChEA_2016', 'Chromosome_Location', 'Chromosome_Location_hg19', 'ClinVar_2019', 'DSigDB', 'Data_Acquisition_Method_Most_Popular_Genes', 'DepMap_WG_CRISPR_Screens_Broad_CellLines_2019', 'DepMap_WG_CRISPR_Screens_Sanger_CellLines_2019', 'Descartes_Cell_Types_and_Tissue_2021', 'DisGeNET', 'Disease_Perturbations_from_GEO_down', 'Disease_Perturbations_from_GEO_up', 'Disease_Signatures_from_GEO_down_2014', 'Disease_Signatures_from_GEO_up_2014', 'DrugMatrix', 'Drug_Perturbations_from_GEO_2014', 'Drug_Perturbations_from_GEO_down', 'Drug_Perturbations_from_GEO_up', 'ENCODE_Histone_Modifications_2013', 'ENCODE_Histone_Modifications_2015', 'ENCODE_TF_ChIP-seq_2014', 'ENCODE_TF_ChIP-seq_2015', 'ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X', 'ESCAPE', 'Elsevier_Pathway_Collection', 'Enrichr_Libraries_Most_Popular_Genes', 'Enrichr_Submissions_TF-Gene_Coocurrence', 'Enrichr_Users_Contributed_Lists_2020', 'Epigenomics_Roadmap_HM_ChIP-seq', 'GO_Biological_Process_2013', 'GO_Biological_Process_2015', 'GO_Biological_Process_2017', 'GO_Biological_Process_2017b', 'GO_Biological_Process_2018', 'GO_Biological_Process_2021', 'GO_Cellular_Component_2013', 'GO_Cellular_Component_2015', 'GO_Cellular_Component_2017', 'GO_Cellular_Component_2017b', 'GO_Cellular_Component_2018', 'GO_Cellular_Component_2021', 'GO_Molecular_Function_2013', 'GO_Molecular_Function_2015', 
'GO_Molecular_Function_2017', 'GO_Molecular_Function_2017b', 'GO_Molecular_Function_2018', 'GO_Molecular_Function_2021', 'GTEx_Aging_Signatures_2021', 'GTEx_Tissue_Expression_Down', 'GTEx_Tissue_Expression_Up', 'GWAS_Catalog_2019', 'GeneSigDB', 'Gene_Perturbations_from_GEO_down', 'Gene_Perturbations_from_GEO_up', 'Genes_Associated_with_NIH_Grants', 'Genome_Browser_PWMs', 'HDSigDB_Human_2021', 'HDSigDB_Mouse_2021', 'HMDB_Metabolites', 'HMS_LINCS_KinomeScan', 'HomoloGene', 'HuBMAP_ASCT_plus_B_augmented_w_RNAseq_Coexpression', 'HumanCyc_2015', 'HumanCyc_2016', 'Human_Gene_Atlas', 'Human_Phenotype_Ontology', 'InterPro_Domains_2019', 'Jensen_COMPARTMENTS', 'Jensen_DISEASES', 'Jensen_TISSUES', 'KEA_2013', 'KEA_2015', 'KEGG_2013', 'KEGG_2015', 'KEGG_2016', 'KEGG_2019_Human', 'KEGG_2019_Mouse', 'KEGG_2021_Human', 'Kinase_Perturbations_from_GEO_down', 'Kinase_Perturbations_from_GEO_up', 'L1000_Kinase_and_GPCR_Perturbations_down', 'L1000_Kinase_and_GPCR_Perturbations_up', 'LINCS_L1000_Chem_Pert_down', 'LINCS_L1000_Chem_Pert_up', 'LINCS_L1000_Ligand_Perturbations_down', 'LINCS_L1000_Ligand_Perturbations_up', 'Ligand_Perturbations_from_GEO_down', 'Ligand_Perturbations_from_GEO_up', 'MCF7_Perturbations_from_GEO_down', 'MCF7_Perturbations_from_GEO_up', 'MGI_Mammalian_Phenotype_2013', 'MGI_Mammalian_Phenotype_2017', 'MGI_Mammalian_Phenotype_Level_3', 'MGI_Mammalian_Phenotype_Level_4', 'MGI_Mammalian_Phenotype_Level_4_2019', 'MGI_Mammalian_Phenotype_Level_4_2021', 'MSigDB_Computational', 'MSigDB_Hallmark_2020', 'MSigDB_Oncogenic_Signatures', 'Microbe_Perturbations_from_GEO_down', 'Microbe_Perturbations_from_GEO_up', 'Mouse_Gene_Atlas', 'NCI-60_Cancer_Cell_Lines', 'NCI-Nature_2015', 'NCI-Nature_2016', 'NIH_Funded_PIs_2017_AutoRIF_ARCHS4_Predictions', 'NIH_Funded_PIs_2017_GeneRIF_ARCHS4_Predictions', 'NIH_Funded_PIs_2017_Human_AutoRIF', 'NIH_Funded_PIs_2017_Human_GeneRIF', 'NURSA_Human_Endogenous_Complexome', 'OMIM_Disease', 'OMIM_Expanded', 'Old_CMAP_down', 'Old_CMAP_up', 
'Orphanet_Augmented_2021', 'PPI_Hub_Proteins', 'PanglaoDB_Augmented_2021', 'Panther_2015', 'Panther_2016', 'Pfam_Domains_2019', 'Pfam_InterPro_Domains', 'PheWeb_2019', 'PhenGenI_Association_2021', 'Phosphatase_Substrates_from_DEPOD', 'ProteomicsDB_2020', 'RNA-Seq_Disease_Gene_and_Drug_Signatures_from_GEO', 'RNAseq_Automatic_GEO_Signatures_Human_Down', 'RNAseq_Automatic_GEO_Signatures_Human_Up', 'RNAseq_Automatic_GEO_Signatures_Mouse_Down', 'RNAseq_Automatic_GEO_Signatures_Mouse_Up', 'Rare_Diseases_AutoRIF_ARCHS4_Predictions', 'Rare_Diseases_AutoRIF_Gene_Lists', 'Rare_Diseases_GeneRIF_ARCHS4_Predictions', 'Rare_Diseases_GeneRIF_Gene_Lists', 'Reactome_2013', 'Reactome_2015', 'Reactome_2016', 'SILAC_Phosphoproteomics', 'SubCell_BarCode', 'SysMyo_Muscle_Gene_Sets', 'TF-LOF_Expression_from_GEO', 'TF_Perturbations_Followed_by_Expression', 'TG_GATES_2020', 'TRANSFAC_and_JASPAR_PWMs', 'TRRUST_Transcription_Factors_2019', 'Table_Mining_of_CRISPR_Studies', 'TargetScan_microRNA', 'TargetScan_microRNA_2017', 'Tissue_Protein_Expression_from_Human_Proteome_Map', 'Tissue_Protein_Expression_from_ProteomicsDB', 'Transcription_Factor_PPIs', 'UK_Biobank_GWAS_v1', 'Virus-Host_PPI_P-HIPSTer_2020', 'VirusMINT', 'Virus_Perturbations_from_GEO_down', 'Virus_Perturbations_from_GEO_up', 'WikiPathway_2021_Human', 'WikiPathways_2013', 'WikiPathways_2015', 'WikiPathways_2016', 'WikiPathways_2019_Human', 'WikiPathways_2019_Mouse', 'dbGaP', 'huMAP', 'lncHUB_lncRNA_Co-Expression', 'miRTarBase_2017']
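The list is long, so it helps to search it by keyword. The sketch below uses a handful of names copied from the output above; in the notebook you would pass the full result of `gp.get_library_name()` instead. The helper `find_libraries` is a hypothetical name introduced only for this example.

```python
# A few names copied from the list above; use gp.get_library_name() for the full list
library_names = [
    "ARCHS4_Tissues", "GO_Biological_Process_2018", "GO_Biological_Process_2021",
    "GO_Molecular_Function_2021", "KEGG_2021_Human", "WikiPathway_2021_Human",
    "WikiPathways_2019_Mouse",
]

def find_libraries(names, keyword):
    """Case-insensitive substring search over library names."""
    return [n for n in names if keyword.lower() in n.lower()]

# Gene Ontology libraries from the 2021 release
go_2021 = [n for n in find_libraries(library_names, "GO_") if n.endswith("2021")]
print(go_2021)
print(find_libraries(library_names, "wikipathway"))
```

Many libraries exist in several yearly versions, so it is worth checking which release you are using before comparing results across analyses.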
We can show some of the results here instead of looking into folders. The results table lists, for each enrichment term, its p-values and the genes of our list that are found in the database. Here is a preview of the enrichment table for the healthy samples; further below it is filtered for adjusted p-value < 0.01
healthy_table = enrich_results['healthy'].results #get the table
healthy_table.head() #table preview
 | Gene_set | Term | Overlap | P-value | Adjusted P-value | Old P-value | Old Adjusted P-value | Odds Ratio | Combined Score | Genes |
---|---|---|---|---|---|---|---|---|---|---|
0 | ARCHS4_TFs_Coexp | YBX2 human tf ARCHS4 coexpression | 16/299 | 1.298141e-17 | 3.810044e-15 | 0 | 0 | 32.703388 | 1271.606272 | ROPN1B;SMCP;CRISP2;PRM2;CCDC110;PRM1;ODF2;DKKL... |
1 | ARCHS4_TFs_Coexp | HSF5 human tf ARCHS4 coexpression | 16/299 | 1.298141e-17 | 3.810044e-15 | 0 | 0 | 32.703388 | 1271.606272 | ROPN1B;SMCP;CRISP2;PRM2;PRM1;ODF2;HMGB4;CABYR;... |
2 | ARCHS4_TFs_Coexp | DHX57 human tf ARCHS4 coexpression | 13/299 | 3.105686e-13 | 6.076792e-11 | 0 | 0 | 24.157248 | 695.737723 | SMCP;CRISP2;PRM2;PRM1;ODF2;HMGB4;CABYR;CAPZA3;... |
3 | ARCHS4_TFs_Coexp | SOX30 human tf ARCHS4 coexpression | 12/299 | 7.312013e-12 | 4.292151e-10 | 0 | 0 | 21.635430 | 554.764935 | SMCP;CRISP2;PRM2;PRM1;CABYR;CAPZA3;ODF2;ADAD1;... |
4 | ARCHS4_TFs_Coexp | CUL3 human tf ARCHS4 coexpression | 12/299 | 7.312013e-12 | 4.292151e-10 | 0 | 0 | 21.635430 | 554.764935 | SMCP;CRISP2;PRM2;PRM1;CABYR;CAPZA3;ODF2;CCDC7;... |
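The columns of this table are related to each other. For the rows printed above, the Combined Score is reproduced by multiplying the Odds Ratio by the negative natural logarithm of the P-value. The check below uses values copied from the preview table. It is an observation about these numbers and is not taken from the gseapy documentation.

```python
import math

# (term, P-value, Odds Ratio, Combined Score) copied from the preview table
rows = [
    ("YBX2", 1.298141e-17, 32.703388, 1271.606272),
    ("DHX57", 3.105686e-13, 24.157248, 695.737723),
    ("SOX30", 7.312013e-12, 21.635430, 554.764935),
]

# Recompute the score as odds ratio times -ln(p-value)
recomputed = [(term, -math.log(p) * odds) for term, p, odds, _ in rows]

for (term, value), (_, _, _, reported) in zip(recomputed, rows):
    print(f"{term}: recomputed {value:.3f}, reported {reported:.3f}")
```

The score therefore rewards terms that combine a small p-value with a large odds ratio, which is why it is a convenient column to sort by.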
Note the Wikipathway results at the end of the table, where we have genes related to male infertility (when their expression is disrupted) and genes involved in the Cori Cycle (essential for spermatogenesis). Also, Sperm and Testis are recognized as the likely tissues our data comes from. Relevant transcription factors highlighted here are, for example, HSF5 (early spermatogenesis), YBX2 (abnormal spermatogenesis in case of disruption) and SOX30 (male fertility). Interpreting gene enrichment analyses of course requires a biological background to judge how useful the results are.
healthy_table[ healthy_table['Adjusted P-value']<0.01 ] #keep terms with adjusted p-value below 0.01
 | Gene_set | Term | Overlap | P-value | Adjusted P-value | Old P-value | Old Adjusted P-value | Odds Ratio | Combined Score | Genes |
---|---|---|---|---|---|---|---|---|---|---|
0 | ARCHS4_TFs_Coexp | YBX2 human tf ARCHS4 coexpression | 16/299 | 1.298141e-17 | 3.810044e-15 | 0 | 0 | 32.703388 | 1271.606272 | ROPN1B;SMCP;CRISP2;PRM2;CCDC110;PRM1;ODF2;DKKL... |
1 | ARCHS4_TFs_Coexp | HSF5 human tf ARCHS4 coexpression | 16/299 | 1.298141e-17 | 3.810044e-15 | 0 | 0 | 32.703388 | 1271.606272 | ROPN1B;SMCP;CRISP2;PRM2;PRM1;ODF2;HMGB4;CABYR;... |
2 | ARCHS4_TFs_Coexp | DHX57 human tf ARCHS4 coexpression | 13/299 | 3.105686e-13 | 6.076792e-11 | 0 | 0 | 24.157248 | 695.737723 | SMCP;CRISP2;PRM2;PRM1;ODF2;HMGB4;CABYR;CAPZA3;... |
3 | ARCHS4_TFs_Coexp | SOX30 human tf ARCHS4 coexpression | 12/299 | 7.312013e-12 | 4.292151e-10 | 0 | 0 | 21.635430 | 554.764935 | SMCP;CRISP2;PRM2;PRM1;CABYR;CAPZA3;ODF2;ADAD1;... |
4 | ARCHS4_TFs_Coexp | CUL3 human tf ARCHS4 coexpression | 12/299 | 7.312013e-12 | 4.292151e-10 | 0 | 0 | 21.635430 | 554.764935 | SMCP;CRISP2;PRM2;PRM1;CABYR;CAPZA3;ODF2;CCDC7;... |
5 | ARCHS4_TFs_Coexp | ZDHHC19 human tf ARCHS4 coexpression | 12/299 | 7.312013e-12 | 4.292151e-10 | 0 | 0 | 21.635430 | 554.764935 | SMCP;CRISP2;PRM2;PRM1;CABYR;CAPZA3;ODF2;CCDC7;... |
6 | ARCHS4_TFs_Coexp | HIST1H1T human tf ARCHS4 coexpression | 12/299 | 7.312013e-12 | 4.292151e-10 | 0 | 0 | 21.635430 | 554.764935 | ROPN1B;PRM2;CCDC110;PRM1;CABYR;ODF2;GSG1;ADAD1... |
7 | ARCHS4_TFs_Coexp | RFX4 human tf ARCHS4 coexpression | 12/299 | 7.312013e-12 | 4.292151e-10 | 0 | 0 | 21.635430 | 554.764935 | SMCP;CRISP2;PRM2;PRM1;CABYR;CAPZA3;TRIM36;PGK2... |
8 | ARCHS4_TFs_Coexp | ZNF541 human tf ARCHS4 coexpression | 12/299 | 7.312013e-12 | 4.292151e-10 | 0 | 0 | 21.635430 | 554.764935 | SMCP;CRISP2;PRM2;PRM1;CABYR;CAPZA3;ODF2;ADAD1;... |
9 | ARCHS4_TFs_Coexp | DMRTC2 human tf ARCHS4 coexpression | 12/299 | 7.312013e-12 | 4.292151e-10 | 0 | 0 | 21.635430 | 554.764935 | SMCP;CRISP2;PRM2;PRM1;CABYR;CAPZA3;ODF2;CCDC7;... |
10 | ARCHS4_TFs_Coexp | NFYA human tf ARCHS4 coexpression | 11/299 | 1.544050e-10 | 6.971980e-09 | 0 | 0 | 19.255876 | 435.018007 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;ODF2;PGK2;TNP1;AC... |
11 | ARCHS4_TFs_Coexp | ZIM2 human tf ARCHS4 coexpression | 11/299 | 1.544050e-10 | 6.971980e-09 | 0 | 0 | 19.255876 | 435.018007 | SMCP;CRISP2;PRM2;PRM1;CABYR;CAPZA3;ADAD1;PGK2;... |
12 | ARCHS4_TFs_Coexp | ZNF473 human tf ARCHS4 coexpression | 11/299 | 1.544050e-10 | 6.971980e-09 | 0 | 0 | 19.255876 | 435.018007 | SMCP;CRISP2;PRM2;CABYR;ODF2;ADAD1;PGK2;TNP1;AC... |
13 | ARCHS4_TFs_Coexp | ZNF628 human tf ARCHS4 coexpression | 10/299 | 2.906607e-09 | 8.530891e-08 | 0 | 0 | 17.007785 | 334.309785 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;ODF2;NUPR2;PGK2;T... |
14 | ARCHS4_TFs_Coexp | ADAMTS17 human tf ARCHS4 coexpression | 10/299 | 2.906607e-09 | 8.530891e-08 | 0 | 0 | 17.007785 | 334.309785 | SMCP;CRISP2;PRM2;PRM1;CABYR;CAPZA3;PGK2;TNP1;A... |
15 | ARCHS4_TFs_Coexp | JRKL human tf ARCHS4 coexpression | 10/299 | 2.906607e-09 | 8.530891e-08 | 0 | 0 | 17.007785 | 334.309785 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;NUPR2;PGK2;TNP1;A... |
16 | ARCHS4_TFs_Coexp | HILS1 human tf ARCHS4 coexpression | 10/299 | 2.906607e-09 | 8.530891e-08 | 0 | 0 | 17.007785 | 334.309785 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;ODF2;PGK2;TNP1;AC... |
17 | ARCHS4_TFs_Coexp | ZNF213 human tf ARCHS4 coexpression | 10/299 | 2.906607e-09 | 8.530891e-08 | 0 | 0 | 17.007785 | 334.309785 | SMCP;CRISP2;PRM2;CABYR;CAPZA3;ODF2;NUPR2;PGK2;... |
18 | ARCHS4_TFs_Coexp | ZNF513 human tf ARCHS4 coexpression | 10/299 | 2.906607e-09 | 8.530891e-08 | 0 | 0 | 17.007785 | 334.309785 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;ODF2;NUPR2;PGK2;T... |
19 | ARCHS4_TFs_Coexp | EVX2 human tf ARCHS4 coexpression | 10/299 | 2.906607e-09 | 8.530891e-08 | 0 | 0 | 17.007785 | 334.309785 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;ODF2;PGK2;TNP1;AC... |
20 | ARCHS4_TFs_Coexp | ETV2 human tf ARCHS4 coexpression | 9/299 | 4.841854e-08 | 1.184237e-06 | 0 | 0 | 14.881413 | 250.653339 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;PGK2;TNP1;ACTRT2;... |
21 | ARCHS4_TFs_Coexp | ZC3H10 human tf ARCHS4 coexpression | 9/299 | 4.841854e-08 | 1.184237e-06 | 0 | 0 | 14.881413 | 250.653339 | SMCP;CRISP2;PRM2;PRM1;CABYR;CAPZA3;PGK2;TNP1;A... |
22 | ARCHS4_TFs_Coexp | ZC3H18 human tf ARCHS4 coexpression | 9/299 | 4.841854e-08 | 1.184237e-06 | 0 | 0 | 14.881413 | 250.653339 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;ODF2;PGK2;TNP1;AC... |
23 | ARCHS4_TFs_Coexp | EVX1 human tf ARCHS4 coexpression | 9/299 | 4.841854e-08 | 1.184237e-06 | 0 | 0 | 14.881413 | 250.653339 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;PGK2;TNP1;ACTRT2;... |
24 | ARCHS4_TFs_Coexp | PAX2 human tf ARCHS4 coexpression | 8/299 | 7.073288e-07 | 1.384007e-05 | 0 | 0 | 12.867943 | 182.232853 | SMCP;CRISP2;PRM2;CAPZA3;ODF2;PGK2;TNP1;ACTRT2 |
25 | ARCHS4_TFs_Coexp | MAEL human tf ARCHS4 coexpression | 8/299 | 7.073288e-07 | 1.384007e-05 | 0 | 0 | 12.867943 | 182.232853 | PRM2;PRM1;CABYR;CAPZA3;TNP1;ACTRT2;HMGB4;BRDT |
26 | ARCHS4_TFs_Coexp | SHOX2 human tf ARCHS4 coexpression | 8/299 | 7.073288e-07 | 1.384007e-05 | 0 | 0 | 12.867943 | 182.232853 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;PGK2;TNP1;ACTRT2 |
27 | ARCHS4_TFs_Coexp | RNF113B human tf ARCHS4 coexpression | 8/299 | 7.073288e-07 | 1.384007e-05 | 0 | 0 | 12.867943 | 182.232853 | ROPN1B;PRM2;CCDC110;GSG1;ADAD1;IFT57;ROPN1;BRDT |
28 | ARCHS4_TFs_Coexp | HMGB4 human tf ARCHS4 coexpression | 8/299 | 7.073288e-07 | 1.384007e-05 | 0 | 0 | 12.867943 | 182.232853 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;PGK2;TNP1;ACTRT2 |
29 | ARCHS4_TFs_Coexp | FOXO6 human tf ARCHS4 coexpression | 8/299 | 7.073288e-07 | 1.384007e-05 | 0 | 0 | 12.867943 | 182.232853 | SMCP;CRISP2;PRM2;CAPZA3;ODF2;PGK2;TNP1;ACTRT2 |
30 | ARCHS4_TFs_Coexp | ZCCHC6 human tf ARCHS4 coexpression | 7/299 | 8.960863e-06 | 1.421629e-04 | 0 | 0 | 10.959382 | 127.376996 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;TNP1;ACTRT2 |
31 | ARCHS4_TFs_Coexp | GTF2F1 human tf ARCHS4 coexpression | 7/299 | 8.960863e-06 | 1.421629e-04 | 0 | 0 | 10.959382 | 127.376996 | SMCP;CRISP2;PRM2;ODF2;PGK2;TNP1;ACTRT2 |
32 | ARCHS4_TFs_Coexp | ZNF578 human tf ARCHS4 coexpression | 7/299 | 8.960863e-06 | 1.421629e-04 | 0 | 0 | 10.959382 | 127.376996 | PRM2;PRM1;CAPZA3;GSG1;SPATA4;TNP1;HMGB4 |
33 | ARCHS4_TFs_Coexp | ZFAT human tf ARCHS4 coexpression | 7/299 | 8.960863e-06 | 1.421629e-04 | 0 | 0 | 10.959382 | 127.376996 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;TNP1;ACTRT2 |
34 | ARCHS4_TFs_Coexp | ZSCAN20 human tf ARCHS4 coexpression | 7/299 | 8.960863e-06 | 1.421629e-04 | 0 | 0 | 10.959382 | 127.376996 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;TNP1;HMGB4 |
35 | ARCHS4_TFs_Coexp | ZNF668 human tf ARCHS4 coexpression | 7/299 | 8.960863e-06 | 1.421629e-04 | 0 | 0 | 10.959382 | 127.376996 | PRM2;PRM1;CAPZA3;TNP1;ACTRT2;HMGB4;BRDT |
36 | ARCHS4_TFs_Coexp | ZNF683 human tf ARCHS4 coexpression | 7/299 | 8.960863e-06 | 1.421629e-04 | 0 | 0 | 10.959382 | 127.376996 | SMCP;CRISP2;PRM2;PRM1;CAPZA3;TNP1;ACTRT2 |
37 | ARCHS4_TFs_Coexp | RFX3 human tf ARCHS4 coexpression | 6/299 | 9.705912e-05 | 1.356517e-03 | 0 | 0 | 9.148464 | 84.533550 | ARMC3;MORN2;TRIM36;NME5;FAM81B;MLF1 |
38 | ARCHS4_TFs_Coexp | RNF138 human tf ARCHS4 coexpression | 6/299 | 9.705912e-05 | 1.356517e-03 | 0 | 0 | 9.148464 | 84.533550 | SMCP;CRISP2;CABYR;BAG5;ADAD1;PGK2 |
39 | ARCHS4_TFs_Coexp | TBX1 human tf ARCHS4 coexpression | 6/299 | 9.705912e-05 | 1.356517e-03 | 0 | 0 | 9.148464 | 84.533550 | SMCP;PRM2;PRM1;CAPZA3;TNP1;ACTRT2 |
40 | ARCHS4_TFs_Coexp | DMRTB1 human tf ARCHS4 coexpression | 6/299 | 9.705912e-05 | 1.356517e-03 | 0 | 0 | 9.148464 | 84.533550 | SMCP;CRISP2;CABYR;ADAD1;PGK2;BRDT |
41 | ARCHS4_TFs_Coexp | ZNF608 human tf ARCHS4 coexpression | 6/299 | 9.705912e-05 | 1.356517e-03 | 0 | 0 | 9.148464 | 84.533550 | SMCP;CRISP2;PRM2;TRIM36;PGK2;TNP1 |
606 | WikiPathway_2021_Human | Glycolysis and Gluconeogenesis WP534 | 3/45 | 1.937400e-04 | 2.131140e-03 | 0 | 0 | 30.255319 | 258.652525 | MPC2;PGK2;PFKP |
607 | WikiPathway_2021_Human | Male infertility WP4673 | 4/146 | 4.835142e-04 | 2.659328e-03 | 0 | 0 | 12.129822 | 92.604278 | CRISP2;PRM2;PRM1;BRDT |
608 | WikiPathway_2021_Human | Cori Cycle WP1946 | 2/17 | 8.132713e-04 | 2.981995e-03 | 0 | 0 | 55.375000 | 393.962434 | PGK2;PFKP |
617 | ARCHS4_Tissues | SPERM | 21/2316 | 4.714050e-08 | 4.666910e-06 | 0 | 0 | 5.570656 | 93.977707 | ROPN1B;CRISP2;CCDC110;ODF2;DNAJC5B;DKKL1;MLF1;... |
618 | ARCHS4_Tissues | TESTIS (BULK TISSUE) | 19/2316 | 1.303973e-06 | 6.454665e-05 | 0 | 0 | 4.710309 | 63.825139 | ROPN1B;CRISP2;CCDC110;DKKL1;MLF1;HMGB4;CABYR;C... |
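To read the columns of the table above: `Overlap` gives the number of input genes found in a term's gene set over the size of that set. In these rows, `Combined Score` appears to equal the odds ratio multiplied by the negative natural logarithm of the p-value. This relation is inferred from the numbers printed here, not quoted from the library documentation, so the sketch below is a sanity check rather than a definition. The helper name `combined_score` is hypothetical.

```python
import math

# Rows copied from the WikiPathway_2021_Human part of the table above:
# (term, overlap, p-value, odds ratio, combined score as reported)
rows = [
    ("Glycolysis and Gluconeogenesis WP534", "3/45", 1.937400e-04, 30.255319, 258.652525),
    ("Male infertility WP4673", "4/146", 4.835142e-04, 12.129822, 92.604278),
    ("Cori Cycle WP1946", "2/17", 8.132713e-04, 55.375000, 393.962434),
]

def combined_score(p_value, odds_ratio):
    """Assumed relation (inferred from the table): odds ratio times -ln(p-value)."""
    return -math.log(p_value) * odds_ratio

for term, overlap, p, odds, reported in rows:
    # "3/45" means 3 input genes overlap a gene set of 45 genes
    hits, set_size = (int(x) for x in overlap.split("/"))
    recomputed = combined_score(p, odds)
    print(f"{term}: {hits} of {set_size} genes, "
          f"recomputed {recomputed:.3f} vs reported {reported:.3f}")
```

Under this reading, a term with a small gene set and a few hits (such as Cori Cycle, 2 of 17) can get a higher combined score than a term with more hits in a larger set, so the score is best read alongside the adjusted p-value.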
In the azoospermic patient, most of the enriched terms are driven by ribosomal genes and therefore point to processes such as rRNA binding. There is not much additional information to extract from this table.
azoos_table = enrich_results['azoospermic'].results
azoos_table.head()  # table preview
| | Gene_set | Term | Overlap | P-value | Adjusted P-value | Old P-value | Old Adjusted P-value | Odds Ratio | Combined Score | Genes |
---|---|---|---|---|---|---|---|---|---|---|
0 | ARCHS4_TFs_Coexp | EIF3K human tf ARCHS4 coexpression | 25/299 | 7.600257e-33 | 4.377748e-30 | 0 | 0 | 71.810219 | 5310.877416 | RPL10;RPL12;RPL34;RPLP1;RPLP0;RPL36A;RPL10A;UB... |
1 | ARCHS4_TFs_Coexp | RFXANK human tf ARCHS4 coexpression | 21/299 | 1.044332e-25 | 3.007676e-23 | 0 | 0 | 51.241875 | 2947.496732 | PRELID1;RPS9;RPL41;RPL10;RPS8;RPL34;RPLP1;RPLP... |
2 | ARCHS4_TFs_Coexp | FOXB1 human tf ARCHS4 coexpression | 18/299 | 9.769366e-21 | 1.875718e-18 | 0 | 0 | 39.372998 | 1814.112285 | RPL41;RPL21;RPL34;RPLP1;RPLP0;RPL36A;RPL6;UBL5... |
3 | ARCHS4_TFs_Coexp | POU3F1 human tf ARCHS4 coexpression | 17/299 | 3.725729e-19 | 5.365050e-17 | 0 | 0 | 35.929078 | 1524.609259 | RPL41;RPL21;RPL34;RPLP1;RPL36A;RPL6;UBL5;RPS14... |
4 | ARCHS4_TFs_Coexp | SOX2 human tf ARCHS4 coexpression | 16/299 | 1.298141e-17 | 1.495459e-15 | 0 | 0 | 32.703388 | 1271.606272 | RPL41;RPL21;RPL34;RPLP1;RPL36A;RPL6;UBL5;RPS14... |
azoos_table[azoos_table['Adjusted P-value'] < 0.01]  # keep terms with adjusted p-value below 0.01
| | Gene_set | Term | Overlap | P-value | Adjusted P-value | Old P-value | Old Adjusted P-value | Odds Ratio | Combined Score | Genes |
---|---|---|---|---|---|---|---|---|---|---|
0 | ARCHS4_TFs_Coexp | EIF3K human tf ARCHS4 coexpression | 25/299 | 7.600257e-33 | 4.377748e-30 | 0 | 0 | 71.810219 | 5310.877416 | RPL10;RPL12;RPL34;RPLP1;RPLP0;RPL36A;RPL10A;UB... |
1 | ARCHS4_TFs_Coexp | RFXANK human tf ARCHS4 coexpression | 21/299 | 1.044332e-25 | 3.007676e-23 | 0 | 0 | 51.241875 | 2947.496732 | PRELID1;RPS9;RPL41;RPL10;RPS8;RPL34;RPLP1;RPLP... |
2 | ARCHS4_TFs_Coexp | FOXB1 human tf ARCHS4 coexpression | 18/299 | 9.769366e-21 | 1.875718e-18 | 0 | 0 | 39.372998 | 1814.112285 | RPL41;RPL21;RPL34;RPLP1;RPLP0;RPL36A;RPL6;UBL5... |
3 | ARCHS4_TFs_Coexp | POU3F1 human tf ARCHS4 coexpression | 17/299 | 3.725729e-19 | 5.365050e-17 | 0 | 0 | 35.929078 | 1524.609259 | RPL41;RPL21;RPL34;RPLP1;RPL36A;RPL6;UBL5;RPS14... |
4 | ARCHS4_TFs_Coexp | SOX2 human tf ARCHS4 coexpression | 16/299 | 1.298141e-17 | 1.495459e-15 | 0 | 0 | 32.703388 | 1271.606272 | RPL41;RPL21;RPL34;RPLP1;RPL36A;RPL6;UBL5;RPS14... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
656 | GO_Molecular_Function_2021 | rRNA binding (GO:0019843) | 6/42 | 8.766374e-10 | 3.199726e-08 | 0 | 0 | 75.431818 | 1573.125115 | RPS14;RPS9;RPL12;RPLP0;RPS3;NOP53 |
657 | GO_Molecular_Function_2021 | mRNA binding (GO:0003729) | 7/263 | 3.869450e-06 | 9.415662e-05 | 0 | 0 | 12.523438 | 156.072065 | SRRM2;RPS26;RPS14;RPL41;PCBP2;RPS3;RPS2 |
658 | GO_Molecular_Function_2021 | large ribosomal subunit rRNA binding (GO:0070180) | 2/5 | 6.095679e-05 | 1.112461e-03 | 0 | 0 | 277.041667 | 2688.785055 | RPL12;RPLP0 |
659 | GO_Molecular_Function_2021 | cadherin binding (GO:0045296) | 6/322 | 1.454970e-04 | 2.124256e-03 | 0 | 0 | 8.472670 | 74.859047 | RPS26;RANBP1;RPL34;RPL14;RPS2;RPL6 |
660 | GO_Molecular_Function_2021 | small ribosomal subunit rRNA binding (GO:0070181) | 2/9 | 2.180466e-04 | 2.652900e-03 | 0 | 0 | 118.708333 | 1000.806420 | RPS14;RPS3 |
82 rows × 10 columns
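One way to back up the observation that ribosomal genes drive these terms is to compute, for each term, the share of its overlapping genes whose symbol starts with `RPL` or `RPS`. The sketch below does this in plain Python on the five GO_Molecular_Function_2021 rows printed in full above. The helper `ribosomal_fraction` and the prefix rule are illustrative assumptions, not part of the original analysis.

```python
# Term -> overlapping genes, copied from the fully printed rows above.
go_rows = {
    "rRNA binding (GO:0019843)": "RPS14;RPS9;RPL12;RPLP0;RPS3;NOP53",
    "mRNA binding (GO:0003729)": "SRRM2;RPS26;RPS14;RPL41;PCBP2;RPS3;RPS2",
    "large ribosomal subunit rRNA binding (GO:0070180)": "RPL12;RPLP0",
    "cadherin binding (GO:0045296)": "RPS26;RANBP1;RPL34;RPL14;RPS2;RPL6",
    "small ribosomal subunit rRNA binding (GO:0070181)": "RPS14;RPS3",
}

def ribosomal_fraction(genes, prefixes=("RPL", "RPS")):
    """Share of genes in a ';'-separated string whose symbol starts with a given prefix.

    Assumption: the RPL/RPS prefix is a good-enough proxy for ribosomal protein genes.
    """
    symbols = genes.split(";")
    return sum(s.startswith(prefixes) for s in symbols) / len(symbols)

for term, genes in go_rows.items():
    print(f"{term}: {ribosomal_fraction(genes):.2f}")
```

On the full table, the same helper could be applied with `azoos_table['Genes'].apply(ribosomal_fraction)`. Even a term such as cadherin binding is carried by ribosomal genes here (5 of its 6 overlapping genes), which supports reading these results as one ribosomal signal rather than several distinct processes.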
Wrapping up
We have performed a basic analysis of one dataset against the other and seen how differential expression and gene enrichment analysis characterize azoospermic patients in terms of the absence of specific genes and enrichment terms. Note that gene enrichment analysis can be applied in many types of analysis; this application is just one showcase.