Pseudotimes and cell fates
Motivation:
While clustering is a useful way to give structure to the development of cells towards their final stage (spermatozoa), it does not show how the development "stretches" from start to end. For example, a cluster can contain many cells and look "big" on the UMAP, while its variability in gene expression is actually low. A developmental process can also branch towards different ends (cell fates) or pass through developmental checkpoints (e.g. stages where damaged cells express specific genes for apoptosis/cell death). Pseudotime and cell fate analysis can be used to highlight exactly those processes.
- Pseudotime analysis assigns to each cell a value on a timeline, starting from 0 for the cells at the beginning of development. This value is purely a reference for ordering the cells along development, but pseudotimes at specific stages can be mapped to real times using prior biological knowledge.
- Cell fate analysis looks at the PCA projection of the data and the pseudotime of each data point on the PCA. From these, it tries to build a tree connecting the cells, so that the end branches of the tree are different end points or stages of the developmental process.
Figure: cell fates tree on a 3D PCA plot. Circles represent the middle point of each cluster. From Perredaeau et al. (2017)
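To make the idea of a pseudotime concrete, here is a minimal sketch. It is not Palantir's implementation: cells are simply connected to their nearest neighbours, and the pseudotime of a cell is taken as the shortest-path distance from a chosen start cell, rescaled to the interval [0, 1]. The function name `toy_pseudotime`, the parameter `k` and the half-circle toy data are assumptions made for illustration only.

```python
import heapq
import numpy as np

def toy_pseudotime(coords, start, k=3):
    """Scaled shortest-path distance from `start` on a k-nearest-neighbour graph."""
    n = coords.shape[0]
    # Pairwise Euclidean distances between all points
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # k nearest neighbours of each point (column 0 is the point itself)
    neighbours = np.argsort(dist, axis=1)[:, 1:k + 1]
    # Symmetric adjacency list
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in neighbours[i]:
            adj[i].add(int(j))
            adj[int(j)].add(i)
    # Dijkstra's algorithm from the start cell
    best = np.full(n, np.inf)
    best[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > best[i]:
            continue
        for j in adj[i]:
            nd = d + dist[i, j]
            if nd < best[j]:
                best[j] = nd
                heapq.heappush(heap, (nd, j))
    # Rescale so that the farthest reachable cell has pseudotime 1
    return best / best[np.isfinite(best)].max()

# Toy "cells" placed along a half circle, ordered from start to end
angles = np.linspace(0, np.pi, 20)
coords = np.column_stack([np.cos(angles), np.sin(angles)])
toy_pt = toy_pseudotime(coords, start=0)
print(np.round(toy_pt, 2))
```

In this toy case the values grow steadily from the start cell to the last cell, which is the ordering a pseudotime is meant to provide.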
Learning objectives:
- Understand and determine the pseudotimes on a single cell dataset
- Infer cell fates and distinguish between differentiation stages or actual final developmental stages
- Compare gene expressions along differentiation
- Cluster genes with similar gene expression
Execution time: 45 minutes
*Import packages*
import scanpy as sc
import pandas as pd
import scvelo as scv
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import sklearn
import anndata as ad
import rpy2.rinterface_lib.callbacks
import logging
from rpy2.robjects import pandas2ri
import anndata2ri
# Ignore R warning messages
#Note: this can be commented out to get more verbose R output
rpy2.rinterface_lib.callbacks.logger.setLevel(logging.ERROR)
# Automatically convert rpy2 outputs to pandas dataframes
pandas2ri.activate()
anndata2ri.activate()
#import os
#os.environ['R_HOME'] = '../../../scrna-environment/lib/R/' #path to your R installation
%load_ext rpy2.ipython
%%R
.libPaths( c( "../../../../sandbox_scRNA_testAndFeedback/scrna-environment/lib/R/library/" ) )
%matplotlib inline
*Read data*
sample = sc.read('../../Data/notebooks_data/sample_123.filt.norm.red.clst.2.h5ad')
WARNING: Your filename has more than two extensions: ['.filt', '.norm', '.red', '.clst', '.2', '.h5ad']. Only considering the two last: ['.2', '.h5ad'].
Calculate pseudotimes and cell fates
We want to calculate pseudotimes on the spermatogenic process. We therefore exclude the somatic cells from the dataset.
cellsToKeep = [ i not in ['Somatic'] for i in sample.obs['clusters_spc'] ]
sample = sample[ cellsToKeep, : ].copy()
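The same kind of filter can be written with a vectorised pandas mask. The sketch below uses a small made-up table as a stand-in for `sample.obs`; the cell names and labels are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for sample.obs (cell names and labels are illustrative)
obs = pd.DataFrame(
    {"clusters_spc": ["SpermatogoniaA", "Somatic", "Pachytene", "Somatic"]},
    index=["cell1", "cell2", "cell3", "cell4"],
)

excluded = ["Somatic"]
# True for the cells we want to keep
keep = ~obs["clusters_spc"].isin(excluded)
filtered = obs[keep]
print(filtered.index.tolist())
```

Listing more labels in `excluded` would drop several clusters at once without changing the rest of the code.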
We use the Python package palantir.
import palantir
palantir.core.random.seed( a=12345 ) #define random_state (here called 'a')
We create a table (a pandas dataframe) with the logarithm of the corrected UMI matrix, since palantir needs log-transformed raw counts as input.
palantir_data = pd.DataFrame(np.log1p(sample.layers['umi_sct'].todense()),
index=sample.obs_names,
columns=sample.var_names)
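As a small illustration of the transformation used above, `np.log1p` computes log(1 + x) element-wise, so zero counts stay at zero, and `np.expm1` reverses it. The count values below are made up.

```python
import numpy as np

# Made-up count matrix: 2 cells x 3 genes
counts = np.array([[0, 1, 9],
                   [3, 0, 99]], dtype=float)

logged = np.log1p(counts)     # log(1 + x), element-wise
restored = np.expm1(logged)   # inverse transform, exp(x) - 1

print(logged.round(3))
```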
Instead of letting the package calculate the PCA (without any form of datasets integration), we use our integrated PCA.
pca_projections = pd.DataFrame( sample.obsm['X_pca'][:,0:15].copy(),
index=sample.obs_names )
Now we will infer the pseudotimes and the related cell fates. We have to specify where the differentiation process starts. In our case, we choose one of the cells in the cluster SpermatogoniaA. Palantir will then assign pseudotime 0 to the most appropriate cell in the cluster. Note the option num_waypoints=1000 in the last command. This option sets the number of cells used to build the tree from which pseudotimes and cell fates are calculated. It is suggested to use only a portion of the cells in the dataset, since using all cells tends to produce many inferred cell fates that are mostly due to noise. In other words, you would build a tree with some tiny branches that are detected as cell fates.
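One generic way to pick a subset of well-spread cells is max-min (farthest-point) sampling: each new waypoint is the cell farthest from all waypoints chosen so far. The sketch below is only an illustration of that idea under this assumption. It is not taken from Palantir's source, and the function name `maxmin_waypoints` is hypothetical.

```python
import numpy as np

def maxmin_waypoints(coords, n_waypoints, first=0):
    """Pick `n_waypoints` row indices of `coords` by farthest-point sampling."""
    chosen = [first]
    # Distance of every point to its closest chosen waypoint
    closest = np.linalg.norm(coords - coords[first], axis=1)
    while len(chosen) < n_waypoints:
        nxt = int(np.argmax(closest))   # point farthest from all waypoints so far
        chosen.append(nxt)
        closest = np.minimum(closest, np.linalg.norm(coords - coords[nxt], axis=1))
    return chosen

# Eleven toy "cells" on a line at positions 0, 1, ..., 10
coords = np.arange(11, dtype=float)[:, None]
waypoints = maxmin_waypoints(coords, n_waypoints=3, first=0)
print(waypoints)
```

With few waypoints only the coarse shape of the data is covered, which is the intuition behind not using every cell.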
ORIGIN_STATE = 'SpermatogoniaA' #where to start
sc.tl.diffmap(sample)
diffusionMap = pd.DataFrame(sample.obsm['X_diffmap'][:,1::],
index=sample.obs_names,
columns = [str(i) for i in range(sample.obsm['X_diffmap'].shape[1]-1)])
#apply palantir
start_cell = str(sample[sample.obs['clusters_spc'] == ORIGIN_STATE].obs_names[0]) #assignment of differentiation start
pr_res = palantir.core.run_palantir( diffusionMap, early_cell=start_cell, num_waypoints=1000) #fate detection
Sampling and flocking waypoints... Time for determining waypoints: 0.008485527833302815 minutes Determining pseudotime... Shortest path distances using 30-nearest neighbor graph...
Time for shortest paths: 0.20841457843780517 minutes Iteratively refining the pseudotime... Correlation at iteration 1: 1.0000 Entropy and branch probabilities... Markov chain construction... Identification of terminal states... Computing fundamental matrix and absorption probabilities... Project results to all cells...
We save pseudotimes in our dataset and plot them on UMAP
sample.obs['pseudotime'] = pr_res.pseudotime
sc.pl.umap( sample, color=['clusters_spc','pseudotime'],
legend_loc='on data',
legend_fontsize=16,
ncols=2 )
We can look at how pseudotimes are distributed within each cluster. The variability of pseudotimes seems to increase along spermatogenesis, with some oscillations. This can mean more variability in the expression of genes in the later clusters (but it does not mean that more genes are expressed). Note that there is considerable overlap in pseudotimes between clusters. This is because pseudotimes have a spike around the Pachytene-Diplotene stages.
cluster_names = [i for i in ['SpermatogoniaA', 'SpermatogoniaB', 'Leptotene', 'Zygotene',
'Pachytene', 'SpermatocitesII', 'Diplotene', 'RoundSpermatids',
'ElongSpermatids'] if i in np.array(sample.obs['clusters_spc']) ]
sc.pl.violin(sample, keys='pseudotime', groupby='clusters_spc', rotation=90,
order=cluster_names)
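The spread seen in the violin plot can also be summarised as a table. The sketch below uses a made-up table as a stand-in for `sample.obs`; the pseudotime values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for sample.obs with invented pseudotimes
obs = pd.DataFrame({
    "clusters_spc": ["SpermatogoniaA"] * 3 + ["Pachytene"] * 3,
    "pseudotime": [0.00, 0.02, 0.04, 0.30, 0.50, 0.70],
})

# Range, centre and spread of pseudotime within each cluster
spread = obs.groupby("clusters_spc")["pseudotime"].agg(["min", "median", "max", "std"])
print(spread)
```

A larger `std` for a cluster corresponds to a taller violin in the plot above.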
Analysis of cell fates
We can see how many fates we have. For each fate, there is the barcode of the cell that best represents a differentiation stage. In some cases you can have more than two fates.
fates = list(pr_res.branch_probs.columns)
fates
['TCCCGATGTAGCGTGA-1-0']
We can plot them on the UMAP plot. One fate is clearly the end of spermatogenesis, where cells become elongated spermatids and spermatozoa. In the run shown here this is the only fate detected; when a second fate appears, it is probably due to something happening during meiosis.
f, ax = plt.subplots(1,1)
sc.pl.umap( sample,
legend_loc='on data',
legend_fontsize=16, ax=ax, show=False)
coordinates = sample[fates].obsm['X_umap']
ax.plot(coordinates[:,0],coordinates[:,1],'o',markersize=12)
for i in range(coordinates.shape[0]):
    ax.text(coordinates[i,0]-1,coordinates[i,1]-2, f'Fate {i}')
ax.set_title("Inferred cell fates")
plt.show()
We rename the fates as follows instead of using cell barcodes
fates = np.array( pr_res.branch_probs.columns )
for i in range(coordinates.shape[0]):
    fates[i] = f'Fate {i}'
pr_res.branch_probs.columns = fates
We save in our data the probability that each cell differentiates into each of the fates
for i in pr_res.branch_probs.columns:
    sample.obs[f'branch_prob_{i}'] = pr_res.branch_probs[i]
Recognizing branchings or developmental stages
A good practice is to look at the probabilities of ending in a fate for each cluster. There are two possible scenarios:
- only one fate: all cells have probability 1 of ending at a specific cell fate
- more than one cell fate: some fates are actual branchings of the developmental process, and only some cells will have a probability of ending up in those branchings. Some other fates are just midpoints of the developmental process. Here, they will absorb with probability 1 entire sections of the dataset.
We plot below the probability of each cell (seen by cluster) to end up in a specific fate. Each violin plot corresponds to a single fate.
for i in range(coordinates.shape[0]):
    x = sc.pl.violin(sample, groupby='clusters_spc', keys=f'branch_prob_Fate {i}', rotation=90,
                     order=cluster_names, ylabel=f'Probability of Fate {i}')
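The same check can be made numerically. The sketch below is a crude heuristic on a made-up table standing in for `sample.obs`: it summarises the fate probabilities per cluster and flags fates for which every cell has a probability close to 0 or 1. The probabilities and the 0.01 tolerance are arbitrary assumptions.

```python
import pandas as pd

# Toy stand-in for sample.obs; probabilities are invented and sum to 1 per cell
obs = pd.DataFrame({
    "clusters_spc": ["Leptotene", "Leptotene", "RoundSpermatids", "RoundSpermatids"],
    "branch_prob_Fate 0": [0.6, 0.7, 1.0, 1.0],
    "branch_prob_Fate 1": [0.4, 0.3, 0.0, 0.0],
})

prob_cols = [c for c in obs.columns if c.startswith("branch_prob_")]

# Smallest and largest probability of each fate within each cluster
summary = obs.groupby("clusters_spc")[prob_cols].agg(["min", "max"])
print(summary)

# True when every cell has probability ~0 or ~1 for that fate
all_or_nothing = ((obs[prob_cols] < 0.01) | (obs[prob_cols] > 0.99)).all()
print(all_or_nothing)
```

In this toy table neither fate is all-or-nothing, which is the pattern described above for an actual branching. A fate that absorbs whole sections of the data with probability 1 would be flagged `True`.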
Exploring gene expression and clusters
Here is a script that plots gene expressions of your choice along pseudotimes. This allows you to see how specific genes behave differently for different fates. Expressions are modeled using the fate probabilities we plotted above.
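To give an intuition of what "modeled using the fate probabilities" can mean, here is a minimal sketch of a kernel-weighted average of expression along pseudotime, where each cell also carries a weight such as its fate probability. This is a simplified stand-in and not the model fitted by palantir. The function name `weighted_trend`, the `bandwidth` value and the simulated data are assumptions.

```python
import numpy as np

def weighted_trend(pseudotime, expression, weights, grid, bandwidth=0.1):
    """Kernel-weighted mean expression at each grid point (illustrative only)."""
    # Gaussian kernel between every grid point and every cell
    kernel = np.exp(-0.5 * ((grid[:, None] - pseudotime[None, :]) / bandwidth) ** 2)
    # Combine the kernel with the per-cell weights (e.g. fate probabilities)
    w = kernel * weights[None, :]
    return (w @ expression) / w.sum(axis=1)

# Simulated gene whose expression increases along pseudotime
rng = np.random.default_rng(0)
pseudotime = np.linspace(0, 1, 200)
expression = 3 * pseudotime + rng.normal(scale=0.1, size=200)
branch_prob = np.ones(200)            # every cell fully belongs to the fate
grid = np.linspace(0, 1, 50)

trend = weighted_trend(pseudotime, expression, branch_prob, grid)
print(trend[:3].round(2), trend[-3:].round(2))
```

Cells with a weight of 0 do not contribute at all, so the same gene can get a different curve for each fate.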
import palantir
GENES = ['PIWIL1','PIWIL2','PIWIL3']
GENES = np.intersect1d(GENES, sample.var_names)
NGENES = len(GENES)
CLUSTERS = sample.obs['clusters_spc']
PSEUDOTIMES = sample.obs['pseudotime']
gene_trends = palantir.presults.compute_gene_trends(pr_res,
pd.DataFrame(sample.layers['norm_sct'],
index=sample.obs_names,
columns=sample.var_names).loc[:, GENES]
)
plt.rcParams['figure.figsize']=(12,4*int(NGENES))
fig, ax = plt.subplots(NGENES,1)
c = CLUSTERS
x = PSEUDOTIMES
if(NGENES==1):
    x2 = []
    t = []
    style = []
    for FATE in list(gene_trends.keys()):
        ARRAY = np.array( gene_trends[FATE]['trends'].loc[GENES[0],:].index )
        for i in ARRAY:
            idx = np.argmin(np.abs(x - i))
            x2.append(c[idx])
            t.append(i)
        if(len(style)==0):
            style = np.tile( FATE, 500 )
            y = np.array(gene_trends[FATE]['trends'].loc[GENES[0],:])
        else:
            style = np.append(arr=style,
                              values=np.tile( FATE, 500 ))
            y = np.append(arr=y,
                          values=np.array(gene_trends[FATE]['trends'].loc[GENES[0],:]))
    sns.lineplot(x=t,
                 y=y, ci=False,
                 hue=x2, ax=ax, style = style,
                 linewidth = 5)
    ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
    ax.set(xlabel = 'Pseudotime', ylabel=GENES[0])
if(NGENES>1):
    for GENE_NR in range(NGENES):
        style = []
        x2 = []
        t = []
        for FATE in list(gene_trends.keys()):
            ARRAY = np.array( gene_trends[FATE]['trends'].loc[GENES[GENE_NR],:].index )
            for i in ARRAY:
                idx = np.argmin(np.abs(x - i))
                x2.append(c[idx])
                t.append(i)
            if(len(style)==0):
                style = np.tile( FATE, 500 )
                y = np.array(gene_trends[FATE]['trends'].loc[GENES[GENE_NR],:])
            else:
                style = np.append(arr=style,
                                  values=np.tile( FATE, 500 ))
                y = np.append(arr=y,
                              values=np.array(gene_trends[FATE]['trends'].loc[GENES[GENE_NR],:]))
        sns.lineplot(x=t,
                     y=y, ci=False,
                     hue=x2, ax=ax[GENE_NR],
                     style = style, linewidth = 5, legend=GENE_NR==0)
        ax[GENE_NR].set(ylabel = GENES[GENE_NR])
        if(GENE_NR==0):
            ax[0].legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
            ax[0].set_title(f'Gene expression along the fates:\n{list(gene_trends.keys())}')
    ax[GENE_NR].set(xlabel = 'Pseudotime')
plt.rcParams['figure.figsize']=(6,6)
Fate 0
Time for processing Fate 0: 0.4374883691469828 minutes
Gene clustering: a last thing you can do is to cluster together genes that have the same expression pattern. We can do this separately for each fate; here you can only look at one fate at a time.
To make the clustering faster, we cluster only the differentially expressed genes found in the previous analysis. However, below you can define the variable genes as any list of genes. For example, you can read them from a text file, or you can use all genes by writing genes=list(sample.var_names).
genes = []
for names in sample.uns['DE_clusters_spc']['names']:
    genes.append( list( names ) )
genes = np.unique(np.ravel(genes))
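If you prefer to read the genes from a text file, a minimal sketch could look as follows. The file name `my_genes.txt`, its one-gene-per-line layout and the toy `var_names` array are assumptions. The file is created in a temporary folder here only to keep the example self-contained.

```python
import os
import tempfile
import numpy as np

# Write a small hypothetical gene list, one gene per line
tmp_dir = tempfile.mkdtemp()
gene_file = os.path.join(tmp_dir, "my_genes.txt")
with open(gene_file, "w") as handle:
    handle.write("PRM1\nTNP1\n\nNOT_A_GENE\nPRM1\n")

# Toy stand-in for sample.var_names
var_names = np.array(["GAPDHS", "PRM1", "PRM2", "TNP1"])

# Read the file, skipping empty lines
with open(gene_file) as handle:
    requested = [line.strip() for line in handle if line.strip()]

# Keep only genes present in the data (the result is sorted and unique)
genes = np.intersect1d(requested, var_names)
print(genes)
```

Intersecting with the gene names of the dataset avoids errors later on when a requested gene is missing from the data.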
Model the gene expression along pseudotime
gene_trends = palantir.presults.compute_gene_trends( pr_res,
pd.DataFrame(sample[ :, genes ].layers['norm_sct'],
index=sample[ :, genes ].obs_names,
columns=sample[ :, genes ].var_names) )
Fate 0
Time for processing Fate 0: 0.17321434020996093 minutes
Cluster the expression trends and plot the clusters. If you think there should be more clusters than the algorithm finds, you can try to increase their number by changing the value of k=20. Usually, you should see many individual genes (in gray) whose expression differs from the averaged expression of their cluster (in blue).
trends = gene_trends['Fate 0']['trends']
gene_clusters = palantir.presults.cluster_gene_trends(trends, k=20)
Finding 20 nearest neighbors using minkowski metric and 'auto' algorithm Neighbors computed in 0.10072517395019531 seconds Jaccard graph constructed in 1.2222075462341309 seconds Wrote graph to binary file in 0.007137298583984375 seconds Running Louvain modularity optimization After 1 runs, maximum modularity is Q = 0.801915 Louvain completed 21 runs in 1.2909739017486572 seconds Sorting communities by size, please wait ... PhenoGraph completed in 3.943619966506958 seconds
palantir.plot.plot_gene_trend_clusters(trends, gene_clusters)
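When comparing trends, it is usually the shape of a curve that matters rather than its magnitude. The sketch below standardises each toy trend (z-score per gene) and compares shapes by correlation. It is only meant to illustrate why two genes with very different expression levels can end up in the same cluster. How palantir treats the trends internally is not shown in this notebook, and the gene names and values are invented.

```python
import numpy as np
import pandas as pd

# Toy trends: rows are genes, columns are pseudotime points
grid = np.linspace(0, 1, 5)
toy_trends = pd.DataFrame(
    [grid * 1.0, grid * 10.0, 1.0 - grid],
    index=["geneA", "geneB", "geneC"],
    columns=grid,
)

# z-score each gene so that only the shape of its curve remains
centered = toy_trends.sub(toy_trends.mean(axis=1), axis=0)
z = centered.div(toy_trends.std(axis=1), axis=0)

# Pairwise correlation between the standardised trends
corr = np.corrcoef(z.values)
print(np.round(corr, 2))
```

Here geneA and geneB have identical standardised trends although their magnitudes differ tenfold, while geneC follows the opposite pattern.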
Here is a script to produce the plot as above, with averaged expression of each gene cluster coloured by cell types, together with confidence bands. It takes some time to do all the plots, so be patient.
GENE_CLST = np.array(gene_clusters)
UNIQUE_CLST = np.sort(np.unique(GENE_CLST))
CLST_NR = int(len(UNIQUE_CLST))
CLUSTERS = sample.obs['clusters_spc']
PSEUDOTIMES = sample.obs['pseudotime']
plt.rcParams['figure.figsize']=(12,4*CLST_NR)
fig, ax = plt.subplots(CLST_NR,1)
c = CLUSTERS
x = PSEUDOTIMES
if(CLST_NR==1):
    t = []
    x2 = []
    ARRAY = np.array( trends.columns )
    for i in ARRAY:
        idx = np.argmin(np.abs(x - i))
        x2.append(c[idx])
        t.append(i)
    x=np.tile(ARRAY,trends.loc[GENE_CLST==0,:].shape[0])
    y=np.array(trends.loc[GENE_CLST==0,:]).ravel()
    hue=np.tile(x2,trends.loc[GENE_CLST==0,:].shape[0])
    ax = sns.lineplot(x=x, y=y, hue=hue)
    sns.lineplot(x=x, y=y, hue=hue,
                 ax=ax, linewidth = 5)
    ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
if(CLST_NR>1):
    ARRAY = np.array( trends.columns )
    t = []
    x2 = []
    for i in ARRAY:
        idx = np.argmin(np.abs(x - i))
        x2.append(c[idx])
        t.append(i)
    for CLST_NR in UNIQUE_CLST:
        x=np.tile(ARRAY,trends.loc[GENE_CLST==CLST_NR,:].shape[0])
        y=np.array(trends.loc[GENE_CLST==CLST_NR,:]).ravel()
        hue=np.tile(x2,trends.loc[GENE_CLST==CLST_NR,:].shape[0])
        sns.lineplot(x=x, y=y, hue=hue,
                     ax=ax[CLST_NR], linewidth = 5, legend=CLST_NR==0)
        ax[CLST_NR].set(ylabel = f'Cluster {CLST_NR}')
        if(CLST_NR==0):
            ax[CLST_NR].legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
            ax[CLST_NR].set_title('Gene expression clustering for cell fate 0')
    ax[CLST_NR].set(xlabel = 'Pseudotime')
plt.rcParams['figure.figsize']=(6,6)
You can always look at the genes in a specific cluster. In this case, each cluster should roughly match the differentially expressed genes of a cell type, since we clustered differentially expressed genes.
gene_clusters[gene_clusters==5]
AC010255.3    5
C10orf62      5
DCUN1D1       5
FNDC8         5
GAPDHS        5
GLUL          5
HMGB4         5
IQCF1         5
LELP1         5
LINC01921     5
OAZ3          5
ODF2          5
P3R3URF       5
PRM1          5
PRM2          5
SPATA3        5
TEX37         5
TEX44         5
TNP1          5
TSSK6         5
dtype: int64
We also want to save the whole dataset (including somatic cells) with pseudotimes. To do this, we reopen the whole dataset and assign a pseudotime of 0 to the somatic cells.
whole_sample = sc.read('../../Data/notebooks_data/sample_123.filt.norm.red.clst.2.h5ad')
WARNING: Your filename has more than two extensions: ['.filt', '.norm', '.red', '.clst', '.2', '.h5ad']. Only considering the two last: ['.2', '.h5ad'].
times = pd.Series(sample.obs['pseudotime'], index=sample.obs_names)
whole_times = pd.Series(index=whole_sample.obs_names)
names = sample.obs_names
whole_names = whole_sample.obs_names
whole_times = [ times[i] if i in names else 0 for i in whole_names ]
whole_sample.obs['pseudotimes'] = whole_times
whole_sample.write('../../Data/notebooks_data/sample_123.filt.norm.red.clst.2.times.h5ad')
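The step that fills in a pseudotime of 0 for the missing cells can also be written with `Series.reindex`. The sketch below uses invented cell names as stand-ins for the barcodes of the filtered and of the whole dataset.

```python
import pandas as pd

# Pseudotimes exist only for the cells kept in the analysis (toy values)
times = pd.Series([0.1, 0.8], index=["cellA", "cellC"])

# Cell names of the whole dataset, including the excluded cells
whole_names = pd.Index(["cellA", "cellB", "cellC", "cellD"])

# Align to the whole dataset; cells without a pseudotime get 0
whole_times = times.reindex(whole_names, fill_value=0)
print(whole_times)
```

The result keeps the order of `whole_names`, so it can be assigned directly as a new column of a table indexed by those names.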
Wrapping up
This notebook shows how to perform pseudotime analysis and explore cell fates and gene expression. We have seen how to distinguish between an actual differentiation branch and a differentiation stage. Basically, all cells before (i.e. earlier in pseudotime than) a differentiation stage will be associated with that stage with high probability, because they must go through it. Finding a developmental stage around meiosis in spermatogenic samples is a common result across single cell datasets of many species (primates, humans, mice). Using the palantir software, we can look at differences in gene expression between fates, and cluster together genes of interest for further analysis.