Previously, we created the DESeq2 object using the appropriate design formula.
Code
# DO NOT RUN# Create dds objectdds <-DESeqDataSetFromTximport(txi,colData = meta %>%column_to_rownames("sample"), design =~ condition)# Filter genes with 0 countskeep <-rowSums(counts(dds)) >0dds <- dds[keep,]
Then, to run the entire differential expression analysis workflow, we use a single call to the function DESeq().
Code
## Run analysisdds <-DESeq(dds)
And with that we completed the entire workflow for the differential gene expression analysis with DESeq2! The DESeq() function performs a default analysis through the following steps:
Estimation of size factors: estimateSizeFactors()
Estimation of dispersion: estimateDispersions()
Negative Binomial GLM fitting and Wald statistics: nbinomWaldTest()
Step 1: Estimate size factors
The first step in the differential expression analysis is to estimate the size factors, which is exactly what we already did to normalize the raw counts.
Let’s take a quick look at size factor values we have for each sample:
These numbers should be identical to those we generated initially when we had run the function estimateSizeFactors(dds). Take a look at the total number of reads for each sample:
Code
## Total number of raw counts per samplecolSums(counts(dds))
How do the numbers correlate with the size factor?
We see that the larger size factors correspond to the samples with higher sequencing depth, which makes sense, because to generate our normalized counts we need to divide the counts by the size factors. This accounts for the differences in sequencing depth between samples.
Now take a look at the total depth after normalization using:
Code
## Total number of normalized counts per samplecolSums(counts(dds, normalized=T))
How do the values across samples compare with the total counts taken for each sample?
You might have expected the counts to be the exact same across the samples after normalization. However, DESeq2 also accounts for RNA composition during the normalization procedure. By using the median ratio value for the size factor, DESeq2 should not be biased to a large number of counts sucked up by a few DE genes; however, this may lead to the size factors being quite different than what would be anticipated just based on sequencing depth.
Step 2: Estimate gene-wise dispersion
Let’s take a look at the dispersion estimates for our Vampirium data. First, we will use the function estimateDispersions().
Code
dds <-estimateDispersions(dds)
We can check the values using the dispersion() function and plotting it with the plotDispEsts() function: