<- data.frame(row.names = paste0("sample", c(1:12)),
metadata condition = rep(c(rep("control",3), rep("treat",3)), 2),
bloodtype = c(rep("bloodO",6), rep("bloodA",6)),
patient = rep(paste0("patient",c(1:3)), 4)
)
metadata
DESeq samples comparison: contrast designs
Approximate time: 40 minutes
Learning Objectives
- Demonstrate the use of the design formula with simple and complex designs
- Construct R code to execute the differential expression analysis workflow with DESeq2
Design formula
A design formula tells the statistical software the known sources of variation to control for, as well as, the factor of interest to test for during differential expression testing. For example, if you know that sex is a significant source of variation in your data, then sex
should be included in your model. The design formula should have all of the factors in your metadata that account for major sources of variation in your data. The last factor entered in the formula should be the condition of interest.
For example, suppose you have the following metadata:
If you want to examine the expression differences between condition
, and you know that major sources of variation include bloodtype
and patient
, then your design formula would be:
design = ~ bloodtype + patient + condition
The tilde (~
) should always precede your factors and tells DESeq2 to model the counts using the following formula. Note the factors included in the design formula need to match the column names in the metadata.
In this tutorial we show a general and flexible way to define contrasts, and is often useful for more complex contrasts or when the design of the experiment is imbalanced (e.g. different number of replicates in each group). Although we focus on DESeq2, the approach can also be used with the other popular package edgeR.
Each section below covers a particular experimental design, from simpler to more complex ones. The first chunk of code in each section is to simulate data, which has no particular meaning and is only done in order to have a DESeqDataSet object with the right kind of variables for each example. In practice, users can ignore this step as they should have created a DESeqDataSet object from their own data following the instructions in the vignette.
One factor, two levels
# simulate data
<- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 2)
dds $condition <- factor(rep(c("control", "treat"), each = 3)) dds
First we can look at our sample information:
colData(dds)
DataFrame with 6 rows and 1 column
condition
<factor>
sample1 control
sample2 control
sample3 control
sample4 treat
sample5 treat
sample6 treat
Our factor of interest is condition
and so we define our design and run the DESeq model fitting routine:
design(dds) <- ~ 1 + condition # or just `~ condition`
<- DESeq(dds) # equivalent to edgeR::glmFit() dds
Then check what coefficients DESeq estimated:
resultsNames(dds)
[1] "Intercept" "condition_treat_vs_control"
We can see that we have a coefficient for our intercept
and coefficient for the effect of treat
(i.e. differences between treat versus control).
Using the more standard syntax, we can obtain the results for the effect of treat as such:
<- results(dds, contrast = list("condition_treat_vs_control"))
res1 res1
log2 fold change (MLE): condition_treat_vs_control effect
Wald test p-value: condition_treat_vs_control effect
DataFrame with 1000 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
gene1 6.99478 0.827958 1.164165 0.711203 4.76958e-01 0.602507693
gene2 111.94803 1.425660 0.461628 3.088334 2.01282e-03 0.007743544
gene3 11.69847 -0.625382 1.120365 -0.558195 5.76711e-01 0.687474997
gene4 61.31487 -3.146315 0.760285 -4.138335 3.49836e-05 0.000234371
gene5 2.37825 -3.811160 2.626004 -1.451315 1.46692e-01 0.260588553
... ... ... ... ... ... ...
gene996 2.25406 0.1704427 2.115898 0.0805534 9.35797e-01 9.59501e-01
gene997 288.94559 0.0622391 0.394597 0.1577286 8.74671e-01 9.21175e-01
gene998 895.65622 3.1721701 0.416109 7.6234095 2.47062e-14 7.93153e-13
gene999 35.35022 0.5104178 0.576830 0.8848664 3.76229e-01 5.02538e-01
gene1000 4.37311 -2.6796029 1.988946 -1.3472474 1.77901e-01 2.96820e-01
The above is a simple way to obtain the results of interest. But it is worth understanding how DESeq is getting to these results by looking at the model’s matrix. DESeq defines the model matrix using base R functionality:
model.matrix(design(dds), colData(dds))
(Intercept) conditiontreat
sample1 1 0
sample2 1 0
sample3 1 0
sample4 1 1
sample5 1 1
sample6 1 1
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$condition
[1] "contr.treatment"
We can see that R coded condition
as a dummy variable, with an intercept (common to all samples) and a “conditiontreat” variable, which adds the effect of treat to samples 4-6.
We can actually set our contrasts in DESeq2::results()
using a numeric vector. The way it works is to define a vector of “weights” for the coefficient(s) we want to test for. In this case, we have (Intercept)
and conditiontreat
as our coefficients (see model matrix above), and we want to test for the effect of treat, so our contrast vector would be c(0, 1)
. In other words, we don’t care about the value of (Intercept)
(so it has a weight of 0), and we’re only interested in the effect of treat (so we give it a weight of 1).
In this case the design is very simple, so we could define our contrast vector “manually”. But for complex designs this can get more difficult to do, so it’s worth mentioning the general way in which we can define this. For any contrast of interest, we can follow three steps:
- Get the model matrix
- Subset the matrix for each group of interest and calculate its column means - this results in a vector of coefficients for each group
- Subtract the group vectors from each other according to the comparison we’re interested in
Let’s see this example in action:
# get the model matrix
<- model.matrix(design(dds), colData(dds))
mod_mat mod_mat
(Intercept) conditiontreat
sample1 1 0
sample2 1 0
sample3 1 0
sample4 1 1
sample5 1 1
sample6 1 1
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$condition
[1] "contr.treatment"
# calculate the vector of coefficient weights in the treat
<- colMeans(mod_mat[dds$condition == "treat", ])
treat treat
(Intercept) conditiontreat
1 1
# calculate the vector of coefficient weights in the control
<- colMeans(mod_mat[dds$condition == "control", ])
control control
(Intercept) conditiontreat
1 0
# The contrast we are interested in is the difference between treat and control
- control treat
(Intercept) conditiontreat
0 1
That last step is where we define our contrast vector, and we can give this directly to the results
function:
# get the results for this contrast
<- results(dds, contrast = treat - control) res2
This gives us exactly the same results as before, which we can check for example by plotting the log-fold-changes between the first and second approach:
plot(res1$log2FoldChange, res2$log2FoldChange)
Recoding the design
Often, we can use different model matrices that essentially correspond to the same design. For example, we could recode our design above by removing the intercept:
design(dds) <- ~ 0 + condition
<- DESeq(dds)
dds resultsNames(dds)
[1] "conditioncontrol" "conditiontreat"
In this case we get a coefficient corresponding to the average expression in control and the average expression in the treat (rather than the difference between treat and control).
If we use the same contrast trick as before (using the model matrix), we can see the result is the same:
# get the model matrix
<- model.matrix(design(dds), colData(dds))
mod_mat mod_mat
conditioncontrol conditiontreat
sample1 1 0
sample2 1 0
sample3 1 0
sample4 0 1
sample5 0 1
sample6 0 1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$condition
[1] "contr.treatment"
# calculate weights for coefficients in each condition
<- colMeans(mod_mat[which(dds$condition == "treat"), ])
treat <- colMeans(mod_mat[which(dds$condition == "control"), ])
control # get the results for our contrast
<- results(dds, contrast = treat - control) res3
Again, the results are essentially the same:
plot(res1$log2FoldChange, res3$log2FoldChange)
In theory there’s no difference between these two ways of defining our design. The design with an intercept is more common, but for the purposes of understanding what’s going on, it’s sometimes easier to look at models without intercept.
One factor, three levels
# simulate data
<- makeExampleDESeqDataSet(n = 1000, m = 9, betaSD = 2)
dds $condition <- NULL
dds$bloodtype <- factor(rep(c("bloodA", "bloodB", "bloodO"), each = 3))
dds$bloodtype <- relevel(dds$bloodtype, "bloodO") dds
First we can look at our sample information:
colData(dds)
DataFrame with 9 rows and 1 column
bloodtype
<factor>
sample1 bloodA
sample2 bloodA
sample3 bloodA
sample4 bloodB
sample5 bloodB
sample6 bloodB
sample7 bloodO
sample8 bloodO
sample9 bloodO
As in the previous example, we only have one factor of interest, bloodtype
, and so we define our design and run the DESeq as before:
design(dds) <- ~ 1 + bloodtype
<- DESeq(dds)
dds # check the coefficients estimated by DEseq
resultsNames(dds)
[1] "Intercept" "bloodtype_bloodA_vs_bloodO"
[3] "bloodtype_bloodB_vs_bloodO"
We see that now we have 3 coefficients:
- “Intercept” corresponds to bloodO bloodtype (our reference level)
- “bloodtype_bloodA_vs_bloodO” corresponds to the difference between the reference level and bloodA
- “bloodtype_bloodB_vs_bloodO” corresponds to the difference between the reference level and bloodB
We could obtain the difference between bloodO and any of the two bloodtypes easily:
<- results(dds, contrast = list("bloodtype_bloodA_vs_bloodO"))
res1_bloodA_bloodO <- results(dds, contrast = list("bloodtype_bloodB_vs_bloodO")) res1_bloodB_bloodO
For comparing bloodA vs bloodB, however, we need to compare two coefficients with each other to check whether they are themselves different (check the slide to see the illustration). This is how the standard DESeq syntax would be:
<- results(dds, contrast = list("bloodtype_bloodA_vs_bloodO",
res1_bloodA_bloodB "bloodtype_bloodB_vs_bloodO"))
However, following our three steps detailed in the first section, we can define our comparisons from the design matrix:
# define the model matrix
<- model.matrix(design(dds), colData(dds))
mod_mat mod_mat
(Intercept) bloodtypebloodA bloodtypebloodB
sample1 1 1 0
sample2 1 1 0
sample3 1 1 0
sample4 1 0 1
sample5 1 0 1
sample6 1 0 1
sample7 1 0 0
sample8 1 0 0
sample9 1 0 0
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$bloodtype
[1] "contr.treatment"
# calculate coefficient vectors for each group
<- colMeans(mod_mat[dds$bloodtype == "bloodA", ])
bloodA <- colMeans(mod_mat[dds$bloodtype == "bloodB", ])
bloodB <- colMeans(mod_mat[dds$bloodtype == "bloodO", ]) bloodO
And we can now define any contrasts we want:
# obtain results for each pairwise contrast
<- results(dds, contrast = bloodA - bloodO)
res2_bloodA_bloodO <- results(dds, contrast = bloodB - bloodO)
res2_bloodB_bloodO <- results(dds, contrast = bloodA - bloodB)
res2_bloodA_bloodB # plot the results from the two approaches to check that they are identical
plot(res1_bloodA_bloodO$log2FoldChange, res2_bloodA_bloodO$log2FoldChange)
plot(res1_bloodB_bloodO$log2FoldChange, res2_bloodB_bloodO$log2FoldChange)
plot(res1_bloodA_bloodB$log2FoldChange, res2_bloodA_bloodB$log2FoldChange)
A and B against O
With this approach, we could even define a more unusual contrast, for example to find genes that differ between A and B against and O samples:
# define vector of coefficients for A_B samples
<- colMeans(mod_mat[dds$bloodtype %in% c("bloodA", "bloodB"),])
A_B # Our contrast of interest is
- bloodO A_B
(Intercept) bloodtypebloodA bloodtypebloodB
0.0 0.5 0.5
Notice the contrast vector in this case assigns a “weight” of 0.5 to each of bloodtypebloodA
and bloodtypebloodB
. This is equivalent to saying that we want to consider the average of bloodA and bloodB expression. In fact, we could have also defined our contrast vector like this:
# average of bloodA and bloodB minus bloodO
+ bloodB)/2 - bloodO (bloodA
(Intercept) bloodtypebloodA bloodtypebloodB
0.0 0.5 0.5
To obtain our results, we use the results()
function as before:
# get the results between A_B and bloodA
<- results(dds, contrast = A_B - bloodO) res2_AB
Extra: why not define a new group in our design matrix?
For this last example (A_B vs bloodO), we may have considered creating a new variable in our column data:
$A_B <- factor(dds$bloodtype %in% c("bloodA", "bloodB"))
ddscolData(dds)
DataFrame with 9 rows and 3 columns
bloodtype sizeFactor A_B
<factor> <numeric> <factor>
sample1 bloodA 0.964674 TRUE
sample2 bloodA 0.917232 TRUE
sample3 bloodA 0.895324 TRUE
sample4 bloodB 0.995566 TRUE
sample5 bloodB 0.921173 TRUE
sample6 bloodB 1.182643 TRUE
sample7 bloodO 1.141209 FALSE
sample8 bloodO 1.106591 FALSE
sample9 bloodO 1.156398 FALSE
and then re-run DESeq with a new design:
design(dds) <- ~ 1 + A_B
<- DESeq(dds)
dds resultsNames(dds)
[1] "Intercept" "A_B_TRUE_vs_FALSE"
<- results(dds, contrast = list("A_B_TRUE_vs_FALSE")) res1_A_B
However, in this model the gene dispersion is estimated together for bloodA and bloodB samples as if they were replicates of each other, which may result in inflated/deflated estimates. Instead, our approach above estimates the error within each of those groups.
To check the difference one could compare the two approaches visually:
# compare the log-fold-changes between the two approaches
plot(res1_A_B$log2FoldChange, res2_AB$log2FoldChange)
abline(0, 1, col = "brown", lwd = 2)
# compare the errors between the two approaches
plot(res1_A_B$lfcSE, res2_AB$lfcSE)
abline(0, 1, col = "brown", lwd = 2)
Two factors with interaction
# simulate data
<- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 2)
dds $bloodtype <- factor(rep(c("bloodO", "bloodA"), each = 6))
dds$bloodtype <- relevel(dds$bloodtype, "bloodO")
dds$condition <- factor(rep(c("treat", "control"), 6))
dds<- dds[, order(dds$bloodtype, dds$condition)]
dds colnames(dds) <- paste0("sample", 1:ncol(dds))
First let’s look at our sample information:
colData(dds)
DataFrame with 12 rows and 2 columns
condition bloodtype
<factor> <factor>
sample1 control bloodO
sample2 control bloodO
sample3 control bloodO
sample4 treat bloodO
sample5 treat bloodO
... ... ...
sample8 control bloodA
sample9 control bloodA
sample10 treat bloodA
sample11 treat bloodA
sample12 treat bloodA
This time we have two factors of interest, and we want to model both with an interaction (i.e. we assume that bloodA and bloodO samples may respond differently to treat/control). We define our design accordingly and fit the model:
design(dds) <- ~ 1 + bloodtype + condition + bloodtype:condition
<- DESeq(dds)
dds resultsNames(dds)
[1] "Intercept" "bloodtype_bloodA_vs_bloodO"
[3] "condition_treat_vs_control" "bloodtypebloodA.conditiontreat"
Because we have two factors and an interaction, the number of comparisons we can do is larger. Using our three-step approach from the model matrix, we do things exactly as we’ve been doing so far:
# get the model matrix
<- model.matrix(design(dds), colData(dds))
mod_mat # Define coefficient vectors for each condition
<- colMeans(mod_mat[dds$bloodtype == "bloodO" & dds$condition == "control", ])
bloodO_control <- colMeans(mod_mat[dds$bloodtype == "bloodO" & dds$condition == "treat", ])
bloodO_treat <- colMeans(mod_mat[dds$bloodtype == "bloodA" & dds$condition == "control", ])
bloodA_control <- colMeans(mod_mat[dds$bloodtype == "bloodA" & dds$condition == "treat", ]) bloodA_treat
We are now ready to define any contrast of interest from these vectors (for completeness we show the equivalent syntax using the coefficient’s names from DESeq).
bloodA vs bloodO (in the control):
<- results(dds, contrast = bloodA_control - bloodO_control)
res1 # or equivalently
<- results(dds, contrast = list("bloodtype_bloodA_vs_bloodO")) res2
bloodA vs bloodO (in the treatment):
<- results(dds, contrast = bloodO_treat - bloodA_treat)
res1 # or equivalently
<- results(dds, contrast = list(c("bloodtype_bloodA_vs_bloodO",
res2 "bloodtypebloodA.conditiontreat")))
treat vs control (for bloodtypes O):
<- results(dds, contrast = bloodO_treat - bloodO_control)
res1 # or equivalently
<- results(dds, contrast = list(c("condition_treat_vs_control"))) res2
treat vs control (for bloodtypes A):
<- results(dds, contrast = bloodA_treat - bloodA_control)
res1 # or equivalently
<- results(dds, contrast = list(c("condition_treat_vs_control",
res2 "bloodtypebloodA.conditiontreat")))
Interaction between bloodtype and condition
I.e. do bloodAs and bloodOs respond differently to the treatment?
<- results(dds,
res1 contrast = (bloodA_treat - bloodA_control) - (bloodO_treat - bloodO_control))
# or equivalently
<- results(dds, contrast = list("bloodtypebloodA.conditiontreat")) res2
In conclusion, although we can define these contrasts using DESeq coefficient names, it is somewhat more explicit (and perhaps intuitive?) what it is we’re comparing using matrix-based contrasts.
Three factors, with nesting
# simulate data
<- makeExampleDESeqDataSet(n = 1000, m = 24, betaSD = 2)
dds $bloodtype <- factor(rep(c("bloodA", "bloodO"), each = 12))
dds$bloodtype <- relevel(dds$bloodtype, "bloodO")
dds$patient <- factor(rep(LETTERS[1:4], each = 6))
dds$condition <- factor(rep(c("treat", "control"), 12))
dds<- dds[, order(dds$bloodtype, dds$patient, dds$condition)]
dds colnames(dds) <- paste0("sample", 1:ncol(dds))
First let’s look at our sample information:
colData(dds)
DataFrame with 24 rows and 3 columns
condition bloodtype patient
<factor> <factor> <factor>
sample1 control bloodO C
sample2 control bloodO C
sample3 control bloodO C
sample4 treat bloodO C
sample5 treat bloodO C
... ... ... ...
sample20 control bloodA B
sample21 control bloodA B
sample22 treat bloodA B
sample23 treat bloodA B
sample24 treat bloodA B
Now we have three factors, but patient is nested within bloodtype (i.e. a patient is either bloodA or bloodO, it cannot be both). Therefore, bloodtype is a linear combination with patient (or, another way to think about it is that bloodtype is redundant with patient). Because of this, we will define our design without including “bloodtype”, although later we can compare groups of patient of the same bloodtype with each other.
design(dds) <- ~ 1 + patient + condition + patient:condition
<- DESeq(dds)
dds resultsNames(dds)
[1] "Intercept" "patient_B_vs_A"
[3] "patient_C_vs_A" "patient_D_vs_A"
[5] "condition_treat_vs_control" "patientB.conditiontreat"
[7] "patientC.conditiontreat" "patientD.conditiontreat"
Now it’s harder to define contrasts between groups of patient of the same bloodtype using DESeq’s coefficient names (although still possible). But using the model matrix approach, we do it in exactly the same way we have done so far!
Again, let’s define our groups from the model matrix:
# get the model matrix
<- model.matrix(design(dds), colData(dds))
mod_mat # define coefficient vectors for each group
<- colMeans(mod_mat[dds$bloodtype == "bloodO" & dds$condition == "control", ])
bloodO_control <- colMeans(mod_mat[dds$bloodtype == "bloodA" & dds$condition == "control", ])
bloodA_control <- colMeans(mod_mat[dds$bloodtype == "bloodO" & dds$condition == "treat", ])
bloodO_treat <- colMeans(mod_mat[dds$bloodtype == "bloodA" & dds$condition == "treat", ]) bloodA_treat
It’s worth looking at some of these vectors, to see that they are composed of weighted coefficients from different patient. For example, for “bloodO” patient, we have equal contribution from “patientC” and “patientD”:
bloodO_control
(Intercept) patientB patientC
1.0 0.0 0.5
patientD conditiontreat patientB:conditiontreat
0.5 0.0 0.0
patientC:conditiontreat patientD:conditiontreat
0.0 0.0
And so, when we define our contrasts, each patient will be correctly weighted:
- bloodO_control bloodO_treat
(Intercept) patientB patientC
0.0 0.0 0.0
patientD conditiontreat patientB:conditiontreat
0.0 1.0 0.0
patientC:conditiontreat patientD:conditiontreat
0.5 0.5
We can set our contrasts in exactly the same way as we did in the previous section (for completeness, we also give the contrasts using DESeq’s named coefficients).
bloodA vs bloodO (in the control):
<- results(dds, contrast = bloodA_control - bloodO_control)
res1_bloodA_bloodO_control # or equivalently
<- results(dds,
res2_bloodA_bloodO_control contrast = list(c("patient_B_vs_A"), # Blood type A
c("patient_C_vs_A", # Blood type O
"patient_D_vs_A"))) # Blood type O
bloodA vs bloodO (in the treat):
<- results(dds, contrast = bloodO_treat - bloodA_treat)
res1_bloodO_bloodA_treat # or equivalently
<- results(dds,
res2_bloodO_bloodA_treat contrast = list(c("patient_B_vs_A", # Blood type A
"patientB.conditiontreat"), # Interaction of patient B with treatment
c("patient_C_vs_A", # Blood type O
"patient_D_vs_A", # Blood type O
"patientC.conditiontreat", # Interaction of patient C with treatment
"patientD.conditiontreat"))) # Interaction of patient B with treatment
And so on, for other contrasts of interest…
Extra: imbalanced design
Let’s take our previous example, but drop one of the samples from the data, so that we only have 2 replicates for it.
<- dds[, -1] # drop one of the patient C samples
dds <- DESeq(dds)
dds resultsNames(dds)
[1] "Intercept" "patient_B_vs_A"
[3] "patient_C_vs_A" "patient_D_vs_A"
[5] "condition_treat_vs_control" "patientB.conditiontreat"
[7] "patientC.conditiontreat" "patientD.conditiontreat"
Define our model matrix and coefficient vectors:
<- model.matrix(design(dds), colData(dds))
mod_mat mod_mat
(Intercept) patientB patientC patientD conditiontreat
sample2 1 0 1 0 0
sample3 1 0 1 0 0
sample4 1 0 1 0 1
sample5 1 0 1 0 1
sample6 1 0 1 0 1
sample7 1 0 0 1 0
sample8 1 0 0 1 0
sample9 1 0 0 1 0
sample10 1 0 0 1 1
sample11 1 0 0 1 1
sample12 1 0 0 1 1
sample13 1 0 0 0 0
sample14 1 0 0 0 0
sample15 1 0 0 0 0
sample16 1 0 0 0 1
sample17 1 0 0 0 1
sample18 1 0 0 0 1
sample19 1 1 0 0 0
sample20 1 1 0 0 0
sample21 1 1 0 0 0
sample22 1 1 0 0 1
sample23 1 1 0 0 1
sample24 1 1 0 0 1
patientB:conditiontreat patientC:conditiontreat
sample2 0 0
sample3 0 0
sample4 0 1
sample5 0 1
sample6 0 1
sample7 0 0
sample8 0 0
sample9 0 0
sample10 0 0
sample11 0 0
sample12 0 0
sample13 0 0
sample14 0 0
sample15 0 0
sample16 0 0
sample17 0 0
sample18 0 0
sample19 0 0
sample20 0 0
sample21 0 0
sample22 1 0
sample23 1 0
sample24 1 0
patientD:conditiontreat
sample2 0
sample3 0
sample4 0
sample5 0
sample6 0
sample7 0
sample8 0
sample9 0
sample10 1
sample11 1
sample12 1
sample13 0
sample14 0
sample15 0
sample16 0
sample17 0
sample18 0
sample19 0
sample20 0
sample21 0
sample22 0
sample23 0
sample24 0
attr(,"assign")
[1] 0 1 1 1 2 3 3 3
attr(,"contrasts")
attr(,"contrasts")$patient
[1] "contr.treatment"
attr(,"contrasts")$condition
[1] "contr.treatment"
# define coefficient vectors for each group
<- colMeans(mod_mat[dds$bloodtype == "bloodO" & dds$condition == "control", ])
bloodO_control <- colMeans(mod_mat[dds$bloodtype == "bloodA" & dds$condition == "control", ])
bloodA_control <- colMeans(mod_mat[dds$bloodtype == "bloodO" & dds$condition == "treat", ])
bloodO_treat <- colMeans(mod_mat[dds$bloodtype == "bloodA" & dds$condition == "treat", ]) bloodA_treat
Now let’s check what happens to the bloodO_control group:
bloodO_control
(Intercept) patientB patientC
1.0 0.0 0.4
patientD conditiontreat patientB:conditiontreat
0.6 0.0 0.0
patientC:conditiontreat patientD:conditiontreat
0.0 0.0
Notice that whereas before “patientC” and “patientD” had each a weight of 0.5, now they have different weights. That’s because for patientC there’s only 2 replicates. So, we have a total of 5 bloodtype O individuals in the control (2 from patient C and 3 from D). Therefore, when we calculate the average coefficients for bloodOs, we need to do it as 0.4 x patientC + 0.6 x patientD.
The nice thing about this approach is that we do not need to worry about any of this, the weights come from our colMeans()
call automatically. And now, any contrasts that we make will take these weights into account:
# bloodA vs bloodO (in the control)
- bloodO_control bloodA_control
(Intercept) patientB patientC
0.0 0.5 -0.4
patientD conditiontreat patientB:conditiontreat
-0.6 0.0 0.0
patientC:conditiontreat patientD:conditiontreat
0.0 0.0
# interaction
- bloodA_control) - (bloodO_treat - bloodO_control) (bloodA_treat
(Intercept) patientB patientC
0.0 0.0 -0.1
patientD conditiontreat patientB:conditiontreat
0.1 0.0 0.5
patientC:conditiontreat patientD:conditiontreat
-0.5 -0.5