2-way ANOVA

# R from zero to hero
# Marco Plebani 29 May 2018

rm(list=ls())

Often we are dealing with experiments or studies with more than one explanatory variable. In such situation, more than one treatment is applied simultaneously and we want to know what’s their relative effect on the response variable. If both explanatory variables are categorical, the analysis traditionally used is a Two-way ANOVA.

# Example: Carbon Dioxide Uptake in Grass Plants
data(CO2)

The CO2 data frame has 84 rows and 5 columns of data from an experiment on the cold tolerance of the grass species Echinochloa crus-galli.
The CO2 uptake of six plants from Quebec and six plants from Mississippi was measured at several levels of ambient CO2 concentration. Half the plants of each type were chilled overnight before the experiment was conducted.

library(lattice) # quick and dirty way of displaying "complicated" data.
xyplot(uptake ~ Treatment, group=Type,
type = c("p","r"),
auto.key=T, # adds legend
data=CO2)
bwplot(uptake ~ Treatment | Type, data=CO2)

I use lattice for quick-and-dirty (and rather ugly) displays and plot() for publications.
A popular graphics package is ggplot2. I am not a big fan of its syntax (too verbose to remember easily) and its outputs, but I use it occasionally.
On the plus side:

# Define the models to compare:
CO2.lm.full <- lm(uptake ~ Treatment * Type, data=CO2)
CO2.lm.additive <- lm(uptake ~ Treatment + Type, data=CO2)
CO2.lm.null <- lm(uptake ~ 1, data=CO2)

anova(CO2.lm.additive, CO2.lm.null)
anova(CO2.lm.full, CO2.lm.additive)
anova(CO2.lm.additive, CO2.lm.full) # compare with the output of the line above
# summary(CO2.lm.full)
summary(CO2.lm.additive)

# by the way, there are other ways of comparing models:
AIC(CO2.lm.full, CO2.lm.additive, CO2.lm.null)

Function summary() provides mean differences between treatments as well as p-values from pairwise t-tests, but it does so only in comparison to a reference level (the one that comes first alpha-numerically). It is possible to produce all possible contrasts by changing the reference level by hand using function relevel(), but I find the following function way more practical:

pairwise.t.test(x = CO2$uptake, g = CO2$Treatment, p.adj="none")
# Oooops, it only handles one response variable at a time!
# easy fix:
CO2$treatment_combined <- paste(CO2$Treatment, CO2$Type, sep="_")
# check:
unique(CO2$treatment_combined)
pairwise.t.test(x = CO2$uptake, g = CO2$treatment_combined, p.adj="none")

Similarly, it’s possible to obtain mean and SD for each treatment combination from summary(), but the method shown below is much faster:

library(doBy)
msd <- summaryBy(uptake ~ Treatment * Type,
data=CO2,
FUN=c(mean, sd, length)
)

Now we can plot the data as a barplot:

mids <- barplot(msd$uptake.mean,
xlab="Treatment",
ylab="CO2 uptake",
ylim=c(0,50),
names.arg = msd$Treatment,
col=c(rep(c(grey(0.8), grey(0.3)),2))
)

# Add error bars using arrows():
# ?arrows
with(msd, arrows(mids, uptake.mean-uptake.sd, mids, uptake.mean + uptake.sd,
code=3, angle=90, length=0.1)) # length=0.1mm

legend("topright", title="Type",
legend=unique(msd$Type),
fill=c(grey(0.8), grey(0.3))
)

Useful links:
https://rcompanion.org/rcompanion/d_08.html
http://www.sthda.com/english/wiki/two-way-anova-test-in-r