Comparing more than two groups: 1-way ANOVA

# R from zero to hero
# Marco Plebani 18 May 2018

First, enter the data.

# Group 1:
g1 <- c(6, 7, 7.2, 8, 9)
# Group 2:
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
# Group 3:
g3 <- c(1, 1.2, 2.3, 4, 5)
# Build a data.frame so that all data are packed together.
# (Here I am introducing function data.frame(), which creates data frames, and function rep(), used to REPlicate values. For further details just type ?rep into the console.)
# factor() makes sure "groups" is treated as a categorical variable in every R version.
dd <- data.frame(groups = factor(c(rep("g1", 5), rep("g2", 5), rep("g3", 5))),
                 values = c(g1, g2, g3)
)

NOTE: during the course I provided you with the same data as above, but as a tab-delimited file called “threegroups.txt” rather than typed in by hand. We “fed” that file to R using function read.delim("FILE_DIRECTORY_HERE/threegroups.txt") or read.delim(file.choose()).
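If you do not have the course file at hand, the round trip can be imitated. The sketch below writes the same data to a temporary tab-delimited file and reads it back with read.delim(). The temporary file only stands in for “threegroups.txt”, and the column names (groups, values) and the header row are assumptions about that file's layout.

```r
# Rebuild the example data (same values as entered by hand above).
dd <- data.frame(
  groups = rep(c("g1", "g2", "g3"), each = 5),
  values = c(6, 7, 7.2, 8, 9,
             1.5, 2.5, 2.6, 5, 5.5,
             1, 1.2, 2.3, 4, 5)
)

# Write a temporary tab-delimited file standing in for "threegroups.txt".
# Assumption: the real file has a header row with these two column names.
tmp <- tempfile(fileext = ".txt")
write.table(dd, file = tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# read.delim() expects tab-separated columns and a header row by default.
dd_from_file <- read.delim(tmp)

# Make the grouping variable categorical, whatever the R version.
dd_from_file$groups <- factor(dd_from_file$groups)

str(dd_from_file)
```

Checking the result with str() right after reading a file is a cheap way to catch a wrong separator or a missing header.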

######### ALWAYS PLOT DATA FIRST! #########

plot(values~groups, data=dd)

A little secret: here R is looking at the data and seeing that we want to plot numerical values against a categorical variable (groups). Normally R will act dumb and wait for us to be super-specific, but in this case it makes a little decision for us and plots the data as a box-and-whisker plot by default.
In other words, R is calling function boxplot() under the hood. For details, type ?boxplot in the console. So that you know:

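  • The thick line is the median, not the mean.
  • The two bases of the box (called “hinges” in the help file) are approximately the first and third quartiles, so the box contains ~50% of the values.
  • The ends of the “whiskers” are not a 95% confidence interval. By default, each whisker extends to the most extreme data point that lies no further than 1.5 times the box length (the interquartile range) from the box. Points beyond the whiskers are drawn individually.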
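The numbers behind a box can be inspected directly. This is a minimal sketch using group 1 from above. The extra value of 20 is hypothetical, added only to show how a whisker behaves when one observation lies far from the rest.

```r
g1 <- c(6, 7, 7.2, 8, 9)

mean(g1)     # arithmetic mean
median(g1)   # what the thick line shows

# boxplot.stats() returns the five numbers that boxplot() draws:
# lower whisker, lower hinge, median, upper hinge, upper whisker.
boxplot.stats(g1)$stats

# Hypothetical extreme value, NOT part of the course data:
g1_extreme <- c(g1, 20)
bs <- boxplot.stats(g1_extreme)
bs$stats  # the upper whisker stops at the largest value within 1.5 * IQR of the box
bs$out    # values beyond the whiskers, drawn as individual points
```

The factor of 1.5 is the default of the coef argument in boxplot.stats() and the range argument in boxplot(); both can be changed.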

######### IMPORTANT! #########

Whether it’s a t-test, ANOVA, ANCOVA, linear regression, multiple linear regression… Statistically speaking we are always fitting what’s called a “linear model”: very roughly, a model that describes how our response variable varies in relationship with changes in the explanatory variables.
So for all these analyses we can just specify the model using function lm(), and then test and examine it using functions anova() and summary().

test2 <- lm(values~groups, data=dd)
anova(test2) # anova() gives an F-test for any linear model fitted with lm(), not only for ANOVAs.
summary(test2)
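The claim that a t-test is “just” a linear model can be checked on two of the groups. This is a minimal sketch, assuming the classic equal-variance t-test (var.equal = TRUE). R's default t.test() is the Welch version, which does not assume equal variances and would give a slightly different p-value. The object names two, tt and m are made up for this illustration.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
two <- data.frame(groups = factor(rep(c("g1", "g2"), each = 5)),
                  values = c(g1, g2))

# Classic two-sample t-test with equal variances assumed.
tt <- t.test(values ~ groups, data = two, var.equal = TRUE)

# The same comparison written as a linear model.
m <- lm(values ~ groups, data = two)

p_ttest <- tt$p.value
p_anova <- anova(m)[["Pr(>F)"]][1]
p_coef  <- summary(m)$coefficients["groupsg2", "Pr(>|t|)"]

# All three p-values should agree.
c(t.test = p_ttest, anova = p_anova, summary = p_coef)
```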

You can find notes on how to interpret summary() in books such as this.
For how to use pairwise.t.test() and summaryBy() (from package doBy), see my page on two-way ANOVA.
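As a starting point for reading the coefficients that summary() prints for this model: with R's default coding for factors (treatment contrasts, assumed here), the intercept is the mean of the first group, and each remaining coefficient is the difference between another group's mean and that first group's mean. A minimal sketch that checks this against the raw group means:

```r
dd <- data.frame(
  groups = factor(rep(c("g1", "g2", "g3"), each = 5)),
  values = c(6, 7, 7.2, 8, 9,
             1.5, 2.5, 2.6, 5, 5.5,
             1, 1.2, 2.3, 4, 5)
)

test2 <- lm(values ~ groups, data = dd)

# Coefficients as reported in the "Estimate" column of summary(test2).
coef(test2)

# Raw group means, for comparison.
group_means <- tapply(dd$values, dd$groups, mean)
group_means

# The intercept matches the mean of g1; the others are differences from g1.
group_means["g2"] - group_means["g1"]
group_means["g3"] - group_means["g1"]
```

So the p-values next to groupsg2 and groupsg3 in summary() test whether each of those groups differs from g1, not whether g2 differs from g3.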
