Comparing two groups: t-test

# R from zero to hero
# Marco Plebani 18 May 2018

Here we see how to perform a t-test “by hand”. Now let’s have R do the hard work. Let’s start with entering the data. You could enter them in Excel, save them as a csv file and then open them in R using

read.csv()
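As a rough sketch of that route: the file below is a temporary stand-in for a csv exported from Excel, and the names csv_file and dd_csv are made up for this illustration.

```r
# Hypothetical example: a temporary file stands in for a csv saved from Excel.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("groups,values", "g1,6", "g1,7", "g2,1.5", "g2,2.5"), csv_file)

# read.csv() treats the first row as column names by default (header = TRUE).
dd_csv <- read.csv(csv_file)

# str() shows the column names and the type of each column.
str(dd_csv)
```

With a real file you would pass its path (or file.choose()) instead of csv_file.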

In this case it’s a handful of data so I’m entering them by hand as follows:

# Group 1
g1 <- c(6,7,7.2,8,9)
# Group 2
g2 <- c(1.5,2.5,2.6,5,5.5)

Build a data.frame so that all data are packed together. Here I am introducing function data.frame(), which creates data frames, and function rep(), used to REPlicate values. For further details just type ?rep into the R console:

dd <- data.frame(groups = c(rep("g1",5),rep("g2",5)),
values = c(g1,g2)
)
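Before moving on it is worth checking that the data frame looks as intended. A minimal sketch, assuming the same g1 and g2 as above; the each= argument of rep() is just a shorter way of writing the same labels, and group_means is a name I made up here.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

# rep(..., each = 5) repeats every label 5 times in a row,
# same result as c(rep("g1", 5), rep("g2", 5)).
dd <- data.frame(groups = rep(c("g1", "g2"), each = 5),
                 values = c(g1, g2))

# Check the structure: 10 rows, one label column, one numeric column.
str(dd)

# Mean of 'values' within each group.
group_means <- tapply(dd$values, dd$groups, mean)
group_means
```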

ALWAYS PLOT YOUR DATA FIRST! Get a visual understanding of the data and patterns (or lack thereof) before running analyses.

boxplot(values~groups, data=dd)
# boxplot() creates a box-and-whiskers plot.

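I could have just used function plot(). When plotting data versus a categorical variable stored as a factor, R uses a box-and-whiskers plot by default. Note that since R 4.0 data.frame() keeps text columns as character, so for plot() the grouping variable has to be converted with factor() first.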

About box-and-whiskers plots in R:

  • The thick line is not the mean but the median
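  • the two bases of the box (called “hinges” in the help file) represent the first and third quartile, so the box contains ~50% of the values
  • the ends of the “whiskers” mark the most extreme data points that lie within 1.5 times the interquartile range from the box; any point beyond that is drawn individually as an outlier.
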
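The numbers behind a box-and-whiskers plot can be inspected with boxplot.stats(). This is only a sketch: the value 15 appended below is a made-up data point, added to show what happens when something falls beyond the whiskers.

```r
g1 <- c(6, 7, 7.2, 8, 9)

# $stats holds: lower whisker end, lower hinge, median, upper hinge, upper whisker end.
stats_g1 <- boxplot.stats(g1)$stats
stats_g1

# Hypothetical extra value, far from the rest of the group.
g1_extra <- c(g1, 15)
stats_extra <- boxplot.stats(g1_extra)

# $out lists the points lying beyond the whiskers (drawn as separate dots by boxplot()).
stats_extra$out
```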
# there's a function for t tests: t.test()
t.test(values~groups, data=dd) # Welch's t-test
t.test(values~groups, data=dd, var.equal=TRUE) # Student's t-test
# spot the differences!
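To see where the two tests differ, the statistics can be rebuilt by hand. A minimal sketch using the textbook formulas; all variable names (se_welch, v_pooled and so on) are mine, not part of any R function.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
n1 <- length(g1); n2 <- length(g2)
v1 <- var(g1);    v2 <- var(g2)

# Welch: each group keeps its own variance.
se_welch <- sqrt(v1 / n1 + v2 / n2)
t_welch  <- (mean(g1) - mean(g2)) / se_welch
# Welch-Satterthwaite approximation of the degrees of freedom.
df_welch <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# Student: the two variances are pooled into one.
v_pooled   <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se_student <- sqrt(v_pooled * (1 / n1 + 1 / n2))
t_student  <- (mean(g1) - mean(g2)) / se_student
df_student <- n1 + n2 - 2

# Two-sided p-values from the t distribution.
p_welch   <- 2 * pt(-abs(t_welch), df_welch)
p_student <- 2 * pt(-abs(t_student), df_student)

# Compare with what t.test() reports.
all.equal(t_welch, unname(t.test(g1, g2)$statistic))
all.equal(p_student, t.test(g1, g2, var.equal = TRUE)$p.value)
```

Because the two groups have the same size here, the t value is the same in both tests; what changes is the degrees of freedom, and with them the p-value.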

IMPORTANT! Whether it’s a t test, ANOVA, ANCOVA, linear regression, multiple linear regression… Statistically speaking we are always testing what’s called a “linear model” – very roughly said, it is a model that tests how our response variable varies in relationship with changes in the explanatory variables.

So for all these analyses we can just specify the model using function lm(), and then test/examine it using functions anova() and summary().

test1 <- lm(values~groups, data=dd)
anova(test1) # anova() tests whether any model is any good - not only ANOVAs.
summary(test1)
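A quick way to convince yourself that the linear model and the t-test are the same thing is to compare their numbers. A minimal sketch, assuming the same data as above; t_lm, F_lm and t_student are names I made up for this check.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
dd <- data.frame(groups = c(rep("g1", 5), rep("g2", 5)),
                 values = c(g1, g2))

test1 <- lm(values ~ groups, data = dd)

# Row "groupsg2" of the coefficient table is the difference g2 - g1.
t_lm <- coef(summary(test1))["groupsg2", "t value"]

# F statistic for the 'groups' term in the anova table.
F_lm <- anova(test1)[["F value"]][1]

# Student's t-test (equal variances) on the same data.
t_student <- unname(t.test(values ~ groups, data = dd, var.equal = TRUE)$statistic)

# Same size, opposite sign: lm() reports g2 - g1, t.test() reports g1 - g2.
all.equal(abs(t_lm), abs(t_student))
# With two groups, F is the square of t.
all.equal(F_lm, t_lm^2)
```

Note that the match is with Student's t-test, not Welch's: lm() assumes the same variance in both groups.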

We’ll see how to interpret this together, but if you are impatient you can read up about it in a book. This is one I like.

###############
#### BONUS ####
###############

# other ways of plotting the same data: 
hist(g1, freq=FALSE, xlim=c(0,10), col="grey", breaks=(1:20)/2)
hist(g2, freq=FALSE, add=TRUE, col=grey(0.3), breaks=(1:20)/2)
plot(density(g1, bw = "sj"), xlim=c(0,11), main="Density distributions")
lines(density(g2, bw = "sj"), col="red")
# adjust=... is an extra argument for density()
# lines(density(g2, bw = "sj"), col="red", lty="dashed")
# note how I used the hash symbol to make a piece of code "silent". 
# I find this useful when I am fiddling with code, to silence sections of it without deleting them.

I was a bit lazy, so legends are missing from the graphs. Here I show how to create them.
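As a starting point, here is a rough sketch with function legend(). The position ("topleft") and the name legend_info are my own choices for this example; the colours are the ones used in the plots above.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

# Histograms: 'fill' draws coloured boxes matching the bars.
hist(g1, freq = FALSE, xlim = c(0, 10), col = "grey", breaks = (1:20) / 2)
hist(g2, freq = FALSE, add = TRUE, col = grey(0.3), breaks = (1:20) / 2)
legend("topleft", legend = c("g1", "g2"), fill = c("grey", grey(0.3)))

# Density curves: 'col' and 'lty' draw line samples matching the curves.
plot(density(g1, bw = "sj"), xlim = c(0, 11), main = "Density distributions")
lines(density(g2, bw = "sj"), col = "red")
# legend() invisibly returns the position of the box and of the text.
legend_info <- legend("topleft", legend = c("g1", "g2"),
                      col = c("black", "red"), lty = 1)
```

legend() always adds to the plot that was drawn last, so each call has to come right after the plot it belongs to.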