Comparing two groups: t-test

# R from zero to hero
# Marco Plebani 18 May 2018

Here we see how to perform a t-test “by hand”. Now let’s have R do the hard work. Let’s start by entering the data. You could enter them in Excel, save them as a csv file and then read that file into R using read.csv("FILE DIRECTORY HERE/twogroups.csv").
In this case it’s only a handful of values, so I’m entering them by hand as follows:

# Group 1
g1 <- c(6,7,7.2,8,9)
# Group 2
g2 <- c(1.5,2.5,2.6,5,5.5)

# build a data.frame so that all data are packed together:
# (here I am introducing function data.frame(), which creates data frames,
# and function rep(), used to REPlicate values. For further details just
# type ?rep into the R console)
dd <- data.frame(groups = c(rep("g1", 5), rep("g2", 5)),
                 values = c(g1, g2)
)
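If the data lived in a csv file instead, reading them in could look roughly like the sketch below. The file location is hypothetical: I write the data frame to a temporary file first, only so that the example can run on its own, and then read it back with read.csv().

```r
# Rebuild the data frame from this post
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
dd <- data.frame(groups = c(rep("g1", 5), rep("g2", 5)),
                 values = c(g1, g2))

# Hypothetical path: a temporary file stands in for your own directory
csv_path <- file.path(tempdir(), "twogroups.csv")
write.csv(dd, csv_path, row.names = FALSE)

# Read the file back into a new data frame and check its structure
dd_from_file <- read.csv(csv_path)
str(dd_from_file)
```

With a real file you would skip the write.csv() step and pass your own path to read.csv().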

ALWAYS PLOT YOUR DATA! Get a visual understanding of the data and patterns (or lack thereof) before running analyses.

boxplot(values~groups, data=dd)
# boxplot() creates a box-and-whiskers plot.
# I could have just used function plot(). When plotting data versus a
# categorical variable (a factor), R uses a box-and-whiskers plot by default.

About box-and-whiskers plots in R:
– The thick line is not the mean but the median
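– the two bases of the box (called “hinges” in the help file) represent the first and third quartile, so the box contains ~50% of the values
– the ends of the “whiskers” mark the most extreme data points that lie within 1.5 times the interquartile range from the box; any points beyond that are drawn individually as outliers. The whiskers are not a confidence interval.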
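To see the numbers behind the plot, here is a small sketch using boxplot.stats(), the function that computes what boxplot() draws. The variable names stats_g1 and stats_g2 are my own.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

# The thick line is the median, which differs from the mean
median(g1)
mean(g1)

# Five numbers per group, in this order:
# lower whisker end, lower hinge, median, upper hinge, upper whisker end
stats_g1 <- boxplot.stats(g1)$stats
stats_g2 <- boxplot.stats(g2)$stats
stats_g1
stats_g2

# Points that would be drawn individually as outliers (none for these data,
# so here the whiskers simply reach the smallest and largest value)
boxplot.stats(g1)$out
boxplot.stats(g2)$out
```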

# there's a function for t tests: t.test()
t.test(values~groups, data=dd) # Welch's t-test
t.test(values~groups, data=dd, var.equal=TRUE) # Student's t-test
# spot the differences!
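If you want to check where the differences come from, the sketch below recomputes both tests with the standard textbook formulas and compares them with t.test(). All variable names are my own. Because the two groups have the same size, the t statistic comes out identical in the two tests; only the degrees of freedom, and therefore the p-value, differ.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

n1 <- length(g1); n2 <- length(g2)
v1 <- var(g1);    v2 <- var(g2)
mean_diff <- mean(g1) - mean(g2)

# Welch's t-test: each group keeps its own variance
se_welch <- sqrt(v1 / n1 + v2 / n2)
t_welch  <- mean_diff / se_welch
# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- se_welch^4 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
p_welch  <- 2 * pt(-abs(t_welch), df_welch)

# Student's t-test: the two variances are pooled into one
v_pooled   <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se_student <- sqrt(v_pooled * (1 / n1 + 1 / n2))
t_student  <- mean_diff / se_student
df_student <- n1 + n2 - 2
p_student  <- 2 * pt(-abs(t_student), df_student)

# Compare with R's own results
welch_builtin   <- t.test(g1, g2)
student_builtin <- t.test(g1, g2, var.equal = TRUE)
c(by_hand = t_welch,   builtin = unname(welch_builtin$statistic))
c(by_hand = df_welch,  builtin = unname(welch_builtin$parameter))
c(by_hand = p_student, builtin = student_builtin$p.value)
```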

IMPORTANT! Whether it’s a t test, ANOVA, ANCOVA, linear regression, multiple linear regression… Statistically speaking we are always fitting and testing what’s called a “linear model”: very roughly, a model that describes how our response variable changes with the explanatory variables.
So for all these analyses we can just specify the model using function lm(), and then test/examine it using functions anova() and summary().

test1 <- lm(values~groups, data=dd)
anova(test1) # anova() can test any linear model, not only ANOVAs.
summary(test1)
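As a quick check that the linear model and the t-test are the same thing here, the sketch below pulls the relevant numbers out of both. It compares lm() with Student's t-test (var.equal=TRUE), since both assume equal variances in the two groups. The names t_lm, p_lm and f_lm are my own.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
dd <- data.frame(groups = c(rep("g1", 5), rep("g2", 5)),
                 values = c(g1, g2))

test1   <- lm(values ~ groups, data = dd)
student <- t.test(values ~ groups, data = dd, var.equal = TRUE)

# The "groupsg2" coefficient is the difference mean(g2) - mean(g1)
coefs <- coef(summary(test1))
t_lm  <- coefs["groupsg2", "t value"]
p_lm  <- coefs["groupsg2", "Pr(>|t|)"]

# F statistic for the groups term in the anova table
f_lm <- anova(test1)["groups", "F value"]

# Same t (apart from the sign, which depends on the direction of the
# difference), same p-value, and F is the square of t
c(lm = abs(t_lm), t.test = abs(unname(student$statistic)))
c(lm = p_lm,      t.test = student$p.value)
c(F = f_lm,       t_squared = t_lm^2)
```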

We’ll see how to interpret this together, but if you are impatient you can read up about it in a book. One I like is this.

###############
#### BONUS ####
###############

# other ways of plotting the same data:
hist(g1, freq=FALSE, xlim=c(0,10), col="grey", breaks=(1:20)/2)
hist(g2, freq=FALSE, add=TRUE, col=grey(0.3), breaks=(1:20)/2)

plot(density(g1, bw = "sj"),
     xlim=c(0,11),
     main="Density distributions"
)
lines(density(g2, bw = "sj"), # adjust=... is an extra argument for density()
      lty="dashed"
)
# note how I used the hash symbol (#) to make a piece of code "silent".
# I find this useful when I am fiddling with code, to silence sections of it without deleting them.

I was a bit lazy, so legends are missing. I show how to create them here.
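In the meantime, a minimal sketch of a legend for the histogram above, using legend() from base graphics. The position ("topright") and the labels are my own choices, not the only sensible ones.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

# Same overlaid histograms as above
hist(g1, freq = FALSE, xlim = c(0, 10), col = "grey", breaks = (1:20) / 2)
hist(g2, freq = FALSE, add = TRUE, col = grey(0.3), breaks = (1:20) / 2)

# One legend entry per group, with a filled box matching each histogram colour
leg <- legend("topright",
              legend = c("Group 1", "Group 2"),
              fill = c("grey", grey(0.3)))
```

For the density plot the same idea applies, with lty = c("solid", "dashed") in place of fill so that the legend shows the two line types.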

 
