# R from zero to hero
# Marco Plebani 18 May 2018
Here we see how to perform a t-test “by hand”. Now let’s have R do the hard work. Let’s start with entering the data. You could enter them in Excel, save them as a CSV file and then open them in R using read.csv("FILE DIRECTORY HERE/twogroups.csv").
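If you want to try the CSV route without touching Excel, here is a minimal sketch. The temporary file, its contents and the object names csv_path and twogroups are made up for illustration; they are not the actual twogroups.csv.

```r
# Hypothetical stand-in for twogroups.csv: write a tiny CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("groups,values",
             "g1,6",
             "g1,7",
             "g2,1.5",
             "g2,2.5"),
           csv_path)

# read.csv() returns a data.frame; the first row of the file becomes the column names
twogroups <- read.csv(csv_path)
str(twogroups)
```

With a real file you would replace csv_path with the path to your own CSV.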
In this case it’s a handful of data so I’m doing it by hand as follows:
# Group 1
g1 <- c(6,7,7.2,8,9)
# Group 2
g2 <- c(1.5,2.5,2.6,5,5.5)
# build a data.frame so that all data are packed together:
# (here I am introducing function data.frame(), which creates data frames,
# and function rep(), used to REPlicate values. For further details just
# type ?rep into the R console)
dd <- data.frame(groups = c(rep("g1", 5), rep("g2", 5)),
                 values = c(g1, g2)
                 )
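Before moving on it is worth checking that the data frame looks the way you expect. The sketch below rebuilds dd so that it runs on its own; the object names labels_long, labels_short and group_means are mine, chosen for illustration.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

# Two equivalent ways of building the group labels with rep()
labels_long  <- c(rep("g1", 5), rep("g2", 5))
labels_short <- rep(c("g1", "g2"), each = 5)
identical(labels_long, labels_short)

dd <- data.frame(groups = labels_short,
                 values = c(g1, g2))

str(dd)            # column types and the first few values
table(dd$groups)   # how many observations per group

# mean of the values within each group
group_means <- tapply(dd$values, dd$groups, mean)
group_means
```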
ALWAYS PLOT YOUR DATA! Get a visual understanding of the data and patterns (or lack thereof) before running analyses.
boxplot(values~groups, data=dd)
# boxplot() creates a box-and-whiskers plot.
# I could have just used function plot(). When plotting data versus a
# categorical variable (a factor), R uses a box-and-whiskers plot by default.
About box-and-whiskers plots in R:
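– the thick line is not the mean but the median
– the two bases of the box (called “hinges” in the help file) represent the first and third quartile, so the box contains ~50% of the values
– the ends of the “whiskers” mark the most extreme data points that lie within 1.5 times the interquartile range from the box; any points beyond that are drawn individually as outliers. The whiskers are not a 95% confidence interval.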
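You can check what the box and the whiskers represent with boxplot.stats(), which returns the numbers that boxplot() draws. This is a small sketch; the extra value 20 in g1_outlier is invented to show what happens with an outlier.

```r
g1 <- c(6, 7, 7.2, 8, 9)

# $stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
s1 <- boxplot.stats(g1)
s1$stats
s1$out    # points beyond the whiskers (none here)

# Made-up example: add one extreme value and see where the whisker stops
g1_outlier <- c(g1, 20)
s2 <- boxplot.stats(g1_outlier)
s2$stats  # the upper whisker ends at 9, the largest non-outlying value
s2$out    # 20 is reported as an outlier
```

The multiplier 1.5 is the default of the coef argument (called range in boxplot()), so it can be changed if needed.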
# there's a function for t tests: t.test()
t.test(values~groups, data=dd) # Welch's t-test
t.test(values~groups, data=dd, var.equal=TRUE) # Student's t-test
# spot the differences!
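To see where the differences come from, here is a sketch that computes both versions by hand and compares them with t.test(). The variable names are mine. Note that with equal group sizes, as here, the t statistic is the same for both tests; what changes is the degrees of freedom, and therefore the p-value.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

n1 <- length(g1); n2 <- length(g2)
v1 <- var(g1);    v2 <- var(g2)
diff_means <- mean(g1) - mean(g2)

# Welch: separate variances, Welch-Satterthwaite degrees of freedom
se_welch <- sqrt(v1 / n1 + v2 / n2)
t_welch  <- diff_means / se_welch
df_welch <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# Student: pooled variance, n1 + n2 - 2 degrees of freedom
v_pooled   <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se_student <- sqrt(v_pooled * (1 / n1 + 1 / n2))
t_student  <- diff_means / se_student
df_student <- n1 + n2 - 2

# Two-sided p-values from the t distribution
p_welch   <- 2 * pt(-abs(t_welch),   df = df_welch)
p_student <- 2 * pt(-abs(t_student), df = df_student)

# Compare with what t.test() reports
welch   <- t.test(g1, g2)
student <- t.test(g1, g2, var.equal = TRUE)
c(by_hand = t_welch,    t.test = unname(welch$statistic))
c(by_hand = df_welch,   t.test = unname(welch$parameter))
c(by_hand = df_student, t.test = unname(student$parameter))
```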
IMPORTANT! Whether it’s a t test, ANOVA, ANCOVA, linear regression, multiple linear regression… Statistically speaking we are always testing what’s called a “linear model”: very roughly, a model that describes how our response variable varies in relation to changes in the explanatory variables.
So for all these analyses we can just specify the model using function lm(), and then test/examine it using functions anova() and summary().
test1 <- lm(values~groups, data=dd)
anova(test1) # anova() tests whether any model is any good - not only ANOVAs.
summary(test1)
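To convince yourself that the t-test really is a linear model, here is a small sketch comparing lm() with Student's t-test (the one with var.equal=TRUE, because lm() also assumes equal variances). It rebuilds the data so that it runs on its own; the names t_lm, p_lm and F_lm are mine.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
dd <- data.frame(groups = rep(c("g1", "g2"), each = 5),
                 values = c(g1, g2))

test1   <- lm(values ~ groups, data = dd)
student <- t.test(values ~ groups, data = dd, var.equal = TRUE)

# The row "groupsg2" of the coefficient table is the difference g2 - g1,
# so its t value has the opposite sign to the one from t.test() (g1 - g2)
coefs <- coef(summary(test1))
t_lm  <- coefs["groupsg2", "t value"]
p_lm  <- coefs["groupsg2", "Pr(>|t|)"]

# With only two groups, the F statistic from anova() is the square of t
F_lm <- anova(test1)[["F value"]][1]

c(lm = abs(t_lm), t.test = unname(student$statistic))
c(lm = p_lm,      t.test = student$p.value)
c(F = F_lm, t_squared = t_lm^2)
```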
We’ll see how to interpret this together, but if you are impatient you can read up about it in a book. One I like is this.
###############
#### BONUS ####
###############
# other ways of plotting the same data:
hist(g1, freq=FALSE, xlim=c(0,10), col="grey", breaks=(1:20)/2)
hist(g2, freq=FALSE, add=TRUE, col=grey(0.3), breaks=(1:20)/2)
plot(density(g1, bw = "sj"),
xlim=c(0,11),
main="Density distributions"
)
lines(density(g2, bw = "sj"), # adjust=... is an extra argument for density()
lty="dashed"
)
# note how I used the hash symbol (#) to make a piece of code "silent".
# I find this useful when I am fiddling with code, to silence sections of it without deleting them.
I was a bit lazy, so legends are missing. I show how to create them here.
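In the meantime, here is a minimal sketch of a legend for the histogram above, using base R's legend(). The position ("topright"), the extended ylim, the main and xlab text and the name leg are my own choices for illustration.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

# ylim is extended a little to leave room for the legend above the bars
hist(g1, freq = FALSE, xlim = c(0, 10), ylim = c(0, 0.6),
     col = "grey", breaks = (1:20) / 2,
     main = "Histogram of g1 and g2", xlab = "values")
hist(g2, freq = FALSE, add = TRUE, col = grey(0.3), breaks = (1:20) / 2)

# fill= draws small coloured boxes matching the bars; bty = "n" removes the frame
leg <- legend("topright",
              legend = c("g1", "g2"),
              fill = c("grey", grey(0.3)),
              bty = "n")
```

For line plots such as the density plot, the same function works with lty= (for example lty=c("solid","dashed")) instead of fill=.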