# Comparing two groups: t-test

```
# R from zero to hero
# Marco Plebani 18 May 2018
```

Here we see how to perform a t-test “by hand”. Now let’s have R do the hard work. Let’s start with entering the data. You could enter them in Excel, save them as a csv file and then open them in R using `read.csv("FILE DIRECTORY HERE/twogroups.csv")`.
In this case it’s a handful of data so I’m doing it by hand as follows:

```
# Group 1
g1 <- c(6,7,7.2,8,9)
# Group 2
g2 <- c(1.5,2.5,2.6,5,5.5)
```

```
# build a data.frame so that all data are packed together:
# (here I am introducing function data.frame, which creates dataframes,
# and function rep, used to REPlicate values.
# For further details just type ?rep into the terminal)
dd <- data.frame(groups = c(rep("g1",5),rep("g2",5)),
                 values = c(g1,g2)
                 )
```
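As a small aside on `rep()`, the sketch below (my own illustration, not part of the original script) shows that its `each` argument builds the same grouping column in a single call. The names `groups_long` and `groups_alt` are made up for the example.

```r
# Two equivalent ways of building the grouping column (illustrative sketch).
groups_long <- c(rep("g1", 5), rep("g2", 5))   # same construction as in the data.frame call
groups_alt  <- rep(c("g1", "g2"), each = 5)    # 'each' repeats every element 5 times in turn

identical(groups_long, groups_alt)             # TRUE: same vector either way

# 'times' instead repeats the whole vector, giving an alternating pattern:
rep(c("g1", "g2"), times = 5)
```

Either form works inside `data.frame()`; `each` is simply shorter when all groups have the same size.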

ALWAYS PLOT YOUR DATA! Get a visual understanding of the data and patterns (or lack thereof) before running analyses.

```
boxplot(values~groups, data=dd)
# boxplot() creates a box-and-whiskers plot.
# I could have just used function plot(). When plotting data versus a
# categorical variable, R uses a box-and-whiskers plot by default.
```

– The thick line is not the mean but the median
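– the two bases of the box (called “hinges” in the help file) represent the first and third quartile, so the box contains ~50% of the values
– the ends of the “whiskers” mark the most extreme data points that lie within 1.5 times the box length (the interquartile range) from the box; any points beyond that are drawn individually as outliers.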
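To look at the numbers behind the plot, base R's `boxplot.stats()` returns the values that `boxplot()` draws. The following is a minimal sketch; the data are re-entered so the block runs on its own, and the names `stats_g1` and `stats_g2` are mine.

```r
# Numbers behind the box-and-whiskers plot (illustrative sketch).
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

# $stats holds, in order: lower whisker end, lower hinge, median,
# upper hinge, upper whisker end. Whisker ends are the most extreme
# data points within 1.5 box lengths of the box (R's default coef = 1.5).
stats_g1 <- boxplot.stats(g1)$stats
stats_g2 <- boxplot.stats(g2)$stats
stats_g1
stats_g2

# The median (thick line) and the mean are different quantities:
c(median = median(g1), mean = mean(g1))
c(median = median(g2), mean = mean(g2))

# $out lists points that would be drawn individually as outliers
# (empty here, since every value lies within the whisker range):
boxplot.stats(g1)$out
boxplot.stats(g2)$out
```

With only five values per group, the five plotted statistics coincide with the sorted data themselves, which makes this a convenient case for checking the plot by eye.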

```
# there's a function for t tests: t.test()
t.test(values~groups, data=dd) # Welch's t-test
t.test(values~groups, data=dd, var.equal=TRUE) # Student's t-test
# spot the differences!
```
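To connect the “by hand” calculation with what `t.test()` reports, here is a minimal sketch that computes Student's t statistic from the pooled variance and compares it with R's output. It uses the two-vector form `t.test(g1, g2)` rather than the formula interface, and the variable names (`pooled_var`, `se_diff`, `t_by_hand`, and so on) are mine, not from the original script.

```r
# Student's t statistic "by hand", checked against t.test() (illustrative sketch).
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
n1 <- length(g1)
n2 <- length(g2)

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)

# Standard error of the difference between the two means
se_diff <- sqrt(pooled_var * (1 / n1 + 1 / n2))

t_by_hand  <- (mean(g1) - mean(g2)) / se_diff
df_student <- n1 + n2 - 2
p_by_hand  <- 2 * pt(abs(t_by_hand), df = df_student, lower.tail = FALSE)  # two-sided

student <- t.test(g1, g2, var.equal = TRUE)
welch   <- t.test(g1, g2)

# By-hand values next to the Student's t-test output:
c(by_hand = t_by_hand, t.test = unname(student$statistic))
c(by_hand = p_by_hand, t.test = student$p.value)

# Degrees of freedom used by the two versions of the test:
c(student = unname(student$parameter), welch = unname(welch$parameter))
```

Because both groups have the same size here, the two tests share the same t value; Welch's version differs in its (fractional) degrees of freedom and therefore in its p-value.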

IMPORTANT! Whether it’s a t test, ANOVA, ANCOVA, linear regression, multiple linear regression… Statistically speaking we are always testing what’s called a “linear model”: very roughly, a model that describes how our response variable varies in relation to changes in the explanatory variables.
So for all these analyses we can just specify the model using function `lm()`, and then test/examine it using functions `anova()` and `summary()`.

```
test1 <- lm(values~groups, data=dd)
anova(test1) # anova() tests whether any model is any good - not only ANOVAs.
summary(test1)
```
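One way to convince yourself that the linear model and the t-test are the same analysis is to compare their numbers directly. The sketch below rebuilds the data so it runs on its own; `coefs`, `t_lm`, `p_lm` and `F_lm` are names I made up, and `"groupsg2"` is the coefficient name R generates from the column `groups` and its second level `g2`.

```r
# lm() versus Student's t-test on the same data (illustrative sketch).
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
dd <- data.frame(groups = c(rep("g1", 5), rep("g2", 5)),
                 values = c(g1, g2))

test1   <- lm(values ~ groups, data = dd)
student <- t.test(values ~ groups, data = dd, var.equal = TRUE)

# Coefficient table: the intercept is the mean of g1 (the reference group),
# and "groupsg2" is the difference mean(g2) - mean(g1).
coefs <- summary(test1)$coefficients
coefs

t_lm <- coefs["groupsg2", "t value"]
p_lm <- coefs["groupsg2", "Pr(>|t|)"]
F_lm <- anova(test1)[["F value"]][1]

# Same magnitude of t (the sign flips because lm() reports g2 - g1):
c(lm = t_lm, t.test = unname(student$statistic))
# The F statistic from anova() is the square of that t:
c(F = F_lm, t_squared = t_lm^2)
# And the p-values agree:
c(lm = p_lm, t.test = student$p.value)
```

The match is with the Student's version (`var.equal=TRUE`), since a plain `lm()` assumes a single residual variance shared by both groups.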

We’ll see how to interpret this together, but if you are impatient you can read up about it in a book. One I like is this.

```
###############
#### BONUS ####
###############
```

```
# other ways of plotting the same data:
hist(g1, freq=FALSE, xlim=c(0,10), col="grey", breaks=c((1:20)/2))
hist(g2, freq=FALSE, add=TRUE, col=grey(0.3), breaks=c((1:20)/2))
```
```
plot(density(g1, bw = "sj"),
     xlim=c(0,11),
     main="Density distributions"
     )
lines(density(g2, bw = "sj"), # adjust=... is an extra argument for density()
      lty="dashed"
      )
# note how I used the hash symbol to make a piece of code "silent".
# I find this useful when I am fiddling with code, to silence sections of it without deleting them.
```

I was a bit lazy, so legends are missing. I show how to create them here.
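For a quick idea of what a legend could look like on the density plot, here is a minimal sketch using base R's `legend()`. It is only one possible approach, and it makes a few simplifying choices of its own: the default bandwidth instead of `bw = "sj"`, an explicit `ylim` so both curves fit, and made-up object names (`d1`, `d2`, `leg`).

```r
# Density plot with a legend (illustrative sketch).
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

d1 <- density(g1)   # default bandwidth, to keep the sketch simple
d2 <- density(g2)

plot(d1,
     xlim = c(0, 11),
     ylim = c(0, max(d1$y, d2$y)),   # make room for both curves
     main = "Density distributions")
lines(d2, lty = "dashed")

# legend() takes a position keyword, the labels, and the matching line types
leg <- legend("topright",
              legend = c("g1", "g2"),
              lty = c("solid", "dashed"),
              bty = "n")             # no box around the legend
```

The order of `legend` and `lty` must match, since `legend()` pairs them up by position rather than by reading the plot.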