# R from zero to hero

# Marco Plebani 18 May 2018

Here we see how to perform a t-test “by hand”. Now let’s have R do the hard work. Let’s start by entering the data. You could enter them in Excel, save them as a CSV file, and then open them in R using `read.csv()`.
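As a minimal sketch of the CSV route, the snippet below writes a tiny made-up file to a temporary location and reads it back. The file contents and the names `csv_path` and `dd_csv` are assumptions for illustration only; in practice you would point `read.csv()` at the file you saved from Excel.

```r
# Hypothetical example: create a small CSV in a temporary file, then read it back.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("groups,values",
             "g1,6",
             "g1,7",
             "g2,1.5",
             "g2,2.5"), csv_path)

dd_csv <- read.csv(csv_path)   # header = TRUE is the default for read.csv()
str(dd_csv)                    # check column names and types before going further
```

Checking the result with `str()` right after importing is a cheap way to catch columns that were read as text instead of numbers.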

In this case it’s a handful of data so I’m entering them by hand as follows:

```
# Group 1
g1 <- c(6,7,7.2,8,9)
# Group 2
g2 <- c(1.5,2.5,2.6,5,5.5)
```

Build a data.frame so that all data are packed together. Here I am introducing the function data.frame(), which creates data frames, and the function rep(), used to REPlicate values. For further details just type *?rep* into the R console:

```
dd <- data.frame(groups = c(rep("g1",5), rep("g2",5)), values = c(g1,g2))
```
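To get a feel for rep(), here is a small sketch showing a few of its forms, followed by an equivalent way of building the same data frame. The `each` and `times` arguments are standard arguments of rep(); using `each = 5` instead of two separate rep() calls is just an alternative, not the approach used above.

```r
rep("g1", 5)                      # the same value repeated five times
rep(c("g1", "g2"), each = 5)      # each value repeated five times, in order
rep(c("g1", "g2"), times = 5)     # the whole vector repeated five times

g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
dd <- data.frame(groups = rep(c("g1", "g2"), each = 5), values = c(g1, g2))

str(dd)                           # structure: 10 observations of 2 variables
table(dd$groups)                  # number of rows per group
```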

**ALWAYS PLOT YOUR DATA FIRST! Get a visual understanding of the data and patterns (or lack thereof) before running analyses.**

```
boxplot(values~groups, data=dd) # boxplot() creates a box-and-whiskers plot.
```

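I could have just used the function plot(): when plotting data against a categorical variable stored as a factor, R draws a box-and-whiskers plot by default. Note that since R 4.0.0 data.frame() keeps strings as character vectors rather than converting them to factors, so the grouping column may need wrapping in factor() first.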
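A minimal sketch of the plot() route, assuming the grouping column is explicitly turned into a factor (this makes the example independent of the R version's default for strings):

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

# factor() makes the grouping variable explicitly categorical
dd <- data.frame(groups = factor(rep(c("g1", "g2"), each = 5)),
                 values = c(g1, g2))

plot(values ~ groups, data = dd)   # numeric response vs factor: box-and-whiskers plot
```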

About box-and-whiskers plots in R:

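- The thick line is the median, not the mean.
- The two ends of the box (called “hinges” in the help file) represent approximately the first and third quartiles, so the box contains ~50% of the values.
- The ends of the “whiskers” do not delimit a 95% confidence interval: by default they extend to the most extreme data points lying within 1.5 times the box length (the interquartile range) of the box. Points beyond that are drawn individually as outliers.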
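The numbers behind a box-and-whiskers plot can be inspected directly with boxplot.stats(), which returns the whisker ends, hinges and median in `$stats` and any outlying points in `$out`. The sketch below uses group 2 from above, plus a made-up extreme value (15, purely hypothetical) to show how the upper whisker stops at the last point within 1.5 box lengths of the box.

```r
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

fivenum(g2)                 # minimum, lower hinge, median, upper hinge, maximum
bs <- boxplot.stats(g2)     # uses coef = 1.5 by default
bs$stats                    # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out                      # points drawn individually (none here)

# Hypothetical extra observation, added only to illustrate an outlier
g2_extra <- c(g2, 15)
bs_extra <- boxplot.stats(g2_extra)
bs_extra$stats              # the upper whisker stops at 5.5, not at 15
bs_extra$out                # 15 is flagged as an outlier
```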

```
# there's a function for t tests: t.test()
t.test(values~groups, data=dd) # Welch's t-test
t.test(values~groups, data=dd, var.equal=TRUE) # Student's t-test
# spot the differences!
```
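To connect t.test() with the calculation “by hand”, here is a sketch that recomputes both tests from the standard textbook formulas and compares the results with what t.test() reports. The variable names (`se_welch`, `df_welch`, and so on) are my own. Because the two groups have the same size, the t statistic is identical in the two tests; they differ in the degrees of freedom, and therefore in the p-value and confidence interval.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
n1 <- length(g1); n2 <- length(g2)
v1 <- var(g1);    v2 <- var(g2)

# Welch's t-test: separate variances, Welch-Satterthwaite degrees of freedom
se_welch <- sqrt(v1 / n1 + v2 / n2)
t_welch  <- (mean(g1) - mean(g2)) / se_welch
df_welch <- (v1 / n1 + v2 / n2)^2 /
            ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
p_welch  <- 2 * pt(-abs(t_welch), df_welch)

# Student's t-test: pooled variance, n1 + n2 - 2 degrees of freedom
v_pooled   <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_student  <- (mean(g1) - mean(g2)) / sqrt(v_pooled * (1 / n1 + 1 / n2))
df_student <- n1 + n2 - 2
p_student  <- 2 * pt(-abs(t_student), df_student)

# Compare with R's built-in function
welch   <- t.test(g1, g2)
student <- t.test(g1, g2, var.equal = TRUE)
all.equal(unname(welch$statistic), t_welch)
all.equal(unname(welch$parameter), df_welch)
all.equal(unname(student$statistic), t_student)
all.equal(student$p.value, p_student)
```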

**IMPORTANT!** Whether it’s a t test, ANOVA, ANCOVA, linear regression, multiple linear regression… statistically speaking we are always testing what’s called a “linear model”: very roughly, a model that describes how our response variable varies with changes in the explanatory variables.

So for all these analyses we can just specify the model using function **lm()**, and then test/examine it using functions *anova()* and *summary()*.

```
test1 <- lm(values~groups, data=dd)
anova(test1) # anova() tests whether any model is any good - not only ANOVAs.
summary(test1)
```
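As a quick check that the linear model and the t test really are the same analysis, the sketch below compares lm() with Student's t-test (the equal-variance version, since lm() assumes a common residual variance). The coefficient name `groupsg2` is what R generates for this data frame; the sign of the t value flips because lm() reports g2 minus g1, while t.test() reports g1 minus g2.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
dd <- data.frame(groups = c(rep("g1", 5), rep("g2", 5)), values = c(g1, g2))

student <- t.test(values ~ groups, data = dd, var.equal = TRUE)
test1   <- lm(values ~ groups, data = dd)
coefs   <- summary(test1)$coefficients

coefs["groupsg2", "t value"]      # same magnitude as student$statistic, opposite sign
coefs["groupsg2", "Pr(>|t|)"]     # same p-value as student$p.value
anova(test1)[["F value"]][1]      # with two groups, F equals t squared
```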

We’ll see how to interpret this together, but if you are impatient you can read up about it in a book. **This** is one I like.

```
###############
#### BONUS ####
###############
# other ways of plotting the same data:
hist(g1, freq=FALSE, xlim=c(0,10), col="grey", breaks=c((1:20)/2))
hist(g2, freq=FALSE, add=TRUE, col=grey(0.3), breaks=c((1:20)/2))

plot(density(g1, bw = "sj"), xlim=c(0,11), main="Density distributions")
lines(density(g2, bw = "sj"), col="red"   # adjust=... is an extra argument for density()
      # , lty="dashed"
      )
# note how I used the hash symbol to make a piece of code "silent".
# I find this useful when I am fiddling with code, to silence sections of it without deleting them.
```

I was a bit lazy, so legends are missing from the graphs. Here I show how to create them.
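As a minimal sketch of the idea, legend() can add a key to the density plot. The position, colours and line type below are my own choices for illustration, and the densities use the default bandwidth rather than `bw = "sj"` to keep the example short.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

plot(density(g1), xlim = c(0, 11), main = "Density distributions")
lines(density(g2), col = "red")

# One legend entry per curve, matching the colours used above
leg <- legend("topright",
              legend = c("g1", "g2"),
              col    = c("black", "red"),
              lty    = 1,      # draw a line sample next to each label
              bty    = "n")    # no box around the legend
```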