```r
# R from zero to hero
# Marco Plebani 18 May 2018
```

Here we see how to perform a t-test “by hand”. Now let’s have R do the hard work. Let’s start by entering the data. You could enter them in Excel, save them as a csv file and then open the file in R using `read.csv("FILE DIRECTORY HERE/twogroups.csv")`.
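To make the csv route concrete, here is a minimal sketch of the round trip. It is only an illustration: the temporary folder and the object names `csv_path`, `example_data` and `from_csv` are my own assumptions, and the file is written from R rather than from Excel so that the example runs anywhere.

```r
# Hypothetical location: a file called twogroups.csv in R's temporary folder.
csv_path <- file.path(tempdir(), "twogroups.csv")

# A small table with one column of group labels and one column of measurements
# (the same ten values that are typed in by hand in this post).
example_data <- data.frame(
  groups = rep(c("g1", "g2"), each = 5),
  values = c(6, 7, 7.2, 8, 9, 1.5, 2.5, 2.6, 5, 5.5)
)

# Save it as csv (row.names = FALSE avoids an extra column of row numbers)...
write.csv(example_data, csv_path, row.names = FALSE)

# ...and read it back in, as you would do with a file saved from Excel.
from_csv <- read.csv(csv_path)
str(from_csv) # check the column names and types before going any further
```

Running `str()` right after `read.csv()` is a cheap way to catch numbers that were read in as text.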

In this case it’s a handful of data so I’m doing it by hand as follows:

```r
# Group 1
g1 <- c(6, 7, 7.2, 8, 9)

# Group 2
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

# build a data.frame so that all data are packed together:
# (here I am introducing function data.frame, which creates dataframes,
# and function rep, used to REPlicate values. For further details just
# type ?rep into the console)
# factor() stores the group labels as a categorical variable; since R 4.0
# data.frame() no longer converts text to factors automatically.
dd <- data.frame(groups = factor(c(rep("g1", 5), rep("g2", 5))),
                 values = c(g1, g2)
                 )
```
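Since `rep()` is new here, a quick sketch of a few equivalent ways to build the same group labels may help. The object names `labels_a`, `labels_b` and `labels_c` are made up for this illustration.

```r
# Three ways to build the same vector of group labels with rep().
labels_a <- c(rep("g1", 5), rep("g2", 5))        # one rep() call per group
labels_b <- rep(c("g1", "g2"), each = 5)         # repeat each element 5 times
labels_c <- rep(c("g1", "g2"), times = c(5, 5))  # one repeat count per element

identical(labels_a, labels_b) # TRUE
identical(labels_a, labels_c) # TRUE

# Careful: without `each`, the whole vector is repeated as a block,
# giving "g1" "g2" "g1" "g2", which is NOT what we want here.
rep(c("g1", "g2"), 2)
```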

ALWAYS PLOT YOUR DATA! Get a visual understanding of the data and patterns (or lack thereof) before running analyses.

```r
boxplot(values ~ groups, data = dd)
# boxplot() creates a box-and-whiskers plot.
# I could have just used function plot(). When plotting data versus a
# categorical variable, R uses a box-and-whiskers plot by default.
```

About box-and-whiskers plots in R:

– The thick line is not the mean but the median

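– the two bases of the box (called “hinges” in the help file) represent, approximately, the first and third quartiles, so the box contains ~50% of the values

– the ends of the “whiskers” are not a confidence interval: by default each whisker reaches the most extreme data point that lies within 1.5 times the interquartile range from the box, and any point beyond that is drawn individually as an outlier.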
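The numbers behind a box-and-whiskers plot can be inspected directly. Below is a small sketch using `fivenum()` and `boxplot.stats()` on the values of group 1; the extra value 15 and the names `bs`, `with_outlier` and `bs_out` are my own additions, there only to show what happens to a point far from the rest.

```r
g1 <- c(6, 7, 7.2, 8, 9)

median(g1)   # the thick line in the box
fivenum(g1)  # minimum, lower hinge, median, upper hinge, maximum

# boxplot.stats() returns what boxplot() actually draws.
bs <- boxplot.stats(g1)  # default coef = 1.5, i.e. 1.5 times the box length
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points beyond the whiskers (none for g1)

# Add one made-up extreme value to see the whisker rule at work:
with_outlier <- c(g1, 15)
bs_out <- boxplot.stats(with_outlier)
bs_out$stats  # the upper whisker stops at the last point within 1.5 box lengths
bs_out$out    # 15 is reported separately as an outlier
```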

```r
# there's a function for t tests: t.test()
t.test(values ~ groups, data = dd) # Welch's t-test
t.test(values ~ groups, data = dd, var.equal = TRUE) # Student's t-test
# spot the differences!
```
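To connect `t.test()` back to the calculation “by hand”, here is a sketch that rebuilds both tests from the textbook formulas and checks them against R's output. The variable names (`se2_1`, `sp2`, `t_welch` and so on) are my own; the formulas are the standard ones for Welch's and Student's two-sample t-tests.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
n1 <- length(g1)
n2 <- length(g2)

# Welch: each group keeps its own variance.
se2_1 <- var(g1) / n1  # squared standard error of the mean of g1
se2_2 <- var(g2) / n2  # squared standard error of the mean of g2
t_welch  <- (mean(g1) - mean(g2)) / sqrt(se2_1 + se2_2)
df_welch <- (se2_1 + se2_2)^2 /
  (se2_1^2 / (n1 - 1) + se2_2^2 / (n2 - 1))  # Welch-Satterthwaite df
p_welch  <- 2 * pt(-abs(t_welch), df = df_welch)  # two-sided p-value

# Student: the two variances are pooled into one.
sp2 <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)
t_student  <- (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_student <- n1 + n2 - 2
p_student  <- 2 * pt(-abs(t_student), df = df_student)

# Compare with R's own results.
welch   <- t.test(g1, g2)
student <- t.test(g1, g2, var.equal = TRUE)
all.equal(t_welch,    unname(welch$statistic))    # TRUE
all.equal(df_welch,   unname(welch$parameter))    # TRUE
all.equal(p_welch,    welch$p.value)              # TRUE
all.equal(t_student,  unname(student$statistic))  # TRUE
all.equal(p_student,  student$p.value)            # TRUE
```

With equal group sizes, as here, the two t values coincide; the tests differ only in their degrees of freedom and therefore in the p-value.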

IMPORTANT! Whether it’s a t-test, ANOVA, ANCOVA, linear regression, multiple linear regression… statistically speaking we are always testing what’s called a “linear model” – very roughly, a model that describes how our response variable varies in relation to changes in the explanatory variables.

So for all these analyses we can just specify the model using function `lm()`, and then test/examine it using functions `anova()` and `summary()`:

```r
test1 <- lm(values ~ groups, data = dd)
anova(test1) # anova() tests whether any model is any good - not only ANOVAs.
summary(test1)
```

We’ll see how to interpret this together, but if you are impatient you can read up about it in a book. One I like is this.
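As a small preview, the sketch below checks that the linear model and Student's t-test really are the same analysis for these data. It rebuilds `dd` so that it runs on its own; the names `coefs`, `t_lm`, `p_lm` and `f_anova` are mine.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)
dd <- data.frame(groups = factor(rep(c("g1", "g2"), each = 5)),
                 values = c(g1, g2))

test1   <- lm(values ~ groups, data = dd)
student <- t.test(values ~ groups, data = dd, var.equal = TRUE)

# The coefficient table of summary(): one row per model term.
coefs <- summary(test1)$coefficients
t_lm  <- coefs["groupsg2", "t value"]
p_lm  <- coefs["groupsg2", "Pr(>|t|)"]

# The intercept is the mean of the first group (g1);
# the "groupsg2" row is the difference mean(g2) - mean(g1).
all.equal(coef(test1)[["(Intercept)"]], mean(g1))          # TRUE
all.equal(coef(test1)[["groupsg2"]], mean(g2) - mean(g1))  # TRUE

# Same t (apart from its sign) and same p-value as Student's t-test.
all.equal(abs(t_lm), abs(unname(student$statistic)))       # TRUE
all.equal(p_lm, student$p.value)                           # TRUE

# With only two groups, the F value from anova() is the square of t.
f_anova <- anova(test1)[["F value"]][1]
all.equal(f_anova, t_lm^2)                                 # TRUE
```

The sign of t flips only because `lm()` reports g2 minus g1, while `t.test()` reports g1 minus g2.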

```r
###############
#### BONUS ####
###############

# other ways of plotting the same data:
hist(g1, freq = FALSE, xlim = c(0, 10), col = "grey", breaks = (1:20) / 2)
hist(g2, freq = FALSE, add = TRUE, col = grey(0.3), breaks = (1:20) / 2)
```

```r
plot(density(g1, bw = "sj"),
     xlim = c(0, 11),
     main = "Density distributions"
     )
lines(density(g2, bw = "sj"), # adjust=... is an extra argument for density()
      lty = "dashed"
      )
# note how I used the hash symbol (#) to make a piece of code "silent".
# I find this useful when I am fiddling with code, to silence sections of it without deleting them.
```

I was a bit lazy, so legends are missing. I show how to create them here.
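In the meantime, here is a minimal sketch of how legends could be added to the two bonus plots with base R's `legend()`. The positions ("topleft", "topright"), the labels and the helper names `group_labels`, `hist_cols` and `line_types` are my own choices, not necessarily those of the linked post.

```r
g1 <- c(6, 7, 7.2, 8, 9)
g2 <- c(1.5, 2.5, 2.6, 5, 5.5)

group_labels <- c("g1", "g2")

# Histograms: the legend shows the fill colour of each group.
hist_cols <- c("grey", grey(0.3))
hist(g1, freq = FALSE, xlim = c(0, 10), col = hist_cols[1], breaks = (1:20) / 2)
hist(g2, freq = FALSE, add = TRUE, col = hist_cols[2], breaks = (1:20) / 2)
legend("topleft", legend = group_labels, fill = hist_cols, bty = "n")

# Density curves: the legend shows the line type of each group.
line_types <- c("solid", "dashed")
plot(density(g1, bw = "sj"), xlim = c(0, 11), lty = line_types[1],
     main = "Density distributions")
lines(density(g2, bw = "sj"), lty = line_types[2])
legend("topright", legend = group_labels, lty = line_types, bty = "n")
```

`legend()` always draws on the plot that is currently active, so call it right after the plot it belongs to. `bty = "n"` simply removes the box around the legend.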