Linear regression

# R from zero to hero
# Marco Plebani 21 May 2018

To assess whether two continuous variables are linearly related, we can use linear regression.
Example: is the productivity of an economy correlated with employment rates?

Remember when I told you to avoid calling your data “data”?
That’s because it’s bad practice to create R objects with the same name as functions.
R comes with a bunch of built-in datasets, mostly for educational purposes, and the function data() is exactly what’s used to access them.
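A minimal sketch of how to browse and load these built-in datasets, assuming they come from the standard "datasets" package (the variable name avail is just illustrative):

```r
# List the datasets shipped with the standard "datasets" package
avail <- data(package = "datasets")
head(avail$results[, "Item"])

# Load one of them into the workspace and inspect its structure
data(longley)
str(longley)  # 16 yearly observations (1947-1962) of 7 variables
```

Typing ?longley opens the help page describing each variable.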

data(longley) # data on economics and demographics
plot(GNP ~ Employed, data = longley)
# make it fancier:
par(mar=c(5,5,5,3)) # adjusts the margins (bottom, left, top, right)
plot(GNP ~ Employed, data = longley,
     xlab="Number of people employed (millions)",
     ylab="Gross National Product (billion $)",
     main="Money money money!")
# it looks like there is a correlation...
# ...And it looks linear, so it should obey the model:
# GNP = a + b * Employed
# how to find a and b?
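For a straight line, the least squares estimates have a closed form: b = cov(x, y) / var(x) and a = mean(y) - b * mean(x). A minimal sketch computing them by hand (the names x, y, a and b are just illustrative):

```r
data(longley)
x <- longley$Employed
y <- longley$GNP

# Slope: covariance of x and y divided by the variance of x
b <- cov(x, y) / var(x)
# Intercept: the fitted line passes through the point (mean(x), mean(y))
a <- mean(y) - b * mean(x)

c(a = a, b = b)
```

These should match the coefficients that lm() reports.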

# let R do it:
linmod1 <- lm(GNP ~ Employed, data = longley)
abline(linmod1, lty="solid")
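Once the model is fitted, coef() returns the estimated a and b, and predict() uses them to compute fitted values. A minimal sketch; the employment value of 65 is a hypothetical input chosen inside the observed range:

```r
data(longley)
linmod1 <- lm(GNP ~ Employed, data = longley)

# The estimated intercept (a) and slope (b)
coef(linmod1)

# Predicted GNP for a hypothetical employment value of 65
new_obs <- data.frame(Employed = 65)
predict(linmod1, newdata = new_obs)
```

The prediction is simply a + b * 65.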

R also gives us standard errors, p-values and a bunch of other useful information about the model: just run summary(linmod1).
And it does so with TWO LINES OF CODE. Not too shabby!
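A minimal sketch of reading those numbers, either from the printed report or programmatically from the object that summary() returns (coef_table is just an illustrative name):

```r
data(longley)
linmod1 <- lm(GNP ~ Employed, data = longley)

# Full printed report: estimates, standard errors, t values, p-values, R-squared
summary(linmod1)

# The same numbers can be extracted programmatically
coef_table <- summary(linmod1)$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table["Employed", "Pr(>|t|)"]           # p-value for the slope
summary(linmod1)$r.squared                   # share of the variance in GNP explained
```

With a single predictor, R-squared equals the squared Pearson correlation between the two variables.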

For an example showing how the least squares method works, visit
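The idea in one sentence: least squares picks the line that minimises the sum of squared residuals (SSR). A minimal sketch checking this on longley; the helper ssr() and the slope offsets tried are illustrative choices, not part of any package:

```r
data(longley)
x <- longley$Employed
y <- longley$GNP

# Sum of squared residuals (SSR) for a candidate line y = a + b * x
ssr <- function(a, b) sum((y - (a + b * x))^2)

# Coefficients chosen by lm()
ab <- unname(coef(lm(GNP ~ Employed, data = longley)))

# Nudge the slope up and down (intercept adjusted so the line still passes
# through the means) and compare SSRs: the least squares line has the smallest
for (delta in c(-1, -0.1, 0, 0.1, 1)) {
  b <- ab[2] + delta
  a <- mean(y) - b * mean(x)
  cat(sprintf("slope %8.3f  SSR %10.2f\n", b, ssr(a, b)))
}
```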