Linear regression

# R from zero to hero
# Marco Plebani 21 May 2018
rm(list=ls())

To assess the relationship between two continuous variables we can use linear regression.
Example: is the productivity of an economy correlated with employment levels?

Remember when I told you to avoid calling your data “data”?
That’s because it’s bad practice to create R objects with the same name as existing functions.
R ships with a bunch of built-in datasets, mostly for educational purposes, and the function data() is exactly what’s used to access those datasets.
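If you are curious about what is in there, here is a minimal sketch of how to browse the built-in datasets and peek at the one used in this post. The object name `available` is just something I made up for the example.

```r
# data() with a package argument returns a list; its "results" element
# is a character matrix with one row per dataset
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# load one dataset into the workspace and inspect its structure
data(longley)
str(longley)
```

Typing ?longley brings up the help page that describes each column.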

data(longley) # data on economics and demographics
plot(GNP ~ Employed, data = longley)
# make it fancier:
par(mar=c(5,5,5,3)) # adjusts the margins
plot(GNP ~ Employed, data = longley,
     xlab="Number of people employed", # see ?longley for the variable descriptions
     ylab="Gross National Product",
     main="Money money money!"
)
# it looks like there is a correlation...
# ...And it looks linear, so it should obey the model:
# GNP = a + b * Employed
# how to find a and b?

# let R do it:
linmod1 <- lm(GNP ~ Employed, data = longley)
anova(linmod1)
summary(linmod1)
abline(linmod1, lty="solid")

Besides the estimates of a and b, R gives us their standard errors, p-values, and a bunch of other useful bits of information.
And it does so with TWO LINES OF CODE. Not too shabby!
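If you need those numbers for further calculations rather than just on screen, one possible approach is to pull them out of the summary object. This is only a sketch: the model is refitted here so the block runs on its own, and `coefs` and `r2` are names I picked for illustration.

```r
data(longley)
linmod1 <- lm(GNP ~ Employed, data = longley)

# coefficient table: one row per term, with columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)"
coefs <- coef(summary(linmod1))
coefs

# standard error and p-value of the slope
coefs["Employed", "Std. Error"]
coefs["Employed", "Pr(>|t|)"]

# proportion of variance explained by the model
r2 <- summary(linmod1)$r.squared
r2
```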

For an example showing how the least squares method works, visit https://stackoverflow.com/questions/50508425/home-brew-implementation-of-a-least-squares-method-in-r-showing-unexpected-behav?sfb=2
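For a model with a single predictor, the least squares estimates also have a simple closed form: the slope b is the covariance of the two variables divided by the variance of the predictor, and the intercept a follows from the two means. Here is a small sketch that computes them by hand and compares them with what lm() returns. The names `x`, `y`, `a`, `b` and `fit` are just for this example.

```r
data(longley)
x <- longley$Employed
y <- longley$GNP

# closed-form least squares estimates for GNP = a + b * Employed
b <- cov(x, y) / var(x)      # slope
a <- mean(y) - b * mean(x)   # intercept
c(intercept = a, slope = b)

# the same model fitted by lm(), for comparison
fit <- lm(GNP ~ Employed, data = longley)
coef(fit)
```

The two sets of numbers should agree up to rounding error.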

 
