# R from zero to hero
# Marco Plebani 21 May 2018
rm(list=ls())
To assess correlation between continuous variables we can use linear regression.
Example: is the productivity of an economy correlated with employment?

Remember when I told you to avoid calling your data “data”?
That’s because it’s bad practice to give R objects the same name as existing functions.
R ships with a bunch of built-in datasets, mostly for educational purposes, and the function data()
is exactly what’s used to access those datasets.
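To make the point about data() concrete, here is a small sketch using the built-in longley dataset. The object named `data` is created only to illustrate the name clash and is removed again straight away.

```r
# Calling data() with no arguments lists the available datasets
# (it opens a viewer, so it is commented out here)
# data()

# Load one dataset by name; it appears in the workspace as a data frame
data(longley)
str(longley)       # structure: 16 observations of 7 numeric variables
head(longley)      # first few rows

# An object called "data" does not break data(), but it makes code ambiguous
data <- longley    # 'data' now also refers to a data frame in the workspace
is.function(data)  # FALSE here: the data frame is found before the function
rm(data)           # remove the confusing object again
```

R still finds the function when you write data(longley), because in a call it looks specifically for a function. The confusion is mostly for the humans reading the script.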
data(longley) # data on economics and demographics
plot(GNP ~ Employed, data = longley)
# make it fancier:
par(mar=c(5,5,5,3)) # adjusts the margins
plot(GNP ~ Employed, data = longley,
xlab="Number of people employed (millions)",
ylab="Gross National Product (billion $)",
main="Money money money!"
)
# it looks like there is a correlation...
# ...And it looks linear, so it should obey the model:
# GNP = a + b * Employed
# how to find a and b?
# let R do it:
linmod1 <- lm(GNP ~ Employed, data = longley)
anova(linmod1)
summary(linmod1)
abline(linmod1, lty="solid")
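The a and b from the model GNP = a + b * Employed can be pulled out of the fitted object directly. The sketch below refits the same model so it runs on its own; the names `a`, `b` and `gnp_hat` are just illustrative.

```r
data(longley)
linmod1 <- lm(GNP ~ Employed, data = longley)

# coef() returns a named vector: "(Intercept)" is a, "Employed" is b
a <- coef(linmod1)[["(Intercept)"]]
b <- coef(linmod1)[["Employed"]]

# Rebuild the fitted values by hand from GNP = a + b * Employed
gnp_hat <- a + b * longley$Employed
all.equal(unname(fitted(linmod1)), gnp_hat)  # checks the two agree

# The coefficient table holds the estimates together with
# their standard errors, t values and p-values
coef(summary(linmod1))
```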
R also gives us standard errors, p-values and a bunch of other useful bits of information.
And it does so with TWO LINES OF CODE. Not too shabby!

For an example showing how the least squares method works, visit https://stackoverflow.com/questions/50508425/home-brew-implementation-of-a-least-squares-method-in-r-showing-unexpected-behav?sfb=2
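For a quick feel of what least squares does in the one-predictor case, here is a minimal sketch using the standard closed-form solution. It is not the approach from the linked question, just the textbook formulas; `a_hat` and `b_hat` are illustrative names.

```r
data(longley)
x <- longley$Employed
y <- longley$GNP

# Least squares picks a and b that minimise sum((y - (a + b * x))^2).
# With a single predictor the solution has a closed form:
b_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a_hat <- mean(y) - b_hat * mean(x)

# Compare the hand-made estimates with those from lm()
linmod1 <- lm(GNP ~ Employed, data = longley)
all.equal(c(a_hat, b_hat), unname(coef(linmod1)))
```

If the comparison succeeds, the hand-made slope and intercept match what lm() reports, up to numerical tolerance.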