All models are wrong, but some are useful. (George Box)

Most links below lead to R scripts that I wrote to illustrate how to perform various tasks (chiefly statistics) in R; other links lead to some of the many existing pages that illustrate topics and tasks better than I would.

You are free to use my original material on this website for non-commercial purposes. If you are planning to make money out of it, get in touch with me first so that we can arrange terms and conditions. I am also available as an R-gun for hire for statistical consultancy and for teaching courses.

Enjoy!

 

UNIVARIATE STATISTICS

Univariate statistics refers to the set of tests and analyses used to analyse response variables in which every observation is expressed by a single value.

 

UNIVARIATE STATISTICS FOR NORMALLY DISTRIBUTED DATA

 

GENERALISED LINEAR MODELS (GLMs)

GLMs make it possible to perform univariate statistics on NON-NORMALLY distributed data, and they represent an alternative to fitting general linear models to transformed data, as in the following examples:

  • Comparing two groups (count data) – the code in this example can easily be adapted to more complicated situations, such as comparing more than two groups or testing the effect of multiple categorical variables on a response variable in “count” format.
  • Another example with count data – here we look at a regression-like situation with non-Normal data.
  • Binary data such as survival data or presence/absence: see this worked example by Gavin Simpson, where he also shows how to compute confidence intervals correctly.
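
As a minimal sketch of the first situation above (comparing two groups with count data), the code below simulates counts for two hypothetical groups and fits a Poisson GLM. The data, the group labels and the object names are invented for illustration and are not taken from the linked scripts.

```r
# Simulated count data for two hypothetical groups (not real data)
set.seed(1)
counts <- data.frame(
  group = factor(rep(c("control", "treatment"), each = 30)),
  n     = c(rpois(30, lambda = 4), rpois(30, lambda = 7))
)

# Poisson GLM with the default log link
fit <- glm(n ~ group, family = poisson, data = counts)
summary(fit)

# Coefficients are on the log scale; exponentiating gives the estimated
# mean of the control group and the treatment/control ratio of means
exp(coef(fit))

# Likelihood-ratio test for the effect of group
anova(fit, test = "Chisq")
```

If the residual deviance is much larger than the residual degrees of freedom, the counts are overdispersed and a quasi-Poisson or negative binomial model may be a better choice.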

Further useful resources:

  • On the difference between using GLMs or using LMs on transformed data, see here and here.
  • On how to compute Confidence Intervals, see here and here.
  • On how to calculate the SE of a quantity on the original scale knowing its SE on the transformed scale, see here (delta method).
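
The last two points can be sketched in a few lines, assuming a Poisson GLM with a log link fitted to simulated data (the data and object names are hypothetical, and the linked pages may proceed differently). The confidence interval is built on the link scale and then back-transformed, while the delta method approximates the SE on the original scale.

```r
# Simulated regression-like count data (not real data)
set.seed(2)
d <- data.frame(x = runif(50, 0, 2))
d$y <- rpois(50, lambda = exp(0.5 + 0.8 * d$x))
fit <- glm(y ~ x, family = poisson, data = d)

# Prediction and its SE on the link (log) scale for a new value of x
new <- data.frame(x = 1)
p_link <- predict(fit, newdata = new, type = "link", se.fit = TRUE)

# 95% CI: compute on the link scale, then apply the inverse link
inv <- family(fit)$linkinv
ci <- inv(p_link$fit + c(-1, 1) * qnorm(0.975) * p_link$se.fit)
ci

# Delta method: SE on the response scale is approximately
# |d mu / d eta| * SE on the link scale; for a log link d mu / d eta = exp(eta)
se_delta <- exp(p_link$fit) * p_link$se.fit

# predict(..., type = "response") relies on the same approximation
p_resp <- predict(fit, newdata = new, type = "response", se.fit = TRUE)
all.equal(unname(se_delta), unname(p_resp$se.fit))
```

Building the interval on the link scale keeps its limits inside the valid range of the response (positive, for counts), which a symmetric interval built from the response-scale SE does not guarantee.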

 

MIXED EFFECT MODELS (MEMs)

MEMs are useful for dealing with unbalanced study designs and autocorrelated (non-independent) data. They estimate a statistical model’s fixed effects while accounting for the model’s random effects. As a practical, if crude, definition, a model’s fixed effects are the explanatory variables whose effects we want to quantify explicitly; a model’s random effects are variables that are not the main focus of our study but that we want to account for because we expect them to explain some of the residual variability (more on fixed and random effects: here, here).

  • An example by Ignasi Bartomeus. The issue with his example is that the random effect has only three levels: while this is helpful for teaching purposes, it is usually recommended that a random effect have at least 5-6 levels to give reliable estimates.
  • An excellent book for understanding MEMs and implementing them in R is “A Beginner’s Guide to GLM and GLMM with R: A Frequentist and Bayesian Perspective for Ecologists” by Alain F. Zuur, Joseph M. Hilbe, and Elena N. Ieno.
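
For a quick feel of the syntax, here is a minimal sketch that assumes the lme4 package is installed and uses its bundled sleepstudy data (reaction times measured repeatedly on 18 subjects). It is not taken from the resources above; it only shows how fixed and random effects are declared in a model formula.

```r
library(lme4)  # assumed to be installed; provides lmer() and the sleepstudy data

# Repeated measures: reaction time over days of sleep deprivation
data(sleepstudy)

# Fixed effect: Days (the effect we want to quantify explicitly)
# Random effects: intercept and slope of Days allowed to vary among subjects
fit <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

fixef(fit)                   # estimated fixed effects
VarCorr(fit)                 # variance components of the random effects
nlevels(sleepstudy$Subject)  # number of levels of the random effect
```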

 

PHYLOGENETIC GENERALIZED LEAST SQUARES (PGLS)

Mixed effect models can deal with data non-independence. A special case of data non-independence is represented by the phylogenetic relationship among species. Let’s say that we want to study the correlation between dietary habits and mean body weight in mammals. The body weight of different species may be affected by their dietary habits, but their phylogenetic relationship may also matter – closely related species may have similar mean body weight just because of their common evolutionary history. To study the correlation between dietary habits and mean body weight in mammals, one has to account for phylogenetic signal. One possibility is to use MEMs with nested random effects, such as:

library(lme4)  # provides lmer()
MEM.model <- lmer(body.weight ~ dietary.pref
                  + (1 + dietary.pref | family/genus/species), data = mammals)

A more specific alternative is to use phylogenetic generalized least squares (PGLS), which uses existing phylogenetic trees to inform the model about the autocorrelation between taxonomic units. Here is an example that looks at the relationship between wing length and tarsus length among Geospiza finch species.
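
One common way to fit a PGLS in R combines the ape and nlme packages. The sketch below assumes both are installed (with a reasonably recent version of ape, whose corBrownian() accepts a form argument) and uses a random tree and simulated traits rather than the Geospiza data; the species, trait values and object names are all hypothetical.

```r
library(ape)   # assumed installed: trees and phylogenetic correlation structures
library(nlme)  # gls()

set.seed(3)
tree <- rtree(20)  # random tree with 20 hypothetical species

# Two traits simulated independently under Brownian motion along the tree
traits <- data.frame(
  species = tree$tip.label,
  wing    = rTraitCont(tree),
  tarsus  = rTraitCont(tree)
)

# Brownian-motion correlation structure derived from the tree;
# 'form' says which column links the rows of the data to the tips
bm <- corBrownian(phy = tree, form = ~species)

# Generalized least squares with phylogenetically correlated errors
pgls <- gls(wing ~ tarsus, correlation = bm, data = traits)
summary(pgls)
```

With real data, the species names in the data frame must match the tip labels of the tree.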

 

GENERALISED ADDITIVE MODELS (GAMs)

I used to be dismissive about GAMs, seeing them as a glorified smoothing function. In a way they are, but they can accomplish a lot: they can model and compare strongly non-linear and asymmetric patterns, they can decompose the relative contribution of continuous and categorical predictors to the observed trends, etc. There are even ways to include random effects and to account for autocorrelation in GAMs. Extremely useful resources for getting started with GAMs are here, here, and in Gavin Simpson’s answer to this question I asked on CrossValidated. For further detail, see this paper.
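
As a minimal sketch of what this looks like in practice, the code below fits a GAM with the mgcv package (which ships with standard R installations) to simulated data containing an asymmetric non-linear trend and a categorical predictor. The data and variable names are invented for illustration.

```r
library(mgcv)  # provides gam()

set.seed(4)
n <- 200
d <- data.frame(
  x   = runif(n, 0, 10),
  grp = factor(sample(c("a", "b"), n, replace = TRUE))
)

# Simulated response: asymmetric non-linear signal in x plus a group offset
d$y <- sin(d$x) * exp(-d$x / 5) + 0.5 * (d$grp == "b") + rnorm(n, sd = 0.2)

# Smooth term for the continuous predictor, parametric term for the factor
fit <- gam(y ~ s(x) + grp, data = d, method = "REML")

summary(fit)             # effective degrees of freedom of s(x), estimate for grp
plot(fit, shade = TRUE)  # estimated partial effect of x
```

A random intercept can be added in the same formula with a term such as s(subject, bs = "re"), where subject stands for a hypothetical grouping factor in the data.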

 

MULTIVARIATE STATISTICS

Multivariate statistics is used to analyse response variables in which every observation consists of a set of values.