Learn about ANCOVAs using R

What if you didn’t run a perfectly tight, controlled study and you’re worried there might be some covariate influencing your results? One option is to treat it as a quasi-experiment and fit an Analysis of Covariance (ANCOVA) model. While an ANCOVA doesn’t perfectly control for the confound, it’s a useful tool to mitigate the noise the confound introduces into your data.

In this example I’ll be using the mtcars dataset included in base R.

summary(mtcars)

##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

Let’s trim down the number of variables based on the question we’re going to ask. Say we’re interested in whether cars with differing numbers of gears have significantly different weights, after removing the variance associated with miles per gallon (mpg).

# install.packages("dplyr")
library(dplyr)
mtcars <- mtcars %>% 
  dplyr::select(gear, mpg, wt)
mtcars$gear <- factor(mtcars$gear, labels = c("3", "4", "5"))

It’s necessary to make gear a factor for Levene’s test and the multiple comparisons to work later.


ANCOVAs are basically the love child of regressions and ANOVAs. They’re actually a type of multiple regression: we’re going to add a regression term to our ANOVA. As a reminder, an OLS regression looks like: $$Y_i = \beta_0 + \beta_1(X_i) + \epsilon_i$$

And a quick refresher on the ANOVA model notation: $$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$

When we combine them together we can get something that looks like: $$Y_{ij} = \mu + \alpha_i + \beta(x_{ij} - \overline{x}) + \epsilon_{ij} $$ Note this is the centered version of the equation.

  1. $\mu$ = a constant shared by all observations (the grand mean)
  2. $\alpha_i$ = the treatment effect, e.g. group
  3. $\beta$ = the beta estimate for the covariate
  4. $\epsilon_{ij}$ = the error term
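Since an ANCOVA is just a multiple regression under the hood, we can see the centered equation above directly in R. A minimal sketch (assuming the base mtcars data, with gear converted to a factor):

```r
# Fit the centered ANCOVA as an ordinary multiple regression:
# intercept (grand mean), centered covariate, and treatment factor.
data(mtcars)
mtcars$gear <- factor(mtcars$gear)

fit_lm <- lm(wt ~ I(mpg - mean(mpg)) + gear, data = mtcars)

# Centering the covariate shifts the intercept but not the sums of
# squares, so this table matches summary(aov(wt ~ mpg + gear, ...)).
anova(fit_lm)
```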


Assumptions

  1. Parallel slopes, i.e. there are no interactions between the covariate and the treatment
  2. Homogeneity of variance across groups
  3. The covariate is unrelated to the independent variable
  4. Homoscedasticity
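Assumption 3 is worth a quick sanity check before fitting anything. One simple check (not part of the original analysis, just a sketch) is a one-way ANOVA of the covariate on the treatment factor:

```r
# If the covariate (mpg) differs strongly across treatment groups
# (gear), the "covariate unrelated to independent variable"
# assumption is questionable and adjusted means get hard to interpret.
data(mtcars)
mtcars$gear <- factor(mtcars$gear)
summary(aov(mpg ~ gear, data = mtcars))
```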


To test for homogeneity we can use Levene’s test. I might do a future blog post on homogeneity and Levene’s test.

# leveneTest() comes from the car package
library(car)
leveneTest(wt ~ gear, center = mean, data = mtcars)
## Levene's Test for Homogeneity of Variance (center = mean)
##       Df F value Pr(>F)
## group  2  0.1444 0.8661
##       29

Because the test was not significant, we have no evidence against the homogeneity assumption, so we’ll treat it as satisfied.

Interaction Model

We want the regression lines for each group to be parallel. This suggests there isn’t an interaction between the confound and the treatment.
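One quick way to eyeball parallel slopes is to plot a separate regression line of wt on mpg for each gear group; a base-R sketch (re-running the factor conversion so the snippet is self-contained):

```r
# Scatterplot of wt vs mpg, colored by gear, with one fitted
# line per group. Roughly parallel lines suggest no interaction.
data(mtcars)
mtcars$gear <- factor(mtcars$gear)

plot(mtcars$mpg, mtcars$wt, col = as.integer(mtcars$gear),
     pch = 16, xlab = "mpg", ylab = "wt")
for (i in seq_along(levels(mtcars$gear))) {
  g <- levels(mtcars$gear)[i]
  abline(lm(wt ~ mpg, data = subset(mtcars, gear == g)), col = i)
}
legend("topright", legend = levels(mtcars$gear),
       col = seq_along(levels(mtcars$gear)), pch = 16, title = "gear")
```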

What is an interaction? Hmmm, good question. When one happens, it makes our story harder to explain. Basically, when we look at how two variables affect a third variable, an interaction means they affect it in more than just an additive way. Think of the good ol’ days when your caffeine came straight in your Four Loko. The combined effect of caffeine and alcohol is more than just the sum of its parts. Some might say it’s gold.

Back to the problem at hand. Let’s test if we have parallel lines. You must enter your covariate first. In this case mpg.

fit2 <- aov(wt ~ mpg * gear, data = mtcars)
summary(fit2)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## mpg          1 22.343  22.343 109.975 7.81e-11 ***
## gear         2  1.117   0.558   2.748   0.0826 .  
## mpg:gear     2  0.937   0.468   2.305   0.1198    
## Residuals   26  5.282   0.203                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Weight is the y variable, and we’re checking whether there is an interaction (non-parallel lines) between gear and mpg in relation to weight. Since the mpg:gear term is not significant, we can conclude the parallel lines assumption holds. Although, I’m personally wary of anything that is close to what we consider significance, because the cutoff is an arbitrary accepted practice in science. Regardless, for the purpose of this post, we’ll keep the assumption.

Additive Model

This is the model we can use if the parallel lines assumption is not broken, which in our case it is not.

fit1 <- aov(wt ~ mpg + gear, data = mtcars)
summary(fit1)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## mpg          1 22.343  22.343 100.598 9.01e-11 ***
## gear         2  1.117   0.558   2.514   0.0991 .  
## Residuals   28  6.219   0.222                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note on the code here: you can also type aov(lm(wt ~ mpg + gear, data = mtcars)), which would also be correct. For conciseness and clarity, I code it as in the example above. Also, I’m using an uncentered ANCOVA in this example, which has a slightly different equation than the one I listed above; it typically doesn’t change the conclusions you would draw from the model.

Significance test

How does the ANCOVA model determine significance? It uses an F test that looks pretty similar to the ANOVA’s, with some additions. Here the reduced model contains only the covariate, and the full model adds the treatment factor.

$$F = \frac{(\frac{SSE_R - SSE_F}{df_R - df_F})}{(\frac{SSE_F}{df_F})}$$

  1. $SSE_R$ is the sum of squares error for the reduced model.
  2. $SSE_F$ is the sum of squares error for the full model.
  3. $df_R$ is the error degrees of freedom for reduced model.
  4. $df_F$ is the error degrees of freedom for the full model.
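In R, this full-versus-reduced comparison is exactly what anova() does when given two nested models. A sketch, with wt ~ mpg as the reduced model and wt ~ mpg + gear as the full model:

```r
data(mtcars)
mtcars$gear <- factor(mtcars$gear)

# Reduced model: covariate only. Full model: covariate + treatment.
reduced <- lm(wt ~ mpg, data = mtcars)
full    <- lm(wt ~ mpg + gear, data = mtcars)

# The F statistic from the formula above, computed by hand.
sse_r <- sum(resid(reduced)^2); df_r <- df.residual(reduced)
sse_f <- sum(resid(full)^2);    df_f <- df.residual(full)
F_stat <- ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

# anova() reports the same F for adding the gear term (2.514 above).
anova(reduced, full)
```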

Multiple Comparisons

To test multiple comparisons we’re going to need the multcomp library. The glht function allows us to test our general linear hypotheses, and the mcp function specifies the linear hypotheses.

library(multcomp)
posthoc <- glht(fit1, linfct = mcp(gear = "Tukey"))
summary(posthoc)
##   Simultaneous Tests for General Linear Hypotheses
## Multiple Comparisons of Means: Tukey Contrasts
## Fit: aov(formula = wt ~ mpg + gear, data = mtcars)
## Linear Hypotheses:
##            Estimate Std. Error t value Pr(>|t|)  
## 4 - 3 == 0  -0.1948     0.2405  -0.810   0.6998  
## 5 - 3 == 0  -0.5834     0.2624  -2.224   0.0846 .
## 5 - 4 == 0  -0.3886     0.2576  -1.509   0.3019  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

To see the adjusted means by group we’re going to use the effects library.

library(effects)
effect("gear", fit1)
##  gear effect
## gear
##        3        4        5 
## 3.381463 3.186661 2.798025
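These adjusted means are just the model’s predictions for each gear level with mpg held at its grand mean, so you can reproduce them with predict() alone; a sketch:

```r
data(mtcars)
mtcars$gear <- factor(mtcars$gear)
fit1 <- aov(wt ~ mpg + gear, data = mtcars)

# Predict weight for each gear level at the overall mean mpg;
# these match the effect("gear", fit1) values above.
newdat <- data.frame(gear = levels(mtcars$gear), mpg = mean(mtcars$mpg))
predict(fit1, newdata = newdat)
```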

We can conclude that, after adjusting for mpg, there are no significant differences in weight between cars with different numbers of gears.

Appropriate Use of ANCOVAs

You want to be careful when using ANCOVAs. They are not fit to answer some questions. For example, suppose we asked: does mpg differ among cars with different numbers of gears, after removing the variance associated with weight? This is an honest question to ask, but mpg is inherently associated with the weight of a car; cars that weigh more are going to get worse mpg. Some cars might have more fuel-efficient engines, but weight is so closely associated with mpg that removing its variance would most likely remove significant differences from your outcome.

In the next post, I’ll discuss MANCOVAs.


Mohan Gupta
Psychology PhD Student

My research interests include what the best ways to learn are, why those are the best ways, and whether I can build computational models to predict what people will learn, in both motor and declarative learning.