MANCOVA

Learn about about MANCOVA using R

Last updated on Oct 2, 2019 7 min read Statistics, Data Wrangling

MANCOVA

If you’ve been following along my posts, you can probably guess what a MANCOVA is good for. Basically, take the concepts of MANOVA and ANCOVA and suh whoo, you got a MANCOVCA. In case you haven’t been following along, I’ll explain what a MANCOVA is good for. For this example, our question will be does chick weight from time point 2 to time point 4 differ depending on diet removing variance associated with weight when the chick was born.

It’s simliar to the MANOVA in that it can model differences between three or more variables and it’s similar to an ANCOVA in that it can remove the variance associated with a confounding variable.

The adjustment equation looks like:

$$\overline{y}_{j(adj)} = \overline{y} - b_w(\overline{X}_j - \overline{X})$$

MANOVA equation:

A quick reminder of the equations invovled in a MANOVA, taken from my MANOVA blog post.

To test this hypothesis we can use Wilk’s test statistic. $$\Lambda = \frac{|E|}{|E + H|}$$

We reject $H_0$ if $\Lambda \underline{<} \Lambda_{\alpha, p, \upsilon H, \upsilon E}$ where p is the number of variables, $\upsilon H$ is the degrees of freedom for the hypothesis, and $\upsilon E$ is the degrees of freedom for error. This is analogous to the F test in the ANOVA. R nicely calculates the analgous F-statistic for us so I’m not going to go into how that’s calculated. To calculate the between sample variance matrix: ($H$) is $$H = \sum_{i=1}^k\frac{1}{n}y_{i}y\prime_i - \frac{1}{kn}yy\prime$$. To calculate within sample variance matrix ($E$) is $$E = \sum_{ij}y_{ij}y\prime_{ij} - \sum_i \frac{1}{n}y_iy\prime_i$$.”

Now we can calculate $\Lambda$ and get the F stastistic.”

In R

We’ll again be using “ChickWeight” from the MASS library for the dataset. We’ll need to do some data wrangling in order to get it ready for the model.

Remember, our question is does chick weight from time point 2 to time point 4 differ depending on diet removing variance associated with weight when the chick was born.

# install.packages(MASS)
# install.packages(dplyr) - you should really have this installed by now if you don't...
library(MASS)
library(dplyr)
data("ChickWeight")

summary(ChickWeight)

##      weight           Time           Chick     Diet   
##  Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
##  1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
##  Median :103.0   Median :10.00   20     : 12   3:120  
##  Mean   :121.8   Mean   :10.72   10     : 12   4:118  
##  3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
##  Max.   :373.0   Max.   :21.00   19     : 12          
##                                  (Other):506

#View(ChickWeight)

I’m going to spell out my logic for the data transformation before I do it. Currently our data is in long format and we need to transform it to wide format. If it doesn’t make sense why we have to change the format of the data, think about how we input the data into our model’s equations. Would long format or wide format fit? To do this we have four essential goals:

Diet will have to have redundancies removed from it to have one observation for each chick.
Weight at time 0 will be its own column as the covariate.
Weight at time 2 will be its own column as a dependent variable.
Weight at time 4 will be its own column as a dependent variable.

To do this we can use the reshape function and then remove the columns we don’t want using the select function.

#Reshape data to wide format and keep key variables
chick = reshape(ChickWeight, idvar = "Chick", timevar =  "Time", direction = "wide") %>% 
  select("weight.0", "weight.2", "weight.4", "Diet.0")

#Ensure Diet is a factor for the model
chick$Diet.0 = factor(chick$Diet.0, label = c("1", "2", "3", "4"))

#combine the two dependent variables so we can run them as one model instead of two (changes df)
outcome = cbind(chick$weight.2, chick$weight.4)

Assumptions

Before running our model we must make sure we meet our assumptions. It needs to meet all the assumptions of a ANOVA and ANCOVA, which is a lot. I’m not testing them here, for conciseness. In practice, always test your assumptions or you could end up having a botched model and phony results!

#MANCOVA model
m.mancova = manova(outcome ~ Diet.0 * weight.0, data = chick)
summary(m.mancova, test = "Wilks", type = "III")

##                 Df   Wilks approx F num Df den Df    Pr(>F)    
## Diet.0           3 0.44886   6.5682      6     80 1.141e-05 ***
## weight.0         1 0.94504   1.1630      2     40   0.32288    
## Diet.0:weight.0  3 0.72993   2.2730      6     80   0.04464 *  
## Residuals       41                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is a significant interaction between the covariate and the factor, which breaks the parallel lines assumption. There are few options here, and in this case I’m going to leave the covariate in the model.

This interaction could be due to differences between the different group birth weights. I can check if there are differnces between the group birth weights to see if this assumption holds true. If there are differences, it would indicate these differences are driving this interaction. If there aren’t, it would indicate there is a true interaction between birth weight and diet.

aov(weight.0 ~ Diet.0, data = chick)

## Call:
##    aov(formula = weight.0 ~ Diet.0, data = chick)
## 
## Terms:
##                 Diet.0 Residuals
## Sum of Squares    4.32     58.50
## Deg. of Freedom      3        46
## 
## Residual standard error: 1.127714
## Estimated effects may be unbalanced

Since there is no significant difference, no further testing is warranted and we conclude birth weights between the chicks are equal. Which suggests there is a true interaction between birth weight and diet in the MANCOVA model. For the purpose of this tutorial, I’m going to ignore this interaction. In a future post, I will discuss how to deal with those pesky interactions in your model. Anyways, back to the model.

If you want to see the unadjusted dependent variable means, uncomment the “describeBy” command and the run the code below (rmarkdown was being fussy with formatting so I’m not including the tables).

#install.packages("psych")
library(psych)
#describeBy(chick, chick$Diet.0)

Adjusted Means for Weight 2

model1 = aov(weight.2 ~ weight.0 + Diet.0, data = chick)
summary(model1, type = "III")

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## weight.0     1   28.5   28.54   2.889 0.096106 .  
## Diet.0       3  193.5   64.50   6.529 0.000923 ***
## Residuals   45  444.5    9.88                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#install.packages("effects")
library(effects)

## Loading required package: carData

## lattice theme set by effectsTheme()
## See ?effectsTheme for details.

adjmean1 = effect("Diet.0", model1, se=TRUE)
summary(adjmean1)

## 
##  Diet.0 effect
## Diet.0
##        1        2        3        4 
## 46.89547 49.77538 50.67111 51.86256 
## 
##  Lower 95 Percent Confidence Limits
## Diet.0
##        1        2        3        4 
## 45.45224 47.75148 48.65772 49.86009 
## 
##  Upper 95 Percent Confidence Limits
## Diet.0
##        1        2        3        4 
## 48.33870 51.79929 52.68450 53.86503

adjmean1$se

## [1] 0.7165598 1.0048676 0.9996445 0.9942241

Adjusted Means for Weight 4

model2 = aov(weight.4 ~ weight.0 + Diet.0, data = chick)
summary(model2, type = "III")

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## weight.0     1    0.6    0.57   0.055    0.816    
## Diet.0       3  507.8  169.25  16.134 3.24e-07 ***
## Residuals   44  461.6   10.49                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness

adjmean2 = effect("Diet.0", model2, se=TRUE)
summary(adjmean2)

## 
##  Diet.0 effect
## Diet.0
##        1        2        3        4 
## 56.20616 60.05350 62.39045 64.56434 
## 
##  Lower 95 Percent Confidence Limits
## Diet.0
##        1        2        3        4 
## 54.66056 57.95771 60.30835 62.49807 
## 
##  Upper 95 Percent Confidence Limits
## Diet.0
##        1        2        3        4 
## 57.75177 62.14929 64.47255 66.63061

adjmean2$se

## [1] 0.7669118 1.0399060 1.0331110 1.0252561

Together, these ANOVA tables suggest both dependent variables contribute to the overall MANCOVA model’s significance. In the overall model, birth weight had no sifnificant relation with weigh at time point 2 or 4, which is also seen in both ANOVA models. There’s a lot contrasts you could run, which I won’t do here because they are the same as what we did in the MANOVA and ANOVA posts.

In the next post, I’ll discuss how to deal with interactions in multivaraite models.

References

https://us.sagepub.com/sites/default/files/upm-binaries/70365_Schumacker_Chapter_5.pdf

https://www.statisticssolutions.com/multivariate-analysis-of-covariance-mancova/

https://www.theanalysisfactor.com/assumptions-of-ancova/

http://www.flutterbys.com.au/stats/tut/tut7.5a.html

https://en.wikipedia.org/wiki/Multivariate_analysis_of_covariance

Statistics MANCOVA Basics Data Wrangling

Mohan Gupta

Postdoctoral Scholar

My research interests include the what are the best ways to learn, why those are the best ways, and can I build computational models to predict what people will learn in both motor and declarative learning .

MANCOVA

MANCOVA

MANOVA equation:

In R

Assumptions

Adjusted Means for Weight 2

Adjusted Means for Weight 4

Mohan Gupta

Postdoctoral Scholar

Related