# MANOVA

ANOVAs are handy, however, are limited by the number of variables the model can handle. I.e. you can only test two variables across groups. Multiple Analysis of Variance (MANOVA) models can handle this problem.

## MANOVA

MANOVA is an extension of the ANOVA so it’ll be helpful for you if you’ve familiarized yourself with it. It will also be helpful if you’ve familiarized yourself with matrix maths. Because we’re dealing with a more complex model it’s helpful to use matrices to solve it.

Let’s imagine for each observation e.g. each participant, is modeled as vectors where $$y_{ij} = \mu + \epsilon_{ij}$$ where $$i = 1, 2, … , k;$$ is the variable, $$j = 1, 2, …, n.$$ is the $\mu_i$ is the group mean and $\epsilon_{ij}$ is the error for that variable and observation. As in the ANOVA, the null hypothesis in a MANOVA is still $H0: \mu_1 = \mu_2 = \mu_3$ and the alternate hypothesis is $H0: \mu_1 \neq \mu_2 \neq \mu_3$.

### Wilks’ Test Statistic

$$\Lambda = \frac{|E|}{|E + H|}$$

We reject $H_0$ if $\Lambda \underline{<} \Lambda_{\alpha, p, \upsilon H, \upsilon E}$ where p is the number of variables, $\upsilon H$ is the degrees of freedom for the hypothesis, and $\upsilon E$ is the degrees of freedom for error. This is analogous to the F test in the ANOVA. R nicely calculates the analgous F-statistic for us so I’m not going to go into how that’s calculated.

To calculate the between sample variance matrix: ($H$) is $$H = \sum_{i=1}^k\frac{1}{n}y_{i}y\prime_i - \frac{1}{kn}yy\prime$$. To calculate within sample variance matrix ($E$) is $$E = \sum_{ij}y_{ij}y\prime_{ij} - \sum_i \frac{1}{n}y_iy\prime_i$$.

Now we can calculate $\Lambda$ and get the F stastistic.

In R:

#install.packages("car")
library(car)

## Loading required package: carData

data(iris)
man.model = manova(cbind(Sepal.Length, Sepal.Width, Petal.Length) ~ Species, data = iris)
res.man=summary(manova(cbind(Sepal.Length, Sepal.Width, Petal.Length) ~ Species, data = iris), test = "Wilks")
res.man

##            Df    Wilks approx F num Df den Df    Pr(>F)
## Species     2 0.031546    223.8      6    290 < 2.2e-16 ***
## Residuals 147
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Great we rejected the null for the overall model, what about each variable? To this we have to do post-hoc testing to compare each variable within each group. This is the same as what we did in the ANOVA. I’ll use Tukey’s HSD for the post-hocs.

In R:

#devtools::install_github("easystats/report")
#install.packages("ggplot2)
#install.packages("dplyr")
library(report)
library(ggplot2)
library(dplyr)

##
## Attaching package: 'dplyr'

## The following object is masked from 'package:car':
##
##     recode

## The following object is masked from 'package:kableExtra':
##
##     group_rows

## The following objects are masked from 'package:stats':
##
##     filter, lag

## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union


## Contrasts

1. Sepal Length Here, we’ll test differences in sepal length between the three species.

fit.s.length <- aov(Sepal.Length ~ Species, data=iris)
summary(fit.s.length)

##              Df Sum Sq Mean Sq F value Pr(>F)
## Species       2  63.21  31.606   119.3 <2e-16 ***
## Residuals   147  38.96   0.265
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

fit.s.length %>%
report() %>%
to_table()

##   Parameter Sum_Squares DoF Mean_Square        F            p
## 1   Species    63.21213   2  31.6060667 119.2645 1.669669e-31
## 2 Residuals    38.95620 147   0.2650082       NA           NA
##   Omega_Sq_partial
## 1        0.6119308
## 2               NA

TukeyHSD(fit.s.length)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##
## Fit: aov(formula = Sepal.Length ~ Species, data = iris)
##
## $Species ## diff lwr upr p adj ## versicolor-setosa 0.930 0.6862273 1.1737727 0 ## virginica-setosa 1.582 1.3382273 1.8257727 0 ## virginica-versicolor 0.652 0.4082273 0.8957727 0  plot(TukeyHSD(fit.s.length)) 2. Sepal Width Here, we’ll test differences in sepal width between the three species. fit.s.width <- aov(Sepal.Width ~ Species, data=iris) fit.s.width %>% report() %>% to_table()  ## Parameter Sum_Squares DoF Mean_Square F p ## 1 Species 11.34493 2 5.6724667 49.16004 4.492017e-17 ## 2 Residuals 16.96200 147 0.1153878 NA NA ## Omega_Sq_partial ## 1 0.3910362 ## 2 NA  TukeyHSD(fit.s.width)  ## Tukey multiple comparisons of means ## 95% family-wise confidence level ## ## Fit: aov(formula = Sepal.Width ~ Species, data = iris) ## ##$Species
##                        diff         lwr        upr     p adj
## versicolor-setosa    -0.658 -0.81885528 -0.4971447 0.0000000
## virginica-setosa     -0.454 -0.61485528 -0.2931447 0.0000000
## virginica-versicolor  0.204  0.04314472  0.3648553 0.0087802

plot(TukeyHSD(fit.s.width)) 3. Petal.Length Here, we’ll test differences in petal length between the three species.

fit.p.length <- aov(Petal.Length ~ Species, data=iris)
summary(fit.p.length)

##              Df Sum Sq Mean Sq F value Pr(>F)
## Species       2  437.1  218.55    1180 <2e-16 ***
## Residuals   147   27.2    0.19
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

fit.p.length %>%
report() %>%
to_table()

##   Parameter Sum_Squares DoF Mean_Square        F            p
## 1   Species    437.1028   2 218.5514000 1180.161 2.856777e-91
## 2 Residuals     27.2226 147   0.1851878       NA           NA
##   Omega_Sq_partial
## 1        0.9401991
## 2               NA

TukeyHSD(fit.p.length)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##
## Fit: aov(formula = Petal.Length ~ Species, data = iris)
##
## \$Species
##                       diff     lwr     upr p adj
## versicolor-setosa    2.798 2.59422 3.00178     0
## virginica-setosa     4.090 3.88622 4.29378     0
## virginica-versicolor 1.292 1.08822 1.49578     0

plot(TukeyHSD(fit.p.length)) As expected all of the variables across all groups are siginificant.

A slight aside in confusing terminology that came up while I was doing this. Contrasts and post-hocs are different. Contrasts are when you have a planned test even if, in this case a MANOVA, was not significant. Although, typically you accect the results of the MANOVA regardless if your contrasts are significant. Post-hocs are only done when you don’t have an apriori hypothesis test between two variables and if the MANOVA was significant.

In the next post, I’ll discuss ANCOVAs.

https://www.ipen.br/biblioteca/slr/cel/0241 ##### Mohan Gupta
###### Psychology PhD Student

My research interests include the what are the best ways to learn, why those are the best ways, and can I build computational models to predict what people will learn.