Dealing with Interactions

Learn about Interactions using R

What is an interaction? Let’s first discuss what it is not. It is not a property of the dependent variable (Y) itself. It never involves only one independent variable; it must involve two or more. An interaction can be defined as one independent variable changing the effect of another independent variable on the dependent variable.
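To make that definition concrete, here is a minimal sketch of a linear model with two predictors and an interaction term. The symbols X_1, X_2 and the betas are generic placeholders introduced for illustration; they are not names from the datasets used later.

```latex
E[Y \mid X_1, X_2] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2
\quad\Longrightarrow\quad
\frac{\partial\, E[Y \mid X_1, X_2]}{\partial X_1} = \beta_1 + \beta_3 X_2
```

When \beta_3 = 0, the effect of X_1 on Y is the constant \beta_1. When \beta_3 \neq 0, that effect shifts with the value of X_2, which is exactly what the definition above describes.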

If you haven’t read my post on MANCOVAs, please do so before continuing so the code will make sense. This post uses the example problem from there, so I’ll quickly rehash it. From the MANCOVA post: “our question is does chick weight from time point 2 to time point 4 differ depending on diet removing variance associated with weight when the chick was born.”

The following code is from the MANCOVA post.

# install.packages("MASS")
# install.packages("dplyr") - you should really have this installed by now if you don't...
library(MASS)
library(dplyr)
data("ChickWeight")

summary(ChickWeight)
##      weight           Time           Chick     Diet   
##  Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
##  1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
##  Median :103.0   Median :10.00   20     : 12   3:120  
##  Mean   :121.8   Mean   :10.72   10     : 12   4:118  
##  3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
##  Max.   :373.0   Max.   :21.00   19     : 12          
##                                  (Other):506
#View(ChickWeight)
#Reshape data to wide format and keep key variables
chick = reshape(ChickWeight, idvar = "Chick", timevar =  "Time", direction = "wide") %>% 
  select("weight.0", "weight.2", "weight.4", "Diet.0")

#Ensure Diet is a factor for the model
chick$Diet.0 = factor(chick$Diet.0, labels = c("1", "2", "3", "4"))

#combine the two dependent variables so we can run them as one model instead of two (changes df)
outcome = cbind(chick$weight.2, chick$weight.4)

Now it’s time to look at the MANCOVA model.

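#MANCOVA model
m.mancova = manova(outcome ~ Diet.0 * weight.0, data = chick)
#Note: summary.manova() has no "type" argument, so type = "III" is ignored here and the tests are sequential
summary(m.mancova, test = "Wilks", type = "III")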
##                 Df   Wilks approx F num Df den Df    Pr(>F)    
## Diet.0           3 0.44886   6.5682      6     80 1.141e-05 ***
## weight.0         1 0.94504   1.1630      2     40   0.32288    
## Diet.0:weight.0  3 0.72993   2.2730      6     80   0.04464 *  
## Residuals       41                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Steps to Interaction Interpretation

We see there is a significant interaction between diet and starting weight. We can interpret this as the coefficient for either diet or birth weight depending on the other one. In this case, the relationship between diet and the later weights differs significantly from birth weight to birth weight.

Unfortunately, to the best of my knowledge, there is no way to plot interactions directly from a MANOVA model.
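One possible workaround, sketched here on the assumption that univariate follow-ups are acceptable for your question, is to look at each outcome separately. Applying `summary.aov()` to a `manova` fit returns one ANOVA table per response, which at least shows at which time point the interaction appears. The object names `chick_wide`, `fit`, and `uni` are made up for this sketch.

```r
# Rebuild the wide chick data with base R only (no dplyr needed here)
data("ChickWeight")
chick_wide <- reshape(ChickWeight, idvar = "Chick", timevar = "Time",
                      direction = "wide")
chick_wide <- chick_wide[, c("weight.0", "weight.2", "weight.4", "Diet.0")]
chick_wide$Diet.0 <- factor(chick_wide$Diet.0)

# Same multivariate model as above, with the two outcomes written inline
fit <- manova(cbind(weight.2, weight.4) ~ Diet.0 * weight.0, data = chick_wide)

# One univariate ANOVA table per response (weight.2, then weight.4)
uni <- summary.aov(fit)
print(uni)
```

If the interaction shows up in one of these tables, that single-outcome model can be refit with `lm()` and plotted like the ANCOVA example that follows.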

ANCOVA Interaction Example

So we’re going to change the problem from a MANCOVA problem to an ANCOVA problem. I wanted to do this with the same dataset, but the interaction doesn’t hold when I cut the diet category down to two groups.

Remember that time I said you shouldn’t use weight as a covariate with miles per gallon (mpg) as the outcome? Me neither. It’s sometimes quite hard to find example datasets and models that fit what I’m trying to demonstrate, so, full disclosure, sometimes I have to fudge a little. We’ll be using the mtcars dataset, and our new question is: how do cylinders relate to mpg after removing the variance related to weight?

data("mtcars")
#lm has to be used here instead of aov for plotting to work
mod.ancova = lm(mpg ~ wt * cyl, data = mtcars)
summary(mod.ancova)
## 
## Call:
## lm(formula = mpg ~ wt * cyl, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2288 -1.3495 -0.5042  1.4647  5.2344 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  54.3068     6.1275   8.863 1.29e-09 ***
## wt           -8.6556     2.3201  -3.731 0.000861 ***
## cyl          -3.8032     1.0050  -3.784 0.000747 ***
## wt:cyl        0.8084     0.3273   2.470 0.019882 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.368 on 28 degrees of freedom
## Multiple R-squared:  0.8606, Adjusted R-squared:  0.8457 
## F-statistic: 57.62 on 3 and 28 DF,  p-value: 4.231e-12

Look at that, we have an interaction. We can interpret it by saying that the coefficient for weight depends on the number of cylinders, and the coefficient for cylinders depends on weight.
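To see what “depends on” means in numbers, here is a small sketch that computes the slope of mpg on weight at 4, 6, and 8 cylinders from the fitted coefficients. The names `mod`, `b`, `cyl_values`, and `wt_slope` are introduced only for this illustration.

```r
data("mtcars")
mod <- lm(mpg ~ wt * cyl, data = mtcars)
b <- coef(mod)

# Slope of mpg on wt at a given number of cylinders: b_wt + b_wt:cyl * cyl
cyl_values <- c(4, 6, 8)
wt_slope <- b["wt"] + b["wt:cyl"] * cyl_values
names(wt_slope) <- paste0("cyl=", cyl_values)
round(wt_slope, 2)
```

Using the coefficients printed above, the slopes come out at roughly -5.4, -3.8, and -2.2, so an extra thousand pounds costs a 4-cylinder car more mpg than it costs an 8-cylinder car.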

#install.packages("interplot")
library(interplot)
library(ggplot2) #provides labs()
interplot(m = mod.ancova, var1 = "cyl", var2 = "wt") +
   labs(x="Weight (thousands lbs)", y="Estimated Coefficient for # of Cylinders", title="Estimated Coefficient of Cylinders on MPG by Weight")

From this plot we can see that the estimated coefficient of cylinders on mpg changes steadily with weight: as weight increases, the coefficient becomes less negative.
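The quantity drawn in that plot can also be approximated by hand. The sketch below computes the conditional coefficient of cyl across the observed weights with an analytic 95% interval in base R. As far as I know, interplot may compute its intervals differently, so the band need not match exactly. The names `wt_grid`, `est`, `se`, and `crit` are introduced only for this sketch.

```r
data("mtcars")
mod <- lm(mpg ~ wt * cyl, data = mtcars)
b <- coef(mod)
V <- vcov(mod)

# Grid of weights spanning the observed range
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)

# Conditional coefficient of cyl at each weight: b_cyl + b_wt:cyl * wt
est <- b["cyl"] + b["wt:cyl"] * wt_grid

# Standard error of that linear combination of two coefficients
se <- sqrt(V["cyl", "cyl"] + wt_grid^2 * V["wt:cyl", "wt:cyl"] +
           2 * wt_grid * V["cyl", "wt:cyl"])

# t-based 95% interval using the residual degrees of freedom
crit <- qt(0.975, df.residual(mod))
plot(wt_grid, est, type = "l",
     ylim = range(est - crit * se, est + crit * se),
     xlab = "Weight (thousands lbs)",
     ylab = "Estimated coefficient for cyl")
lines(wt_grid, est - crit * se, lty = 2)
lines(wt_grid, est + crit * se, lty = 2)
abline(h = 0, col = "grey")
```

Where the dashed band crosses the grey zero line, the coefficient of cylinders is no longer distinguishable from zero at that weight.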

interplot(m = mod.ancova, var1 = "wt", var2 = "cyl") +
   labs(x="# of Cylinders", y="Estimated Coefficient for Weight", title="Estimated Coefficient of Weight on MPG by Cylinders")

From this plot we can see that the estimated coefficient of weight on mpg changes steadily with the number of cylinders: as cylinders increase, the coefficient becomes less negative.

From these two graphs, we can say there is a two-way interaction: the effect of each variable on mpg depends on the level of the other. Note that “two-way” refers to the number of variables involved, not to a direction of influence. An interaction term is symmetric, so the two plots are two views of the same wt:cyl coefficient.

Those are the basics of understanding an interaction in your model. Plot your way to success!

In the next post, I’ll discuss chi-square.



Mohan Gupta
Psychology PhD Student

My research interests include what the best ways to learn are, why those are the best ways, and whether I can build computational models to predict what people will learn in both motor and declarative learning.
