afex in missing data (when testing main effects in presence of interactions in factorial designs)

Mon, Jun 5, 2017 3:15 AM
Hi Thomas,

As afex is mentioned, let me weigh in with my thoughts (though I am as 
always happy for corrections).

First of all, afex implements basically the procedure described in the 
document by Roger Levy, when using the same type of tests (i.e., 
likelihood ratio tests). Using the second example in the document we get:

require(mvtnorm)
require(afex)
set.seed(1)
M <- 12
dat <- expand.grid(X=factor(c("x1","x2")),
                    Y=factor(c("y1","y2","y3")),
                    subj=factor(paste("S",1:M)),
                    item=factor(paste("I",1:M)))
dat$XY <- with(dat,factor(paste(X,Y)))
beta.XY <- matrix(c(1,4,5,2,3,3),2,3)
b.S <- rmvnorm(M,rep(0,6),diag(6)/100)
b.I <- rmvnorm(M,rep(0,6),diag(6)/100)
dat$Response <- with(dat,beta.XY[cbind(X,Y)] + b.S[cbind(subj,XY)] +
                        b.I[cbind(item,XY)] + rnorm(nrow(dat),sd=0.1))
Y.numeric <- sapply(dat$Y,function(i) contr.sum(3)[i,])
dat$Y1 <- Y.numeric[1,]
dat$Y2 <- Y.numeric[2,]
m0 <- lmer(Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY|subj) + (XY|item),
            dat, REML = FALSE)
m1 <- lmer(Response ~ X*Y + (XY|subj) + (XY|item),dat, REML = FALSE)
anova(m0,m1)
# Data: dat
# Models:
# m0: Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY | subj) + (XY | item)
# m1: Response ~ X * Y + (XY | subj) + (XY | item)
#    Df     AIC     BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
# m0 48 -1040.6 -812.01 568.28  -1136.6
# m1 49 -1038.7 -805.35 568.33  -1136.7 0.0999      1      0.752

(mm1 <- mixed(Response ~ X*Y + (XY|subj) + (XY|item),dat,
               method = "LRT"))
# Mixed Model Anova Table (Type 3 tests, LRT-method)
#
# Model: Response ~ X * Y + (XY | subj) + (XY | item)
# Data: dat
# Df full model: 49
#   Effect df     Chisq p.value
# 1      X  1      0.10     .75
# 2      Y  2 66.06 ***  <.0001
# 3    X:Y  2 87.90 ***  <.0001
# ---
# Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?+? 0.1 ? ? 1

As can be seen, the test of x is identical with a chi-square value of 
0.0999 (rounded to 0.10) and a p-value of 0.75.

We can also see that the intercept correspond exactly to the grand mean:
fixef(mm1$full_model)[1]
# (Intercept)
#    3.006103

mean(dat$Response)
# [1] 3.006103

Your question is now what happens for unbalanced data. Type III tests 
(as implemented here and in the document) assume the imbalance is random 
and not structural (i.e., data is missing completely at random). This is 
achieved by weighting the cells equally and not the data points. In 
other words, we simply do the analysis assuming each cell (i.e., 
combination of factors) had the same size. As a consequence, the grand 
mean does not correspond to the grand mean of the data, but to the 
unweighted mean of the cells. (Note that this only holds approximately 
for mixed models, as the random effects also play a role here. For 
standard ANOVAs this is exactly true.)

An example can show this. We first remove one data point and then fit 
the data again (with a very reduced model as we are only interested in 
the intercept estimate here):

dat2 <- dat[-1,]

mm2 <- mixed(Response ~ X*Y + (1|subj),dat2, return="merMod")
fixef(mm2)[1]
# (Intercept)
#    3.006236

mean(dat2$Response)
# [1] 3.008559

mean(tapply(dat2$Response, INDEX = list(dat2$XY), mean))
# [1] 3.006254

As can be seen, the intercept corresponds approximately to the 
unweighted grand mean of all cells, but not to the grand mean of the data.

As said above, the small difference between the intercept and the grand 
mean are a consequence of the random effects. Change the random effects 
structure to the one above and you will see even more differences. 
Furthermore, for standard ANOVA Type III tests exactly correspond to the 
unweighted grand mean

What this tells us is that Type III tests are only appropriate if the 
missing is not structural but random. In other words, the treatment 
(i.e., condition levels) should not be the cause for the missing data 
for Type III tests. Type III test correct for missing data by simply 
treating the data as having no missing data (i.e., all cells are weighed 
equally).

If the missing were structural (i.e., a consequence of the factor 
levels) this approach is of course not appropriate. Then we would want 
to use something like Type II tests or what you describe in which the 
group sizes are taken into account and the intercept correspond to the 
grand mean. For example, for observational data where group sizes are 
very often informative Type II tests seem more appropriate.

I expect this difference in how Type II and Type III tests deal with 
imbalance is also one of the reasons why those statisticans that work a 
lot with observational data prefer Type II over Type III tests whereas 
those that work with experimental data (where imbalance is usually small 
and random) prefer Type III tests over Type II test.

Hope that helps,
Henrik

PS: It always helps to include some example data.

Am 04.06.2017 um 15:41 schrieb Thomas Deimel via R-sig-mixed-models:
afex in missing data (when testing main effects in presence of interactions in factorial designs)

Thread (5 messages)