afex in missing data (when testing main effects in presence of interactions in factorial designs)
Hi Henrik, hi Thomas, There is one small additional footnote to add to the Type-II/Type-III comparison: Type-II tests respect the principle of marginality, while Type-III tests do not. This is discussed somewhat in John Fox's Applied Regression textbook and is one of the main points in?Venables' Exegeses on Linear Models. https://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf In other words: Type-III tests allow you to test main effects as if the interaction were not there, while Type-II tests incorporate both the interaction and main effect into the test for the main effect. When the data are balanced, this are the same. (If I have misunderstood this point or formulated it infelicitously, I am happy to be corrected!)? Best, Phillip
On Mon, 2017-06-05 at 12:15 +0200, Henrik Singmann wrote:
Hi Thomas,
As afex is mentioned, let me weigh in with my thoughts (though I am
as?
always happy for corrections).
First of all, afex implements basically the procedure described in
the?
document by Roger Levy, when using the same type of tests (i.e.,?
likelihood ratio tests). Using the second example in the document we
get:
require(mvtnorm)
require(afex)
set.seed(1)
M <- 12
dat <- expand.grid(X=factor(c("x1","x2")),
????????????????????Y=factor(c("y1","y2","y3")),
????????????????????subj=factor(paste("S",1:M)),
????????????????????item=factor(paste("I",1:M)))
dat$XY <- with(dat,factor(paste(X,Y)))
beta.XY <- matrix(c(1,4,5,2,3,3),2,3)
b.S <- rmvnorm(M,rep(0,6),diag(6)/100)
b.I <- rmvnorm(M,rep(0,6),diag(6)/100)
dat$Response <- with(dat,beta.XY[cbind(X,Y)] + b.S[cbind(subj,XY)] +
????????????????????????b.I[cbind(item,XY)] +
rnorm(nrow(dat),sd=0.1))
Y.numeric <- sapply(dat$Y,function(i) contr.sum(3)[i,])
dat$Y1 <- Y.numeric[1,]
dat$Y2 <- Y.numeric[2,]
m0 <- lmer(Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY|subj) + (XY|item),
????????????dat, REML = FALSE)
m1 <- lmer(Response ~ X*Y + (XY|subj) + (XY|item),dat, REML = FALSE)
anova(m0,m1)
# Data: dat
# Models:
# m0: Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY | subj) + (XY | item)
# m1: Response ~ X * Y + (XY | subj) + (XY | item)
#????Df?????AIC?????BIC logLik deviance??Chisq Chi Df Pr(>Chisq)
# m0 48 -1040.6 -812.01 568.28??-1136.6
# m1 49 -1038.7 -805.35 568.33??-1136.7 0.0999??????1??????0.752
(mm1 <- mixed(Response ~ X*Y + (XY|subj) + (XY|item),dat,
???????????????method = "LRT"))
# Mixed Model Anova Table (Type 3 tests, LRT-method)
#
# Model: Response ~ X * Y + (XY | subj) + (XY | item)
# Data: dat
# Df full model: 49
#???Effect df?????Chisq p.value
# 1??????X??1??????0.10?????.75
# 2??????Y??2 66.06 ***??<.0001
# 3????X:Y??2 87.90 ***??<.0001
# ---
# Signif. codes:??0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?+? 0.1 ? ? 1
As can be seen, the test of x is identical with a chi-square value
of?
0.0999 (rounded to 0.10) and a p-value of 0.75.
We can also see that the intercept correspond exactly to the grand
mean:
fixef(mm1$full_model)[1]
# (Intercept)
#????3.006103
mean(dat$Response)
# [1] 3.006103
Your question is now what happens for unbalanced data. Type III
tests?
(as implemented here and in the document) assume the imbalance is
random?
and not structural (i.e., data is missing completely at random). This
is?
achieved by weighting the cells equally and not the data points. In?
other words, we simply do the analysis assuming each cell (i.e.,?
combination of factors) had the same size. As a consequence, the
grand?
mean does not correspond to the grand mean of the data, but to the?
unweighted mean of the cells. (Note that this only holds
approximately?
for mixed models, as the random effects also play a role here. For?
standard ANOVAs this is exactly true.)
An example can show this. We first remove one data point and then
fit?
the data again (with a very reduced model as we are only interested
in?
the intercept estimate here):
dat2 <- dat[-1,]
mm2 <- mixed(Response ~ X*Y + (1|subj),dat2, return="merMod")
fixef(mm2)[1]
# (Intercept)
#????3.006236
mean(dat2$Response)
# [1] 3.008559
mean(tapply(dat2$Response, INDEX = list(dat2$XY), mean))
# [1] 3.006254
As can be seen, the intercept corresponds approximately to the?
unweighted grand mean of all cells, but not to the grand mean of the
data.
As said above, the small difference between the intercept and the
grand?
mean are a consequence of the random effects. Change the random
effects?
structure to the one above and you will see even more differences.?
Furthermore, for standard ANOVA Type III tests exactly correspond to
the?
unweighted grand mean
What this tells us is that Type III tests are only appropriate if
the?
missing is not structural but random. In other words, the treatment?
(i.e., condition levels) should not be the cause for the missing
data?
for Type III tests. Type III test correct for missing data by simply?
treating the data as having no missing data (i.e., all cells are
weighed?
equally).
If the missing were structural (i.e., a consequence of the factor?
levels) this approach is of course not appropriate. Then we would
want?
to use something like Type II tests or what you describe in which
the?
group sizes are taken into account and the intercept correspond to
the?
grand mean. For example, for observational data where group sizes
are?
very often informative Type II tests seem more appropriate.
I expect this difference in how Type II and Type III tests deal with?
imbalance is also one of the reasons why those statisticans that work
a?
lot with observational data prefer Type II over Type III tests
whereas?
those that work with experimental data (where imbalance is usually
small?
and random) prefer Type III tests over Type II test.
Hope that helps,
Henrik
PS: It always helps to include some example data.
Am 04.06.2017 um 15:41 schrieb Thomas Deimel via R-sig-mixed-models:
Hi everyone, I am using lmer (or mixed() from the afex package) to implement a linear mixed model of the form y~1+x1+x2+x1:x2+(by-subject random effects), where x1 and x2 are both categorical variables/factors. The design is balanced but there are missing values for some x2 levels in some of the subjects. I am interested in testing the main effect of say x1 in the presence of the interaction term x1:x2. In order to achieve this, I convert x2 to a numeric variable using sum contrasts - as summarized for example [here][1]. This gives the same results for inference (nearly identical p-values) as using mixed() from the afex package. This corresponds to a "type III test". The linked summary, however, mentions that the main effect we test essentially corresponds to the effect of x1 averaged over all levels of x2 (which I assume corresponds to x2=0 when converted to numeric and using true contrasts). But, there are missing data for y for some of the levels of x2 in some of the subjects, so how does this affect the averaging. My Qs: Given the above implementation of the model,... 1) ...do missing values lead to putting a stronger weight on x2 levels that do not have missing values when averaging over x2 levels to calculate the main effect of x1? OR is R aware of that distortion and calculates a weighted average of some sort (I don't see how)? 2) ...is this a problem? I assume it could lead to distorted estimation of main effect significance in smaller data sets or if there are a lot of missing values for certain levels of x2? 3) ...is there a way around it, like forcing R to weighting the average? Or would it be advisable to compare y~x2 to y~x1+x2, instead? I would certainly be inclined to use this if the interaction was insignificant (agreed?) but what if it is significant? Any other options? Thanks for the help, Thomas ???[1]: https://arxiv.org/pdf/1405.2094.pdf "here" [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models