Hi everyone,

I am using lmer (or mixed() from the afex package) to fit a linear mixed model of the form y ~ 1 + x1 + x2 + x1:x2 + (by-subject random effects), where x1 and x2 are both categorical variables/factors. The design is balanced, but there are missing values for some x2 levels in some of the subjects. I am interested in testing the main effect of, say, x1 in the presence of the interaction term x1:x2. To achieve this, I convert x2 to numeric variables using sum contrasts, as summarized for example [here][1]. This gives the same results for inference (nearly identical p-values) as using mixed() from the afex package, and corresponds to a "type III test". The linked summary, however, mentions that the main effect we test essentially corresponds to the effect of x1 averaged over all levels of x2 (which I assume corresponds to x2 = 0 when converted to numeric using sum contrasts). But there are missing data for y for some of the levels of x2 in some of the subjects, so how does this affect the averaging?

My questions: given the above implementation of the model, ...

1) ... do missing values lead to putting a stronger weight on x2 levels that do not have missing values when averaging over x2 levels to calculate the main effect of x1? Or is R aware of that distortion and calculates a weighted average of some sort (I don't see how)?

2) ... is this a problem? I assume it could lead to distorted estimation of main-effect significance in smaller data sets, or if there are a lot of missing values for certain levels of x2?

3) ... is there a way around it, like forcing R to weight the average? Or would it be advisable to compare y ~ x2 to y ~ x1 + x2 instead? I would certainly be inclined to use this if the interaction was non-significant (agreed?), but what if it is significant? Any other options?

Thanks for the help,
Thomas

[1]: https://arxiv.org/pdf/1405.2094.pdf "here"
afex in missing data (when testing main effects in presence of interactions in factorial designs)
5 messages · Henrik Singmann, Alday, Phillip, Thomas Deimel
Hi Thomas,
As afex is mentioned, let me weigh in with my thoughts (though I am as
always happy for corrections).
First of all, afex basically implements the procedure described in the
document by Roger Levy when using the same type of tests (i.e.,
likelihood ratio tests). Using the second example in the document we get:
require(mvtnorm)
require(afex)

set.seed(1)
M <- 12
dat <- expand.grid(X = factor(c("x1", "x2")),
                   Y = factor(c("y1", "y2", "y3")),
                   subj = factor(paste("S", 1:M)),
                   item = factor(paste("I", 1:M)))
dat$XY <- with(dat, factor(paste(X, Y)))
beta.XY <- matrix(c(1, 4, 5, 2, 3, 3), 2, 3)
b.S <- rmvnorm(M, rep(0, 6), diag(6) / 100)
b.I <- rmvnorm(M, rep(0, 6), diag(6) / 100)
dat$Response <- with(dat, beta.XY[cbind(X, Y)] + b.S[cbind(subj, XY)] +
                       b.I[cbind(item, XY)] + rnorm(nrow(dat), sd = 0.1))
Y.numeric <- sapply(dat$Y, function(i) contr.sum(3)[i, ])
dat$Y1 <- Y.numeric[1, ]
dat$Y2 <- Y.numeric[2, ]
m0 <- lmer(Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY|subj) + (XY|item),
           dat, REML = FALSE)
m1 <- lmer(Response ~ X*Y + (XY|subj) + (XY|item), dat, REML = FALSE)
anova(m0, m1)
# Data: dat
# Models:
# m0: Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY | subj) + (XY | item)
# m1: Response ~ X * Y + (XY | subj) + (XY | item)
# Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
# m0 48 -1040.6 -812.01 568.28 -1136.6
# m1 49 -1038.7 -805.35 568.33 -1136.7 0.0999 1 0.752
(mm1 <- mixed(Response ~ X*Y + (XY|subj) + (XY|item), dat,
              method = "LRT"))
# Mixed Model Anova Table (Type 3 tests, LRT-method)
#
# Model: Response ~ X * Y + (XY | subj) + (XY | item)
# Data: dat
# Df full model: 49
# Effect df Chisq p.value
# 1 X 1 0.10 .75
# 2 Y 2 66.06 *** <.0001
# 3 X:Y 2 87.90 *** <.0001
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
As can be seen, the test of X is identical, with a chi-square value of
0.0999 (rounded to 0.10) and a p-value of 0.75.
We can also see that the intercept corresponds exactly to the grand mean:
fixef(mm1$full_model)[1]
# (Intercept)
# 3.006103
mean(dat$Response)
# [1] 3.006103
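A quick base-R check of why this holds (my addition, not in the original message): the columns of a sum-contrast matrix sum to zero over the factor levels, so the "zero point" of the coding, i.e. the intercept, sits at the unweighted average of the levels.

```r
# Sum-contrast coding for a 3-level factor
cs <- contr.sum(3)
cs
#   [,1] [,2]
# 1    1    0
# 2    0    1
# 3   -1   -1

# Each contrast column sums to zero across the levels, so the implied
# "average level" corresponds to all contrast variables being 0
colSums(cs)
# [1] 0 0
```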
Your question is now what happens for unbalanced data. Type III tests
(as implemented here and in the document) assume the imbalance is random
and not structural (i.e., data is missing completely at random). This is
achieved by weighting the cells equally and not the data points. In
other words, we simply do the analysis assuming each cell (i.e.,
combination of factors) had the same size. As a consequence, the grand
mean does not correspond to the grand mean of the data, but to the
unweighted mean of the cells. (Note that this only holds approximately
for mixed models, as the random effects also play a role here. For
standard ANOVAs this is exactly true.)
An example shows this. We first remove one data point and then fit
the model again (with a much reduced model, as we are only interested
in the intercept estimate here):
dat2 <- dat[-1,]
mm2 <- mixed(Response ~ X*Y + (1|subj),dat2, return="merMod")
fixef(mm2)[1]
# (Intercept)
# 3.006236
mean(dat2$Response)
# [1] 3.008559
mean(tapply(dat2$Response, INDEX = list(dat2$XY), mean))
# [1] 3.006254
As can be seen, the intercept corresponds approximately to the
unweighted grand mean of all cells, but not to the grand mean of the data.
As said above, the small difference between the intercept and the
unweighted grand mean is a consequence of the random effects; change the
random-effects structure to the one above and you will see even larger
differences. For standard ANOVA, in contrast, Type III tests correspond
exactly to the unweighted grand mean.
What this tells us is that Type III tests are only appropriate if the
missingness is not structural but random. In other words, the treatment
(i.e., the condition levels) should not be the cause of the missing data
for Type III tests to apply. Type III tests correct for missing data by
simply treating the data as if there were no missing data (i.e., all
cells are weighted equally).
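To make the two kinds of averaging concrete, here is a small toy illustration (my own made-up numbers in plain base R, not data from the thread): the grand mean of the data weights every observation equally, whereas the unweighted grand mean of the cells, the quantity Type III tests are built around, weights every cell equally.

```r
# Toy unbalanced 2x2 design (hypothetical numbers)
d <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2", "b2", "b2"),
  y = c(1, 3, 10, 2, 8, 9, 10)
)

# Grand mean of the data: each *observation* counts equally
mean(d$y)
# [1] 6.142857

# Unweighted grand mean of the cells: each *cell* counts equally
cell_means <- tapply(d$y, list(d$A, d$B), mean)
mean(cell_means)
# [1] 5.75
```

With balanced cells the two coincide; the more unbalanced the design, the further they drift apart.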
If the missingness were structural (i.e., a consequence of the factor
levels), this approach would of course not be appropriate. Then we would
want to use something like Type II tests, or what you describe, in which
the group sizes are taken into account and the intercept corresponds to
the grand mean of the data. For example, for observational data, where
group sizes are very often informative, Type II tests seem more appropriate.
I expect this difference in how Type II and Type III tests deal with
imbalance is also one of the reasons why statisticians who work a lot
with observational data tend to prefer Type II over Type III tests,
whereas those who work with experimental data (where imbalance is
usually small and random) prefer Type III over Type II tests.
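For a fixed-effects-only illustration of the difference (a sketch with simulated data and hypothetical variable names, using plain lm() rather than the mixed models above): a Type II test of A compares two models that both exclude the interaction, while a Type III test with sum contrasts keeps the interaction column in both models and drops only A's main-effect column, as in the Levy document.

```r
set.seed(42)
# Hypothetical unbalanced 2x2 design
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2")),
                 rep = 1:5)
d$y <- rnorm(nrow(d))
d <- d[-(1:3), ]  # drop some rows to create imbalance

# Type II test of A: interaction excluded from both models
type2 <- anova(lm(y ~ B, d), lm(y ~ A + B, d))

# Type III test of A: numeric sum-contrast columns built by hand;
# the interaction column stays in the reduced model
d$A1 <- contr.sum(2)[d$A, ]
d$B1 <- contr.sum(2)[d$B, ]
full <- lm(y ~ A1 + B1 + A1:B1, d)
noA  <- lm(y ~      B1 + A1:B1, d)
type3 <- anova(noA, full)

type2[["Pr(>F)"]][2]  # Type II p-value for A
type3[["Pr(>F)"]][2]  # Type III p-value for A
```

With these unbalanced data the two p-values differ; on balanced data they would agree.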
Hope that helps,
Henrik
PS: It always helps to include some example data.
On 04.06.2017 at 15:41, Thomas Deimel via R-sig-mixed-models wrote:
1 day later
Hi Henrik, hi Thomas,

There is one small additional footnote to add to the Type-II/Type-III comparison: Type-II tests respect the principle of marginality, while Type-III tests do not. This is discussed somewhat in John Fox's Applied Regression textbook and is one of the main points in Venables' Exegeses on Linear Models: https://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

In other words: Type-III tests allow you to test main effects as if the interaction were not there, while Type-II tests incorporate both the interaction and the main effect into the test for the main effect. When the data are balanced, these are the same. (If I have misunderstood this point or formulated it infelicitously, I am happy to be corrected!)

Best, Phillip
On Mon, 2017-06-05 at 12:15 +0200, Henrik Singmann wrote:
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
4 days later
Hi Henrik and Phillip, thanks a lot for the detailed answers and for using the example to explain. This has already been very useful and helped clear up what I think were misunderstandings on my part. I have been playing around with the example some more, and for my current purposes I think this is enough. I do still have loads of further questions about this issue in general, though, and my brain normally does not like to let go of questions it has stumbled upon. So, should I find the time to get my questions in order and do some more reading-up first, I hope it's OK if I come back to ask them. Best, Thomas
On 06.06.2017 at 11:34, Alday, Phillip <Phillip.Alday at mpi.nl> wrote: