Replicating type III anova tests for glmer/GLMM

14 messages · Francesco Romano, Phillip Alday, Emmanuel Curis +1 more

#
Yes. An ANOVA with my final bglmer model yields:
Analysis of Variance Table

                   Df Sum Sq Mean Sq F value
syntax12            1 1.7670  1.7670  1.7670
animacy12           1 3.4036  3.4036  3.4036
group123            2 5.7213  2.8607  2.8607
animacy12:group123  2 4.5546  2.2773  2.2773
syntax12:group123   2 8.1732  4.0866  4.0866

which is counterintuitively not what the authors of the papers
apparently used to generate coefficients to report their main effects
and interactions. It looks to me more like ML fitting. Elsewhere,
and more typically, main effects and interactions are obtained by
comparing a model with the main fixed effect to a model without the
main fixed effect in terms of log-likelihood ratio tests
(Raffray et al., 2013, http://dx.doi.org/10.1016/j.jml.2013.09.004, p.6).


I understand obtaining p-values from a summary
of linear mixed models fit by lmer is a contentious issue

https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html

but I guess I might be missing something here.
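As a concrete sketch of the likelihood-ratio comparison described above, here it is with a plain binomial glm so the snippet is self-contained (all data and variable names are invented for illustration); with a glmer model the same anova(reduced, full) call applies, provided both models are fit by ML:

```r
## Hedged sketch: likelihood-ratio test by comparing nested models.
## Simulated data; x1 has a real effect, x2 does not.
set.seed(42)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- rbinom(100, 1, plogis(0.5 * d$x1))

reduced <- glm(y ~ x1,      family = binomial, data = d)
full    <- glm(y ~ x1 + x2, family = binomial, data = d)

## LRT: the deviance difference is referred to a chi-square on 1 df
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)
```

The reported Deviance for the comparison is exactly deviance(reduced) - deviance(full), which is the log-likelihood-ratio statistic.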






On Tue, Feb 23, 2016 at 2:21 AM, Phillip Alday <Phillip.Alday at unisa.edu.au> wrote:
#
lme4::anova() is not the same thing as car::Anova()!

A quick R note that might have avoided the confusion:
The :: syntax in R refers to scope, so you can specify a function
unambiguously via package::function.name(). Moreover, R is case
sensitive, so Anova() and anova() are generally different things.

Henrik's message (posted to the list so if you don't subscribe, you need
to look here:
https://mailman.stat.ethz.ch/pipermail/r-sig-mixed-models/2016q1/024465.html
) describes how to do this with either his afex package (for
likelihood-ratio tests) or John Fox's car package (for analysis of
deviance / Wald tests).

If you just want to perform likelihood-ratio tests in lme4, then you
should look at the drop1() function or you can use anova(reduced.model,
full.model). Henrik also does a nice job summarizing some of the issues
here, so I won't repeat them.
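A minimal sketch of drop1() with likelihood-ratio tests, shown with a plain glm for self-containedness (data and names invented; the call is the same for a merMod fit):

```r
## Hedged sketch: single-term deletions with LRTs via drop1().
## Invented data: 'a' has a real effect, 'b' does not.
set.seed(1)
d <- data.frame(a = rnorm(50), b = rnorm(50))
d$y <- rbinom(50, 1, plogis(d$a))

m <- glm(y ~ a + b, family = binomial, data = d)

## Drop each term in turn; each row is a reduced-vs-full LRT
dr <- drop1(m, test = "Chisq")
print(dr)
```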

One final note: not everything that holds for normal LMM holds for GLMM
-- GLMM tends to be much more complicated. :-(

Best,
Phillip
On 23/02/16 20:03, Francesco Romano wrote:
#
Thanks to Henrik and Phillip for the quick reply.
Your suggestions have been helpful in making progress.

On the one hand Henrik is right about
reporting coefficients and standard errors when
there are only two levels for each predictor. This is
consistent with two of the sources I mentioned so far.
I infer that the authors reported directly from the summary(m1)
after use of the mixed function (not car::Anova which yields chi
square tests).

On the other hand, I don't understand how Cai et al. (2012) p.842,
"combined analysis experiments 1 and 2", reported the main effect
of a factor with 4 levels via a single estimate, SE, z, p coefficient.
How did they obtain this and is this the right way?

Finally, after running analysis both ways, I get slightly different
p-values, with the car::Anova method being more conservative
(it yields less significant predictors). Is this normal?

Frank



On Tue, Feb 23, 2016 at 10:51 AM, Phillip Alday <Phillip.Alday at unisa.edu.au> wrote:
#
In my experience, car::Anova is slightly less conservative (as Wald
tests are known to be somewhat anti-conservative).

Are you using Type-III tests for everything? The differences between
Type-II and Type-III can actually make a big difference in terms of
which predictors are significant.

Speaking of Type-III -- although it's the default in some popular
commercial packages, Type-II (marginal tests) is actually the type that
makes the most sense in terms of statistical interpretation and
hypotheses tested. But that's a topic for another time ....
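A quick base-R illustration of how much these choices can matter (data and factor names invented): on an unbalanced design, the "drop1 trick" for type-III-style marginal tests gives a different sum of squares for the same main effect depending on the contrast coding, which is why contr.sum() is usually recommended when type-III tests are wanted:

```r
## Hedged sketch: type-III-style tests via drop1(fit, . ~ ., test = "F")
## on an invented, unbalanced 2x2 design.
set.seed(1)
d <- data.frame(
  A = factor(rep(c("a1", "a1", "a2", "a2"), c(5, 3, 2, 6))),
  B = factor(rep(c("b1", "b2", "b1", "b2"), c(5, 3, 2, 6)))
)
d$y <- rnorm(16)

f_trt <- lm(y ~ A * B, data = d)  # default contr.treatment
f_sum <- lm(y ~ A * B, data = d,
            contrasts = list(A = "contr.sum", B = "contr.sum"))

## marginal ("type-III"-style) SS for A under each coding
ss_trt <- drop1(f_trt, . ~ ., test = "F")["A", "Sum of Sq"]
ss_sum <- drop1(f_sum, . ~ ., test = "F")["A", "Sum of Sq"]
c(treatment = ss_trt, sum = ss_sum)  # differ on unbalanced data
```

Under treatment coding the "main effect" of A is tested at the baseline level of B; under sum coding it is tested averaging over B, hence the different answers.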

Best,
Phillip
On 23/02/16 22:41, Francesco Romano wrote:
#
On Tue, Feb 23, 2016 at 01:06:18PM +0100, Francesco Romano wrote:
> On the other hand, I don't understand how Cai et al. (2012) p.842,
> "combined analysis experiments 1 and 2", reported the main effect
> of a factor with 4 levels via a single estimate, SE, z, p coefficient.
> How did they obtain this and is this the right way?

It's just a guess, but any sum of squares can be seen as a particular
contrast, that is, a particular combination of the coefficients in the
model (or of the different means, expressed another way) that is
tested against 0. So I guess this single estimate is the value of the
contrast associated with the corresponding sum of squares, and SE/z/p
are derived similarly.

You can play with multcomp::glht to test this, but knowing which
contrast is tested by which sum of squares in a specific design may be
tricky: it depends on the coding, on the (un)balance...

Knowing if this is the "right" way is, I think, the same debate as
knowing which kind of sum of squares should be used, and the answer is
very application-dependent. Just, if you don't know what this single
estimate really estimates, interpretation is at best difficult...
#
Dear Emmanuel,

With proper contrast coding (i.e., a coding that's orthogonal in the *basis* of the design, such as provided by contr.sum()), a "type-III" test is just a test that the corresponding parameters are 0. The models in question are generalized linear (mixed) models, so sums of squares aren't really involved, but one could do the corresponding Wald (like car::Anova) or LR test. The Wald test is what you'd get with multcomp::glht or car::linearHypothesis. BTW, I don't think that it would be hard to extend car::Anova to provide LR tests in this case.

Best,
 John
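The coding distinction John draws can be seen directly in base R: contr.sum() generates columns that sum to zero (traditional contrasts), while the default contr.treatment() generates 0/1 dummy regressors that do not:

```r
## Effect coding: each column sums to zero (traditional contrasts)
contr.sum(4)

## Default dummy coding: 0/1 indicators, columns sum to 1
contr.treatment(4)
```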
#
Dear Prof. Fox,

Thanks for the clarification. But to summarize this test of, let's say,
3 parameters being 0 for a 4-level factor by a single value with its SE,
as mentioned in Francesco's mail, the linear combination of these
parameters that is actually tested by this sum of squares is needed,
isn't it?

I mean, if really the parameters are all 0, whatever linear
combination could do the job, but a type III sum of squares just tests
one of all possible linear combinations, right?

By the way, I was always very annoyed by the fact that Type III sums of
squares are so dependent on coding, but that's another debate...

Best regards,
#
Dear Emmanuel,

First, the relevant linear hypothesis is for several coefficients simultaneously -- for example, all 3 coefficients for the contrasts representing a 4-level factor -- not for a single contrast. Although it's true that any linear combination of parameters that are 0 is 0, the converse isn't true. Second, for a GLMM, we really should be talking about type-III tests not type-III sums of squares.

Type-III tests are dependent on coding in the full-rank parametrization of linear (and similar) models used in R, to make the tests correspond to reasonable hypotheses. The invariance of type-II tests with respect to coding is attractive, but shouldn't distract from the fundamental issues, which are the hypotheses that are tested and the power of the tests. 

Best,
 John
#
John,

I tried the Anova() function in the car package implemented with
contr.sum() but it doesn't produce beta, SE, z, and p.
To be more precise, R requires that either the F or Chi sq statistic be
used. The model I used was termed "mod", here is the error:
+     test.statistic=c("LR"))
Error in match.arg(test.statistic) : 'arg' should be one of “Chisq”, “F”

Chi square produces the following output:
+     test.statistic=c("Chisq"))
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: Correct
                              Chisq Df Pr(>Chisq)
(Intercept)                 67.7409  1  < 2.2e-16 ***
Syntax                       0.2856  1   0.593083
Animacy                      6.2575  1   0.012367 *
Prof.group.2                 2.9888  2   0.224379
Syntax:Animacy               0.0970  1   0.755521
Syntax:Prof.group.2          9.3054  2   0.009536 **
Animacy:Prof.group.2         4.7633  2   0.092399 .
Syntax:Animacy:Prof.group.2  1.3704  2   0.503997

So I still don't know how Raffray et al. reported beta, SE, z, and p for a
main effect of factor with 4 levels.
If reviewers ask me to do this, I will argue that reporting chi square
tests with corresponding p-values is
a more accurate way of reporting main effects and interactions.

If I haven't taken up too much of your time already, it would be
beneficial to understand which of the two methods suggested by Henrik
I should adopt. I attach my data.

The predictors of interest are Syntax (2 levels), Animacy (2 levels),
Prof.group.2 (3 levels),
and the outcome 'correct', while the random effects are 'Part.name' and
'Item'. The best model fit is a
bglmer with glmerControl(optimizer = "bobyqa") and nAGQ=1
Cov prior  : Part.name ~ wishart(df = 3.5, scale = Inf, posterior.scale =
cov, common.scale = TRUE)
           : Item ~ wishart(df = 3.5, scale = Inf, posterior.scale = cov,
common.scale = TRUE)
Prior dev  : 1.3565

Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['bglmerMod']
 Family: binomial  ( logit )
Formula: Correct ~ Syntax * Animacy * Prof.group.2 + (1 | Part.name) +
 (1 | Item)
   Data: recall
Control: glmerControl(optimizer = "bobyqa")

     AIC      BIC   logLik deviance df.resid
   313.3    372.9   -142.6    285.3      509

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.3517 -0.2926 -0.1802 -0.1137  9.3666

Random effects:
 Groups    Name        Variance Std.Dev.
 Part.name (Intercept) 0.8046   0.8970
 Item      (Intercept) 0.5031   0.7093
Number of obs: 523, groups:  Part.name, 42; Item, 16

Fixed effects:
                                        Estimate Std. Error z value Pr(>|z|)
(Intercept)                             -0.8960     0.6317  -1.418 0.156071
Syntaxs                                 -2.0713     0.9447  -2.193 0.028342 *
Animacy+AN -AN                          -3.0539     1.2548  -2.434 0.014941 *
Prof.group.2int                         -2.5594     0.9473  -2.702 0.006898 **
Prof.group.2ns                          -1.8673     0.7634  -2.446 0.014442 *
Syntaxs:Animacy+AN -AN                   1.8642     1.8202   1.024 0.305750
Syntaxs:Prof.group.2int                  4.1704     1.1676   3.572 0.000355 ***
Syntaxs:Prof.group.2ns                   2.4244     1.0483   2.313 0.020736 *
Animacy+AN -AN:Prof.group.2int           3.0067     1.5528   1.936 0.052824 .
Animacy+AN -AN:Prof.group.2ns            1.3245     1.6071   0.824 0.409848
Syntaxs:Animacy+AN -AN:Prof.group.2int  -2.2056     2.0550  -1.073 0.283162
Syntaxs:Animacy+AN -AN:Prof.group.2ns   -2.3249     2.3108  -1.006 0.314360
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Henrik's first method via afex::mixed leads to:
"LRT")
Formula (the first argument) converted to formula.
Fitting 8 (g)lmer() models:
(8 warnings omitted)
Mixed Model Anova Table (Type 3 tests)

Model: Correct ~ Syntax * Animacy * Prof.group.2 + (1 | Part.name) +
Model:     (1 | Item)
Data: recall
Df full model: 14
                            Df   Chisq Chi Df Pr(>Chisq)
Syntax                      13  5.5659      1   0.018313 *
Animacy                     13  8.4710      1   0.003609 **
Prof.group.2                12 10.5099      2   0.005222 **
Syntax:Animacy              13  0.9832      1   0.321400
Syntax:Prof.group.2         12 15.8094      2   0.000369 ***
Animacy:Prof.group.2        12  3.9188      2   0.140945
Syntax:Animacy:Prof.group.2 12  1.2240      2   0.542272
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The result is a main effect of Syntax, Animacy, Prof.group.2, and
interaction
between Syntax and Prof.Group.2. The summary(m4) is perfectly interpretable.

Henrik's second method yields:

*set contrasts*
2*(recall$Animacy=="+AN -AN"))
2*(recall$Prof.group.2=="int") + 3*(recall$Prof.group.2=="ns"))
*try second method*
(1 | Item), data = recall, control = glmerControl(optimizer =
"bobyqa"), nAGQ=1, family=binomial, expand_re= T)
Warning message:
extra argument(s) ‘expand_re’ disregarded
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: Correct
                              Chisq Df Pr(>Chisq)
(Intercept)                 67.7409  1  < 2.2e-16 ***
Syntax01                     0.2856  1   0.593083
Animacy01                    6.2575  1   0.012367 *
Group012                     2.9888  2   0.224379
Syntax01:Animacy01           0.0970  1   0.755521
Syntax01:Group012            9.3054  2   0.009536 **
Animacy01:Group012           4.7633  2   0.092399 .
Syntax01:Animacy01:Group012  1.3704  2   0.503997

The result this time is a main effect of what was Animacy and interaction
between what was Syntax and Prof.Group.2 ?!

The summary(m5) is perfectly interpretable.
On Tue, Feb 23, 2016 at 6:17 PM, Fox, John <jfox at mcmaster.ca> wrote:
#
Dear Francesco,

For a 1-df test, the Wald chi-square is just Z^2, but the chi-square is more general. When a term in the model has more than 1 df, there is more than one beta (hat) and one SE (and covariances) for the coefficients in the term. If you want to see the individual coefficient estimates, then summary(mod) will show you each coefficient estimate, the SE for each estimate, Z, and p. Why one would want to look at the individual effect-coded coefficients and tests in this context escapes me. 

Best,
 John
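John's first point is a numerical identity that is easy to check: for 1 df, the two-sided p-value from the Wald z test equals the upper-tail p-value of z^2 against a chi-square on 1 df. Using the Syntax z value from the summary above:

```r
z <- 2.193  # the Syntaxs |z| value from the bglmer summary above

p_z    <- 2 * pnorm(abs(z), lower.tail = FALSE)    # two-sided Wald z test
p_chi2 <- pchisq(z^2, df = 1, lower.tail = FALSE)  # 1-df Wald chi-square

c(p_z = p_z, p_chi2 = p_chi2)  # identical: P(|Z| > z) = P(Z^2 > z^2)
```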
#
Dear Prof. Fox,

Thanks for taking the time for this discussion. I think I made a few
shortcuts that are wrong, and I still have some unresolved issues
about these kinds of tests, even in the simplest case of linear models...

First, I think I mixed up contrasts and quadratic-form expectations in my
answer, I apologize for that; what I had in mind when answering
Francesco was in fact the expectation of the quadratic form, and I too
quickly deduced that there was an equivalent linear combination of the
parameters as its "square root", but this was obviously wrong since
the L matrix in an Lt W L quadratic form does not have to be a column
matrix. Am I wrong in thinking that typically in such tests, the L matrix
is precisely a multi-column matrix (hence the several degrees of
freedom associated), and that several contrasts are tested
simultaneously?

I should clarify that I call a "contrast" a linear combination of the model
parameters with the constraint that the coefficients of this
combination sum to 0; this is the definition in French ("contraste"),
but I may be using it wrongly in English?

Second, I may have wrongly understood the definitions of the various
tests, and especially how they generalize from linear model to
GLM/GLMM...

I thought type I was obtained by taking the squared distance of the
successive orthogonal projections on the subspaces generated by the
various terms, in the order given in the model; type II, by ensuring
that the term tested was the last amongst terms of the same order, after
terms of lower order but before terms of higher order; and type III, by
projecting on the subspace after removal of the basis vectors for the
term tested, hence its strong dependency on the coding scheme, and
the "drop1" trick to get them.

Is this definition correct? Does it generalize to other kinds of models,
or is another definition required? Is it unambiguous? The SAS doc
itself suggests that various procedures call "type II" different kinds
of things.

However, I cannot see clearly which hypothesis is indeed tested in
each case, especially in terms of cell means or marginal means (and,
when I really need it, I start from them and select the contrasts I
need). Is there any package/software that can print the hypotheses
tested in terms of means, starting from the model formula?
Or is there any good reference that makes the link between the two?
For instance, a demonstration that the comparison of marginal means
"always" leads to a type XXX sum of squares?

Best regards,
#
Dear Emmanuel,

The questions you raise are sufficiently complicated that it's difficult to address them adequately in an email. My Applied Regression and Generalized Linear Models text, for example, takes about 15 pages to explain the relationships among regressor codings, hypotheses, and tests in 2-way ANOVA, working with the full-rank parametrization of the model, and it's possible (as Russell Lenth indicated) to work things out even more generally. 

I'll try to answer briefly, however.
No need to apologize. I don't think that these are simple ideas.
Thinking in terms of the full-rank parametrization, as used in R, each type-III hypothesis is that several coefficients are simultaneously 0, which can be simply formulated as a linear hypothesis assuming an appropriate coding of the regressors for a factor. Type-II hypotheses can also be formulated as linear hypotheses, but doing so is more complicated. The Anova() function uses a kind of projection, in effect defining a type-II test as the most powerful test of a conditional hypothesis such as no A main effect given that the A:B interaction is absent in the model y ~ A*B. This works both for linear models, where (unless there is a complication like missing cells), the resulting test corresponds to the test produced by comparing the models y ~ A and y ~ A + B, using Y ~ A*B for the estimate of error variance (i.e., the denominator MS), and more generally for models with linear predictors, where it's in general possible to formulate the (Wald) tests in terms of the coefficient estimates and their covariance matrix.
I'd define a "contrast" as the weights associated with the levels of a factor for formulating a hypothesis, where the weights traditionally are constrained to sum to 0, and to differentiate this from a column of the model matrix, which I'd more generally term a "regressor." Often, a traditional set of contrasts for a factor, one less than the number of levels, are defined not only to sum to 0  but also to be orthogonal in the basis of the design. The usage in R is more general, where "contrasts" mean the set of regressors used to represent a factor. Thus, contr.sum() generates regressors that satisfy the traditional definition of contrasts, as do contr.poly() and contr.helmert(), but the default contr.treatment() generates 0/1 dummy-coded regressors that don't satisfy the traditional definition of contrasts.
Yes, if I've followed this correctly, it's correct, and it explains why it's possible to formulate the different types of tests in linear models independently of the contrasts (regressors) used to code the factors -- because fundamentally what's important is the subspace spanned by the regressors in each model, which is independent of coding. This approach, however, doesn't generalize easily beyond linear models fit by least squares. The approach taken in Anova() corresponds to this approach in linear models fit by least squares as long as the models remain full-rank and for type-III tests as long as the contrasts are properly formulated, and generalizes to other models with linear predictors.
This is where a complete explanation gets too lengthy for an email, but a shorthand formulation, e.g., for the model y ~ A*B, is that type-I tests correspond to the hypotheses A|(B = 0, AB = 0), B | AB = 0, AB = 0; type-II tests to A | AB = 0, B | AB = 0, AB = 0; and type-III tests to A = 0, B = 0, AB = 0. Here, e.g., | AB = 0 means assuming no AB interactions, so, e.g., the hypothesis A | AB = 0 means no A main effects assuming no AB interactions. A hypothesis like A = 0 is indeed formulated in terms of marginal means, understood as cell means for A averaging over the levels of B (not level means of A ignoring B).

I realize that this is far from a complete explanation.

Best,
 John
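The order dependence of the type-I (sequential) hypotheses in this shorthand is easy to demonstrate in base R with an invented, unbalanced design: the sequential sum of squares for A differs depending on whether A is entered before or after B.

```r
## Hedged sketch: type-I (sequential) SS depend on term order
## when the design is unbalanced. Data and names are invented.
set.seed(7)
d <- data.frame(
  A = factor(rep(c("a1", "a1", "a2", "a2"), c(6, 2, 3, 5))),
  B = factor(rep(c("b1", "b2", "b1", "b2"), c(6, 2, 3, 5)))
)
d$y <- rnorm(16)

ss_A_first <- anova(lm(y ~ A * B, d))["A", "Sum Sq"]  # A | nothing
ss_A_last  <- anova(lm(y ~ B * A, d))["A", "Sum Sq"]  # A | B

c(A_entered_first = ss_A_first, A_entered_last = ss_A_last)  # differ
```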
2 days later
#
Dear Prof. Fox,

Thanks for the time taken clarifying things. I'll take time to read
your text and think things over, but I think that until then I'll
stick with writing the comparisons in terms of means and deducing
the linear hypothesis to test, to be sure of what I'm doing.

I don't understand well, in your answer, the part saying "it explains
why it's possible to formulate the different types of tests in linear
models independently of the contrasts (regressors) used to code the
factors -- because fundamentally what's important is the subspace
spanned by the regressors in each model, which is independent of
coding".

As I understood the model, if we have a 2×2 design (A×B) for instance,
the subspace spanned by all predictors is a 4-dimensional space. In
this space, each dimension can be assigned to A, B, their interaction,
and a constant. That means each predictor is associated with a
different basis vector of this 4-dimensional space. But there are
several ways of defining the basis, defining different subspaces
associated with A, B, and A×B, and this corresponds to the different
codings. For instance, I can say (with 4 points)

1  A   B  A×B       or   1  A  B  A×B
1 -1  -1  +1             1  0  0  0
1 -1  +1  -1             0  0  1  0
1 +1  -1  -1             0  1  0  0
1 +1  +1  +1             0  1  1  1

and the subspaces associated with the constant, A, B, and A×B are
different in these two codings (but as a whole, the 4-dimensional
space is the same). I may be missing something trivial, but I would
say that the coding instead defines the subspace spanned by the
regressor, and not that they are independent.

Am I too stuck on coding? But then, how is the subspace associated
with a regressor defined "absolutely"?
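For what it's worth, one can check in base R that the two codings do span the same overall 4-dimensional space, even though the per-term columns differ: the two model matrices are related by an invertible change of basis. The factor names here are invented:

```r
## One observation per cell of a 2x2 design (invented factor names)
d <- expand.grid(A = factor(c("a1", "a2")), B = factor(c("b1", "b2")))

X_sum <- model.matrix(~ A * B, d,
                      contrasts.arg = list(A = "contr.sum", B = "contr.sum"))
X_trt <- model.matrix(~ A * B, d)  # default treatment (dummy) coding

## X_trt = X_sum %*% M with M invertible => identical column spaces
M <- solve(X_sum, X_trt)
max(abs(X_sum %*% M - X_trt))  # ~ 0: same overall space, different bases
```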
1 day later
#
Dear Emmanuel,

Again, I'll respond briefly, and not in the detail that your questions really require:
The subspace spanned by the regressors in a model like y ~ A*B or y ~ A + B is independent of the coding of the regressors.
In both cases, models like y ~ A*B and y ~ A + B produce the same y-hat vectors and hence the same SSs. The situation is a bit more complicated for models that violate marginality, but that situation can be handled by more general approaches, like estimable functions or close attention to the hypotheses tested. All tests can be formulated as linear hypotheses in the parameters of the full, full-rank model, but different parametrizations make the tests simpler or more difficult.

You've shown the row-basis for the model matrix in the cases of effect ("contr.sum") coding and dummy ("contr.treatment") coding. Call the basis matrix X_B. Then, because these are full-rank parametrizations, as long as no cells are empty, you can solve for the cell means in terms of the model parameters. Call the parameter vector corresponding to the basis beta_B and the ravelled vector of cell means mu. Then mu = X_B beta_B and (because X_B is nonsingular), beta_B = X_B^-1 mu. This allows you to see the composition of each parameter in terms of cell means and thus the hypothesis tested by the (type-III) test that the parameter is 0. In the case of effect coding, the columns of X_B are orthogonal and so its inverse is particularly simple, with each row equal to a column of X_B up to a constant factor.
It's not, but because the model matrix spans the same subspace, it's possible to test the same hypotheses in full-rank formulations of the same model. One way to see that is to work backwards from beta_B = X_B^-1 mu (that is, define X_B^-1 as the contrasts that you want to test) to mu = X_B beta_B. As mentioned, this is particularly simple when the *rows* of X_B^-1 are orthogonal contrasts.

John
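John's recipe can be carried out numerically for the 2×2 effect-coded row basis Emmanuel wrote out earlier: inverting X_B expresses each parameter as a combination of cell means, and because the columns are orthogonal with squared norm 4, the inverse is simply t(X_B)/4.

```r
## Row-basis X_B of the model matrix for a 2x2 design, effect coding
## (the same matrix from Emmanuel's message: columns 1, A, B, A:B)
XB <- cbind(c( 1,  1,  1, 1),
            c(-1, -1,  1, 1),
            c(-1,  1, -1, 1),
            c( 1, -1, -1, 1))
colnames(XB) <- c("1", "A", "B", "A:B")

## beta_B = solve(XB) %*% mu gives each parameter in terms of cell means
XBinv <- solve(XB)

## Orthogonal columns of squared norm 4, so the inverse is t(XB)/4:
max(abs(XBinv - t(XB) / 4))  # 0 (up to rounding)
```

Each row of the inverse is thus the contrast of cell means tested when the corresponding parameter is tested against 0, which is exactly why a type-III test behaves sensibly under this coding.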