Refitting model with ML -- why search for optimal theta again?

Mon, Mar 19, 2018 1:52 PM #

Dear list,

I apologize in advance if this question is stupid; it's just that there's something that I do not understand and I would like to learn more. I know that for linear mixed models, optimizing via ML provides slightly anticonservative parameter estimates; as I understand it, this is because ML estimates ? without taking into account uncertainty in the estimates for ? (since the two are estimated simultaneously). This uncertainty can be taken into account by optimizing the REML criterion instead, which integrates out the unknown ?, and hence finds estimates for ? that should be appropriate for *any* value that ? could have had.

If this is correct, the REML criterion contains an added constant that varies depending on the provided fixed-effects structure, and hence REML fits with different fixed effects cannot be compared. When asking for the anova() of two REML-fitted lme4 objects, the package thus refits the models using ML:

Loading required package: Matrix

refitting model(s) with ML (instead of REML)
Data: sleepstudy
Models:
small: Reaction ~ (1 | Subject)
big: Reaction ~ Days + (1 | Subject)
      Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
small  3 1916.5 1926.1 -955.27   1910.5                             
big    4 1802.1 1814.8 -897.04   1794.1 116.46      1  < 2.2e-16 ***
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1


This is what I do not understand: why do we need to re-optimize the (now ML) objective function, when we already have the optimal values in big at theta and small at theta? Starting from the REML criterion, we would only need to take out the integrated fixed effects, and then we end up with an ML likelihood, correct? But then, why can't I simply do:

[1] 116.4677

i.e. take the already-found optimal ? values and plug them into the ML formula? Why do we need to search for the optimal ? values again -- even if the values found by ML differ from those found by REML, shouldn't the REML values be preferred?

Thank you very much for any insights,
Cesko

Jake Westfall

Mon, Mar 19, 2018 2:03 PM #

Hi Cesko,

why can't I simply [...] take the already-found optimal ? values and plug

Optimizing the likelihood vs. the REML criterion leads to different ?
estimates. The statistical theory underlying the likelihood ratio test only
holds for ML estimates of ?, not REML estimates. You can compute the
likelihood value for the REML estimates, as you do in your code example,
but this doesn't change the fact that they are REML estimates and not ML
estimates.

Jake

On Mon, Mar 19, 2018 at 3:52 PM, Voeten, C.C. <c.c.voeten at hum.leidenuniv.nl>
wrote:

Dear list,

I apologize in advance if this question is stupid; it's just that there's
something that I do not understand and I would like to learn more. I know
that for linear mixed models, optimizing via ML provides slightly
anticonservative parameter estimates; as I understand it, this is because
ML estimates ? without taking into account uncertainty in the estimates for
? (since the two are estimated simultaneously). This uncertainty can be
taken into account by optimizing the REML criterion instead, which
integrates out the unknown ?, and hence finds estimates for ? that should
be appropriate for *any* value that ? could have had.

If this is correct, the REML criterion contains an added constant that
varies depending on the provided fixed-effects structure, and hence REML
fits with different fixed effects cannot be compared. When asking for the
anova() of two REML-fitted lme4 objects, the package thus refits the models
using ML:

library(lme4)

Loading required package: Matrix

big <- lmer(Reaction ~ Days + (1|Subject),sleepstudy)
small <- lmer(Reaction ~ (1|Subject),sleepstudy)
anova(small,big) # refits model

refitting model(s) with ML (instead of REML)
Data: sleepstudy
Models:
small: Reaction ~ (1 | Subject)
big: Reaction ~ Days + (1 | Subject)
      Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
small  3 1916.5 1926.1 -955.27   1910.5
big    4 1802.1 1814.8 -897.04   1794.1 116.46      1  < 2.2e-16 ***
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1


This is what I do not understand: why do we need to re-optimize the (now
ML) objective function, when we already have the optimal values in big at theta
and small at theta? Starting from the REML criterion, we would only need to
take out the integrated fixed effects, and then we end up with an ML
likelihood, correct? But then, why can't I simply do:

big.LLML <- update(big,REML=F,devFunOnly=T)(big at theta) #compute ML

deviance based on optimal theta

small.LLML <- update(small,REML=F,devFunOnly=T)(small at theta)
small.LLML - big.LLML # same difference, without needing to re-fit!

[1] 116.4677

i.e. take the already-found optimal ? values and plug them into the ML
formula? Why do we need to search for the optimal ? values again -- even if
the values found by ML differ from those found by REML, shouldn't the REML
values be preferred?

Thank you very much for any insights,
Cesko

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models