ML or REML for LR tests
On Fri, Aug 29, 2008 at 2:53 AM, Ken Beath <ken at kjbeath.com.au> wrote:
On 29/08/2008, at 2:47 PM, Austin Frank wrote:
On Thu, Aug 28 2008, Doran, Harold wrote:
The likelihood-ratio test approach directly compares these two.
Since these models differ in their fixed effects, you need REML=FALSE for the LRT to be meaningful.
This is a standard operating procedure that I picked up and accepted on faith when I first started using lmer, before I really knew what I was doing. It occurs to me that this is the case for much of my understanding of model comparison, so I'd like to check my understanding of the use of LR tests with lmer. If this is a case of RTFM, please provide a pointer to the relevant Friendly Manual ;)

1) Can anyone offer a reference where the case is made for doing LR tests on models fit by ML (as opposed to REML)?
Any decent mixed-models text: Verbeke and Molenberghs, "Linear Mixed Models for Longitudinal Data", p. 63, or Pinheiro and Bates, "Mixed-Effects Models in S and S-PLUS", p. 76.
2) Can non-nested ML models with the same number of fixed effects be meaningfully compared with an LR test? Something like:
No. The general principle is that models compared by an LR test must be nested.
--8<---------------cut here---------------start------------->8---
data(sleepstudy)
set.seed(535353)
sleepstudy$Fake <- rnorm(nrow(sleepstudy))
m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML=FALSE)
m2 <- lmer(Reaction ~ Fake + (1 | Subject), sleepstudy, REML=FALSE)
anova(m1, m2) # Is this test meaningful...
## When possible, test against superset model
m12 <- lmer(Reaction ~ Days + Fake + (1 | Subject),
            sleepstudy, REML=FALSE)
anova(m1, m2, m12) # ... or only this one?
--8<---------------cut here---------------end--------------->8---
3) Is it the case that LR tests between REML models with different
random effects are meaningful? Does this apply to both nested and
non-nested models?
Maybe, but only for nested models (see Q2). Supposedly REML works better than ML for this. The significance tests won't be exactly correct, but if the p-value is very small there is probably a random effect. Simulation seems a better idea.
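The simulation idea can be sketched as a parametric bootstrap of the LR statistic: simulate response vectors under the null model, refit both models to each, and compare the observed statistic to that reference distribution. A minimal sketch, assuming a reasonably recent lme4 (with simulate() and refit() methods); nsim is kept small here for speed:

```r
library(lme4)

data(sleepstudy)

## Null model: random intercept only; alternative adds a random slope.
m0 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
m1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)
obs <- as.numeric(2 * (logLik(m1) - logLik(m0)))  # observed LR statistic

set.seed(1)
nsim <- 99  # small for illustration; use many more in practice
ref <- replicate(nsim, {
  y  <- simulate(m0)[[1]]   # response generated under the null model
  f0 <- refit(m0, y)
  f1 <- refit(m1, y)
  as.numeric(2 * (logLik(f1) - logLik(f0)))
})

## Bootstrap p-value, counting the observed statistic among the draws
p <- mean(c(obs, ref) >= obs)
p
```

This sidesteps the boundary problem with the naive chi-square reference distribution when testing whether a variance component is zero.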
It may help to give a bit of background about what happens in the
anova method for mer objects in the lme4 package.
When fitting a linear mixed model, both the profiled deviance
(negative twice the log-likelihood) and the profiled REML criterion
are evaluated at each iteration during the optimization. The word
"profiled" means that, although the likelihood or the REML criterion
are defined as functions of the fixed-effects parameters,
$\beta$, the common scale parameter, $\sigma$, and the
parameters $\theta$ that determine the relative covariance
matrix $\Sigma$ of the random effects, these criteria are
evaluated as a function of $\theta$ alone, with
$\beta$ and $\sigma$ at their conditionally optimal values.
The profiled REML criterion has a term that depends on the model
matrix X for the fixed-effects parameters. If you change the
definition of the fixed-effects you will change the value of that
criterion in a systematic way that does not depend on how well the
respective models fit the observed data. Thus, differences in the
REML criterion are not a meaningful way to compare two models that
differ in the definition of the fixed-effects.
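The safe route, whatever version of lme4 you have, is to make the ML refit explicit before taking the likelihood ratio. A minimal sketch (the intercept-only null model here is my own illustrative choice):

```r
library(lme4)

data(sleepstudy)

## Two REML fits that differ in their fixed effects
r1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
r0 <- lmer(Reaction ~ 1    + (1 | Subject), sleepstudy)

## Refit both with ML before forming the likelihood-ratio statistic;
## differencing the REML criteria directly would not be meaningful.
m1 <- update(r1, REML = FALSE)
m0 <- update(r0, REML = FALSE)
lrt <- as.numeric(2 * (logLik(m1) - logLik(m0)))
p <- pchisq(lrt, df = 1, lower.tail = FALSE)  # 1 df: the Days coefficient
c(statistic = lrt, p.value = p)
```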
If you look at the code for the anova method for the mer class you
will see that it always calls logLik with REML = FALSE. That is, it
extracts the log-likelihood from the profiled deviance even if the
model has been fit according to the REML criterion. In general that
would be a very bad idea if we were using the deviance and not the
profiled deviance. You can't count on the deviance at the REML
estimates for the complete parameter vector being close to the optimal
deviance. However, the profiled deviance at the REML estimates of
$\theta$ is close to the optimal profiled deviance.
For example,
> fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
> ## profiled deviance at the REML estimate of $\theta$
> dput(d1 <- deviance(fm1, REML = FALSE))
structure(1751.98581110077, .Names = "ML")
> ## optimal value of the deviance
> dput(d2 <- deviance(update(fm1, REML = FALSE)))
structure(1751.93934480597, .Names = "ML")
> ## amount by which the profiled deviance at the REML
> ## estimate exceeds the optimal value
> d1 - d2
        ML
0.04646629

As you can see, I have gravitated to using the term "REML criterion" rather than calling it a deviance or a log-likelihood. The print method for mer objects still labels it as the REML deviance, but I plan to change that and create an additional extractor called REML to return that value. In that case the deviance extractor will always return the deviance, perhaps after re-optimizing to produce the optimal deviance.

So the bottom line is that the anova method for mer objects always produces a likelihood ratio test based on the likelihood. If you compare models fit by REML, that likelihood ratio statistic will be somewhat inaccurate, but only by a small amount (in all the cases that I have seen).