understanding log-likelihood/model fit
I'll take a swing at these.
On Wed, Aug 20, 2008 at 02:01:29PM +0100, Daniel Ezra Johnson wrote:
Everyone agrees about what happens here:
library(lme4)  # for lmer()

Nsubj    <- 10  # subjects per group
Ngrp     <- 2   # number of groups (levels of the fixed effect)
NsubjRep <- 5   # replicate observations per subject

set.seed(123)
test1s <- data.frame(
  subject  = rep(seq(Nsubj * Ngrp), each = NsubjRep),
  response = 500 +
    c(rep(-100, Nsubj * NsubjRep), rep(100, Nsubj * NsubjRep)) +
    rnorm(Nsubj * Ngrp * NsubjRep, 0, 10),
  fixed    = rep(c("A", "B"), each = Nsubj * NsubjRep)
)

null1  <- lmer(response ~ (1 | subject), test1s)
fixed1 <- lmer(response ~ fixed + (1 | subject), test1s)
I still have two questions, which I'll try to restate. I should note that I have attempted to understand the mathematical details of ML mixed-effects model fitting and it's a bit beyond me. But I hope that someone can provide an answer I can understand.
Question 1: When you have an "outer" fixed effect and a "subject"
random effect in the same model, specifically why does the model
(apparently) converge in such a way that the fixed effect is maximized
and the random effect minimized? (Not so much why should it, as why
does it? This is the 'fixed1' case.)
A common way for hierarchical models to be fit using ML is by profiling out the fixed effects, estimating the random effects, and then using GLS to estimate the fixed effects conditional on the random effects. So, any explanatory capacity that the fixed effects offer is deployed before the random effects are invoked. Likewise, a popular way of applying REML is to fit the fixed effects using OLS, then estimate the random effects from the residuals. Again, the net effect is that any explanatory capacity that the fixed effects offer is deployed before the random effects are invoked.
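A hand-rolled sketch of that two-stage REML idea, using the simulated data from above (illustrative only -- this is not what lmer() actually does internally, and the names ols, res, and subj_eff are just for the example):

# Stage 1: fit the fixed effect by OLS.
ols <- lm(response ~ fixed, test1s)

# Stage 2: estimate crude per-subject effects, and a between-subject
# variance component, from the OLS residuals.
res <- residuals(ols)
subj_eff <- tapply(res, test1s$subject, mean)
var(subj_eff)

Because the fixed effect has already soaked up the A-versus-B difference, the per-subject effects left over in the residuals are small.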
Question 2: Take the fixed1 model from Question 1 and compare it to the null1 model, which has a random subject effect but no fixed effect. The predicted values of the two models -- the ones from fitted(), which include the ranefs -- are virtually the same. So why does fixed1 have a lower deviance, and why is it preferred to null1 in a likelihood ratio test? (Again, I'm not asking why it's a better model. I'm asking about the software, the estimation procedure, and/or the theory of likelihood applied to such cases.)
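For concreteness, here is the symptom with the models above (a sketch; note that anova() refits both models with ML before comparing):

# The fitted values, which include the random effects, barely differ ...
max(abs(fitted(null1) - fitted(fixed1)))

# ... yet the log likelihoods do, and the LRT prefers fixed1.
logLik(null1)
logLik(fixed1)
anova(null1, fixed1)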
The deviance is computed from the log likelihood of the data, conditional on the model. The LL of the null model is maximized by making the variance components big enough to cover the variation in the data. But this means that the likelihood is being spread thinly, as it were. E.g.,
dnorm(0, sd = 1)
[1] 0.3989423
dnorm(0, sd = 2)
[1] 0.1994711

On the other hand, fixed1 uses fixed effects to explain a lot of that variation, so that when the time comes to estimate the random effects, they are smaller, and the LL is higher, because it doesn't have to stretch so far.
Cheers,

Andrew

Andrew Robinson
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/