understanding log-likelihood/model fit

I prefer smaller data sets so I can
more readily look at all components;
here is my modification of the modeling
(along with a random seed so we
all get the same data)

You can change the parameter values
for big data sets.

Nsubj <- 10
Ngrp <- 2
NsubjRep <- 5

set.seed(123)

test1s <- data.frame(subject = rep(seq(Nsubj * Ngrp), each = NsubjRep),
        response=500+c(rep(-100,Nsubj * NsubjRep),rep(100,Nsubj * NsubjRep))+rnorm(Nsubj * Ngrp * NsubjRep, 0, 10),
        fixed=(rep(c("A","B"),each=Nsubj * NsubjRep)))

null1 <- lmer(response~(1|subject),test1s)
fixed1 <- lmer(response~fixed+(1|subject),test1s)
summary(null1)
summary(fixed1)

quartz() ## substitute your favourite plot device 
par(mfrow = c(2, 1))
hist(ranef(null1)[[1]][,1])
hist(ranef(fixed1)[[1]][,1])

quartz() ## substitute your favourite plot device 
par(mfrow = c(2, 1))
qqnorm(ranef(null1)[[1]][,1])
qqline(ranef(null1)[[1]][,1])
qqnorm(ranef(fixed1)[[1]][,1])
qqline(ranef(fixed1)[[1]][,1])

### -----

Look at the histogram of BLUPs for model null1.
The strong bimodal structure is a clue that
there may be an underlying fixed effect
(and in fact we know that this is the case,
the beauty of simulations).

The QQ plot of the null1 model also
hints that subjects are not just some
selection from a normal distribution.

The fixed effect (see summary(fixed1)
output) shows a huge t value and the
random effect estimates are near zero
(you'll see this message during the
fitting process too)

Warning message:
In .local(x, ..., value) :
  Estimated variance for factor ?subject? is effectively zero

All of this suggests that a fixed effect (covariate 'fixed')
is appropriate to explain the structure
in the data, and subject need not be modeled
as a random effect.  A linear model without
random effects would suffice here.

Now to run John Maindonald's model
with subjects being drawn from a
normal distribution:

### -----

set.seed(123)
test2s <- data.frame(subject = rep(seq(Nsubj * Ngrp), each = NsubjRep),
        response=500 + rep(rnorm(Nsubj * Ngrp), each = NsubjRep) + rnorm(Nsubj * Ngrp * NsubjRep, 0, 10),
        fixed=(rep(c("A","B"),each=Nsubj * NsubjRep)))

null2 <- lmer(response~(1|subject),test2s)
fixed2 <- lmer(response~fixed+(1|subject),test2s)
summary(null2)
summary(fixed2)

quartz() ## substitute your favourite plot device 
par(mfrow = c(2, 1))
hist(ranef(null2)[[1]][,1])
hist(ranef(fixed2)[[1]][,1])

quartz() ## substitute your favourite plot device 
par(mfrow = c(2, 1))
qqnorm(ranef(null2)[[1]][,1])
qqline(ranef(null2)[[1]][,1])
qqnorm(ranef(fixed2)[[1]][,1])
qqline(ranef(fixed2)[[1]][,1])

### -----

Now the histogram of null2 BLUPs looks
like a unimodal density, and the
QQ plot suggests BLUPs are closer
to normal.  This is much more in line
with the situation that random effects 
components were designed to handle.

The fixed effect in fixed2 (see
summary(fixed2) output) does not
contribute to the model fit, as indicated
by the small t-value, and the random effect 
estimates are not all small numbers near 
zero.  A random effects model with subject
as a random effect is appropriate
here.

So this is a partial look into the
difference between fixed and random
effects.  

A covariate that is more 'degenerate', 
i.e. takes on a few distinct values, so 
has a 'spiky' distribution, is often better
modeled as a fixed effect.

When a covariate really can be imagined
to be a draw from some large non-degenerate
or non-spiky distribution, and you are
not really interested in the individual
values it can assume, but you want to
'model out' its effect (so its effect doesn't
cause some other covariate estimate to be
dragged off course) you can save
model degrees of freedom by modeling
that covariate as a random effect. 

Comparing likelihoods and other
statistics derived therefrom will
guide you in the right direction,
as these simulations illustrate.
The likelihoods do 'depend' on
the structure in the data.  
Statistics work!

HTH

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org on behalf of John Maindonald
Sent: Tue 8/19/2008 5:25 PM
To: Daniel Ezra Johnson
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] understanding log-likelihood/model fit

understanding log-likelihood/model fit

Thread (13 messages)