over-dispersed Poisson with lmer -- a trick, with a catch? - R-SIG-mixed-models

Thu, Jul 16, 2009 9:01 AM #

Hi all--

I'm currently working on analyses of longitudinal count data (number of 
drinking related problems over 2 years in a randomized trial for alcohol 
interventions).

The data are pretty clearly over-dispersed relative to Poisson, which is 
  clear when I fit the model using MCMCglmm, which includes a residual 
error term to account for over-dispersion (thank you, Jarrod!).

However, I'm collaborating with a colleague who uses Stata.  He 
mentioned that Sophia Rabe-Hesketh in her mixed-models book discusses 
fitting over-dispersed Poisson mixed-effects models using the following 
"trick":

Create a new variable with a unique value for each observation and 
include this as an additional random-effect term.

Basically, this will soak up any residual dispersion, over and above 
that accounted for in Poisson model and other random-effects.  On the 
face of it, it makes sense to me.  Though, we hit a snag in trying to do 
this in lmer().  Using the data that I have been analyzing:

### create new var with unique value for each obs/row
 > snap.df$over <- 1:nrow(snap.df)

### include as separate random-effect in model
 > rapi.glmer2.1 <- glmer(rapisum ~ asex*time + (time|ID) + (1|over),
+ 					data = snap.df, verbose = TRUE,
+ 					family = poisson)
Error in mer_finalize(ans) : q = 5252 > n = 3616

So, we already have a random intercept and slope for time [ie, 
(time|ID)].  With 818 participants the random-effects specify 818 
intercepts plus 818 slopes, and now with an observation level 
random-effect [ie, (1|over)] with 3616 observations... we get 5252 
estimates.  lmer() doesn't like it.

I know there's been a bit of chatter about this error before, and I 
believe Doug mentioned that it was to avoid people radically 
over-parameterizing/over-fitting their data via random-effects.  Though, 
I believe there is some room for debate (ie, the model above does not 
fit 5252 unique parameters).

Anyhow, I would be curious for any input on:

1. Does this seem like a sensible approach to account for 
over-dispersion?  (caveat: I don't have Rabe-Hesketh's book, so just 
going on recounting from colleague)

2. If so, I wonder if we might be able to coax Doug to change the error 
to an argument in the model (that defaults to the error msg but could be 
over-ridden)?

cheers, Dave

Dave Atkins, PhD
Research Associate Professor
Center for the Study of Health and Risk Behaviors
Department of  Psychiatry and Behavioral Science
University of Washington
1100 NE 45th Street, Suite 300
Seattle, WA  98105
206-616-3879
datkins at u.washington.edu

Ben Bolker

Thu, Jul 16, 2009 9:09 AM #

David Atkins wrote:

[snip]

I think so.  It's done in other places (for example, it's exactly
equivalent to what MCMCglmm does), and in Elston et al 2001 (they use a
PQL estimation procedure in Genstat).  (Although "everybody does it"
doesn't *mean* it's correct ...)

Good luck with that ...

  If you can conveniently build from source, you can just comment out
line 1068 (search for "> n =") in lmer.c ...

  cheers
    Ben


Elston, D.A., Moss, R., Boulinier, T., Arrowsmith, C. & Lambin, X.
(2001) Analysis of aggregation, a worked example: numbers of ticks on
red grouse chicks. Parasitology, 122, 563-569.

Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc

Ben Bolker

Thu, Jul 16, 2009 9:11 AM #

PS -- if MCMCglmm works, why not stop there?  Do you absolutely need
it to work in lme4?

David Atkins

Thu, Jul 16, 2009 9:21 AM #

Ben Bolker wrote:

Ben--

Thanks for the comments (and, yes, I am very hesitant to ever request 
anything from Doug given the Herculean amount of work he has put into 
developing lmer()... suppose I should take this opportunity to learn how 
to build from sources).

As for the comment above, on the one hand, you're right.  We've got a 
solution that works.  Moreover, a lot of our addictions data is 
zero-inflated, and MCMCglmm() can handle that as well (again, thank you 
Jarrod!).

However, I've been using lme/lmer for 12 years and am really comfortable 
with it.  In addition, it's sssssmoking fast, and some of our problems 
get large.  An alternative model with daily drinking reports for past 90 
days on several hundred participants took a couple hours to fit in 
MCMCglmm().  Now, it's great that we can fit the model, but speed does 
count for something.

[Actually, as an aside, my colleague who uses Stata called me while the 
over-dispersed model was running in Stata, "Eh, Dave, this is going to 
take a while in Stata, any chance you could fit this in R?" ;)]

cheers, Dave

Dave Atkins, PhD
Research Associate Professor
Center for the Study of Health and Risk Behaviors
Department of  Psychiatry and Behavioral Science
University of Washington
1100 NE 45th Street, Suite 300
Seattle, WA  98105
206-616-3879
datkins at u.washington.edu

dave fournier

Thu, Jul 16, 2009 10:07 AM #

Hi,

Instead of "tricks", why don't you just fit a negative binomial mixed
model. That should deal with the over-dispersion. You can do this with
AD Model Builder's random effects package which is now freely available
at http://admb-project.org.

    Dave

David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com