Overdispersion in Bernoulli models - R-SIG-mixed-models

Tue, Sep 28, 2010 5:46 AM #
Hi again.
I've been working with a Bernoulli dependent variable and modeling its
values as a mixed model using glmer().  My understanding is that it's
not possible to model overdispersion in a Bernoulli dataset (for
instance, see Gelman and Hill 2007, p302, or
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/91242.html ), however
I can find some suggestions that while it's technically not possible
to do this, estimating some analog of this parameter may nonetheless
improve model fit (see Skrondal and Rabe-Hesketh 2007, and a paragraph
in Venables and Ripley 2002, p297-8 (table 10.4, the "bacteria"
data)), even if that parameter doesn't actually represent
"overdispersion" per se.  I don't really understand how this is
possible for Bernoulli datasets -- I think the mechanics of this
process are well beyond me -- so I'd have to take it on faith that
it's working properly, and that makes me uncomfortable.

What I find interesting is that in the following self-contained
example, glmer() happily models with a quasibinomial family and gives
tighter SEs about the fixed effects compared to binomial(), but
glmmPQL (and incidentally, a glm() without the arbitrary grouping
factor) doesn't (or maybe it does? -- it has broader SEs, but also a
residual SD term).  In general, my suspicion is that something's amiss
-- I'm somehow "abusing" the glmer -- and I'm better just sticking
with the binomial family for Bernoulli datasets.  Any thoughts?

Thanks very much.
--Bob Farmer
Dalhousie University, Halifax NS

rm(list=ls())
table(bacteria$y) #Bernoulli dependent
m1<-glm(y ~ trt * week, data = bacteria, family=binomial)
#above example from ?bacteria
m2<-update(m1, family=quasibinomial)
summary(m1)
summary(m2)  #standard errors are similar enough

#use ID as a grouping factor
library(lme4)
m1.mix<-glmer(y ~ trt * week + (1 | ID), data = bacteria,family=binomial)
m2.mix<-update(m1.mix, family=quasibinomial)
summary(m1.mix)
summary(m2.mix)  #vastly different standard errors, plus residual
sigma (the overdisp?)

library(MASS)
m1.mix.PQL<-glmmPQL(y ~ trt * week, random = ~ 1 | ID, data = bacteria,
  family=binomial)
m2.mix.PQL<-update(m1.mix.PQL, family=quasibinomial)
summary(m1.mix.PQL) #also with residual sigma, but m1-style SEs
summary(m2.mix.PQL)

Note:  As far as I can tell, in these non-BUGS-based models, this is
multiplicative overdispersion (which estimates an overdispersion
parameter based on how the model deviance differs from its expected
value (which is distributed as a chi-square when overdispersion == 1)
and then scales down the standard deviation of fixed-effects
parameters accordingly.  My question does not concern the alternative
of adding a sigma[i] parameter to the main model formula directly.