I have a question regarding the appropriateness of diagnosing
overdispersion in mixed models using parametric bootstrap as
provided in the lme4 package.
Following Zuur et al. (2009) and Bolker et al. (2009), and as nicely
summarised on the glmm wiki, one can approximate overdispersion in a
model as the ratio of the sum of squared (Pearson) residuals to the
residual degrees of freedom. However, this suffers from the problem
that we do not know exactly how many residual degrees of freedom a
mixed model has.
Yes, although keeping in mind that the whole thing is an approximation
anyway, getting the residual df wrong by a few percent is hopefully
not that important.
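The usual ratio-based check can be sketched as follows (a hedged sketch: the toy data, parameter values, and the `overdisp_ratio` helper name are all illustrative; `df.residual` on a merMod is itself only an approximation, as discussed above):

```r
library(lme4)

## sum of squared Pearson residuals over (approximate) residual df
overdisp_ratio <- function(model) {
  rdf  <- df.residual(model)                        # approximate residual df
  prss <- sum(residuals(model, type = "pearson")^2)
  c(chisq = prss, ratio = prss / rdf, rdf = rdf,
    p = pchisq(prss, df = rdf, lower.tail = FALSE))
}

## toy data: Poisson with no extra dispersion, so the ratio should be near 1
set.seed(1)
dd <- data.frame(g = gl(10, 20), x = rnorm(200))
dd$y <- rpois(200, exp(0.2 + 0.3 * dd$x))
f0 <- glmer(y ~ x + (1 | g), family = poisson, data = dd)
round(overdisp_ratio(f0), 3)
```

A ratio well above 1 (with a small chi-square p-value) is the usual informal signal of overdispersion.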
My question is: given the flexibility of the parametric
bootstrapping function in lme4, could one not calculate
overdispersion as the ratio of the model's residual SS to the
mean residual SS from n parametric bootstrap samples? Does it
follow that if the data are well approximated by a Poisson process,
this ratio should be roughly 1, whereas if there is more
variability in the data than a Poisson can predict, the ratio
should be a fair bit higher?
This seems quite reasonable.
Below I have provided code to reproduce an example along the lines
of what I have just described. I have simulated two sets of
data in which a Poisson outcome (number of offspring) is affected by
body size (with a positive slope). In both cases the data are
generated from a Poisson process, but in the second case I add
random noise to the linear predictor, drawn from a normal
distribution with mean 0 and known SD. The code below models the
effect of body size on offspring number, performs parametric
bootstrapping, and then calculates the ratio of the observed model
residual SS to the mean parametrically bootstrapped residual SS. As
expected, the overdispersed data yield an SS ratio much greater
than 1.
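For concreteness, the bootstrap-ratio idea can be sketched like this (a minimal sketch, not the exact code referred to above: the sample size, parameter values, and grouping factor are illustrative, and only 20 bootstrap replicates are used for speed):

```r
library(lme4)
set.seed(42)
n <- 200
bodysize <- rnorm(n, 30, 4)
eta <- -2 + 0.08 * bodysize
## overdispersed version: extra N(0, 0.5) noise on the linear predictor
dd <- data.frame(bodysize,
                 y   = rpois(n, exp(eta + rnorm(n, 0, 0.5))),
                 pop = gl(10, 20))
f <- glmer(y ~ bodysize + (1 | pop), family = poisson, data = dd)

obs.ss <- sum(residuals(f, type = "pearson")^2)
## parametric bootstrap: simulate responses from the fitted model,
## refit, and recompute the residual SS each time
boot.ss <- replicate(20, {
  fsim <- refit(f, simulate(f)[[1]])
  sum(residuals(fsim, type = "pearson")^2)
})
obs.ss / mean(boot.ss)   # noticeably above 1 for overdispersed data
```

Because the bootstrap samples come from a model with no extra-Poisson variation, their residual SS estimates the "no overdispersion" baseline that the observed SS is compared against.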
This particular model seems to be more or less exactly equivalent
to using an observation-level random effect -- see below ...
If this approach seems valid, it would be simple to use the
simulation-to-simulation variability to put some CIs on this point
estimate.
Any help greatly appreciated.
The parametric bootstrap is worthwhile for finding reliable
confidence intervals, but otherwise a parametric approach (e.g. a
likelihood-ratio test) is likely to be simpler.
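One such likelihood-based check is to compare a Poisson fit with and without an observation-level random effect (a sketch with illustrative data and parameter values; note that because the null value SD = 0 sits on the boundary of the parameter space, the naive chi-square p-value is conservative):

```r
library(lme4)
set.seed(101)
dd <- data.frame(g = gl(10, 30), x = rnorm(300))
## overdispersed outcome: extra N(0, 0.6) noise on the linear predictor
dd$y   <- rpois(300, exp(0.1 + 0.1 * dd$x + rnorm(300, 0, 0.6)))
dd$obs <- factor(seq_len(nrow(dd)))   # one level per observation

f0 <- glmer(y ~ x + (1 | g),             family = poisson, data = dd)
f1 <- glmer(y ~ x + (1 | g) + (1 | obs), family = poisson, data = dd)
anova(f0, f1)   # likelihood-ratio comparison of the nested fits
```

With genuinely overdispersed data the observation-level-effect model should fit substantially better (much higher log-likelihood), which is the likelihood-based counterpart of the SS ratio exceeding 1.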
I would do this as follows (slightly more compactly):
n.pops <- 10    # number of populations
n.indiv <- 50   # individuals per population
set.seed(101)
popid <- gl(n.pops, n.indiv)
bodysize <- rnorm(length(popid), 30, 4)
## obs is an observation-level factor (one level per row)
d <- data.frame(popid, bodysize, obs = factor(1:(n.pops*n.indiv)))
library(lme4)
## simulate from a Poisson GLMM with observation- and population-level
## random effects; theta is ordered by decreasing number of grouping
## levels, so 0.5 is the obs SD and 0.8 the popid SD
d$y <- simulate(~bodysize+(1|popid)+(1|obs), family="poisson",
                newdata=d, newparams=list(theta=c(0.5,0.8),
                                          beta=c(-0.5,0.05)))[[1]]
f <- glmer(y~bodysize+(1|popid)+(1|obs), family="poisson",
           data=d)
p <- profile(f, which="theta_")   # profile the variance parameters only
confint(p)
Random effects:
Groups Name Std.Dev.
obs (Intercept) 0.4795
popid (Intercept) 0.7990
Number of obs: 500, groups: obs, 500; popid, 10
Fixed Effects:
(Intercept) bodysize
-1.05462 0.06281
The intercept looks a bit low, but the std. error is also large
(0.37)
confidence intervals (.sig01 is the obs-level SD, .sig02 the popid
SD; both cover the true values of 0.5 and 0.8):
                 2.5 %    97.5 %
.sig01       0.4076925 0.5568827
.sig02       0.5328584 1.3516996