Diagnosing Overdispersion in Mixed Models with Parametric Bootstrapping
2 messages · Xavier Harrison, Ben Bolker
Xavier Harrison <xav.harrison at ...> writes:
Hi All
I have a question regarding the appropriateness of diagnosing overdispersion in mixed models using a parametric bootstrap, as provided in the lme4 package.
Following Zuur et al. (2009) and Bolker et al. (2009), and as nicely summarised on the GLMM wiki, one can approximate overdispersion in a model as the ratio of the sum of squared (Pearson) residuals to the residual degrees of freedom. However, this approach suffers from the problem that we do not know exactly how many residual degrees of freedom there are in a mixed model.
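For reference, a minimal sketch of that check, along the lines of the function on the GLMM wiki (df.residual() gives only an approximate residual df for a mixed model):

overdisp_fun <- function(model) {
    rdf <- df.residual(model)  ## approximate residual df
    rp <- residuals(model, type = "pearson")
    Pearson.chisq <- sum(rp^2)
    ratio <- Pearson.chisq/rdf  ## ~1 if the data are Poisson-like
    pval <- pchisq(Pearson.chisq, df = rdf, lower.tail = FALSE)
    c(chisq = Pearson.chisq, ratio = ratio, rdf = rdf, p = pval)
}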
Yes, although keep in mind that the whole thing is an approximation anyway; getting the residual df wrong by a few percent is hopefully not that important.
My question is: given the flexibility of the parametric bootstrapping function in lme4, could one not calculate overdispersion by looking at the ratio of the model's SS residuals to the mean SS residuals from n parametric bootstrap samples? Does it follow that if the data are well approximated by a Poisson process, this ratio should be roughly 1, whereas if there is more variability in the data than a Poisson can predict, the ratio should be a fair bit higher?
This seems quite reasonable.
Below I have provided code to reproduce an example along the lines of what I have just tried to describe. I have simulated two sets of data where a Poisson outcome (number of offspring) is affected by body size (with a positive slope). In both cases the data are generated from a Poisson process, but in the latter case I add some random noise to the linear predictor, drawn from a normal distribution with mean 0 and known SD. The code below models the effect of body size on offspring, performs parametric bootstrapping, and then calculates the ratio of the observed model SS residuals to the mean parametrically bootstrapped SS residuals. As expected, the "overdispersed" data yield an SS ratio much greater than 1.
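(The attached code did not come through the archive; the following is a minimal sketch of the overdispersed case using lme4's bootMer(), with illustrative variable names rather than the original ones.)

library(lme4)
set.seed(1)
n <- 500
bodysize <- rnorm(n, 30, 4)
## Poisson outcome, but with normal noise (known SD) added to the
## linear predictor, which induces overdispersion
eta <- -0.5 + 0.05*bodysize + rnorm(n, 0, 0.5)
d2 <- data.frame(bodysize,
                 offspring = rpois(n, exp(eta)),
                 popid = gl(10, n/10))
## Poisson GLMM *without* an observation-level effect
f0 <- glmer(offspring ~ bodysize + (1|popid), family = "poisson", data = d2)
ss <- function(m) sum(residuals(m, type = "pearson")^2)
## parametric bootstrap: simulate from f0, refit, recompute SS residuals
bb <- bootMer(f0, FUN = ss, nsim = 200)
ss(f0)/mean(bb$t)  ## roughly 1 for Poisson data; well above 1 here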
This particular model seems to be more or less exactly equivalent to using an observation-level random effect -- see below ...
If this approach seems valid, it would presumably be simple to use the simulation-to-simulation variability to put some CIs on this point estimate.
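For instance (reusing the bb, ss, and f0 names from the sketch above), the spread of the bootstrap statistics gives a rough interval for the ratio:

quantile(ss(f0)/bb$t, c(0.025, 0.975))  ## rough 95% CI for the SS ratio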
Any help greatly appreciated
The parametric bootstrap is worthwhile for finding reliable confidence intervals, but otherwise a parametric approach (e.g. likelihood profiling of the variance parameters, as below) is more direct.
I would do this as follows (slightly more compactly):
## simulate Poisson data with both among-population and
## observation-level (overdispersion) random effects
n.pops <- 10
n.indiv <- 50
set.seed(101)
popid <- gl(n.pops, n.indiv)
bodysize <- rnorm(length(popid), 30, 4)
## 'obs' is an observation-level grouping factor (one level per row)
d <- data.frame(popid, bodysize, obs = factor(1:(n.pops*n.indiv)))
library(lme4)
## theta gives the random-effect SDs (obs first: grouping factors are
## ordered by decreasing number of levels); beta the fixed effects
d$y <- simulate(~ bodysize + (1|popid) + (1|obs), family = "poisson",
                newdata = d,
                newparams = list(theta = c(0.5, 0.8),
                                 beta = c(-0.5, 0.05)))[[1]]
f <- glmer(y ~ bodysize + (1|popid) + (1|obs), family = "poisson",
           data = d)
## profile confidence intervals for the random-effect SDs only
p <- profile(f, which = "theta_")
confint(p)
Random effects:
 Groups Name        Std.Dev.
 obs    (Intercept) 0.4795
 popid  (Intercept) 0.7990
Number of obs: 500, groups: obs, 500; popid, 10
Fixed Effects:
(Intercept)     bodysize
   -1.05462      0.06281
The intercept looks a bit low, but the std. error is also large (0.37).
The profile confidence intervals:

            2.5 %    97.5 %
.sig01 0.4076925 0.5568827
.sig02 0.5328584 1.3516996

(.sig01 is the observation-level SD, .sig02 the among-population SD; the fact that the interval for the observation-level SD is bounded well away from zero is direct evidence of overdispersion.)