1) Different types of residuals serve different purposes.
2) I am of the school that thinks it misguided to use the
results of a test for overdispersion to decide whether to
model it. If there is any reason to suspect over-dispersion
(and in many/most ecological applications there is), this
is anti-conservative. I judge this a misuse of statistical
testing. While, some do rely on the result of a test in these
circumstances, I have never seen a credible defence of
this practice.
3) In fitting a quasi model using glm(), McCullagh and Nelder
(which I do not have handy at the moment) argue, if I recall
correctly, for use of the Pearson chi-square estimate. The
mean deviance is unduly susceptible to bias.
4) Whereas the scale factor (sqrt dispersion estimate) is incorporated
into the GLM residuals, the residuals from glmer() exclude all
random effects except that due to poisson variation. The residuals
are what remains after accounting for all fixed and random effects,
including observation level random effects.
5) Your mdf divisor is too small. Your stream, stream:rip and ID
random terms account for further 'degrees of freedom'. Maybe
degrees of freedom are not well defined in this context? Anyone
care to comment? The size of this quantity cannot, in any case, be
used to indicate over-fitting or under-fitting. The model assumes
a theoretical value of 1. Apart from bias in the estimate, the
residuals are constrained by the model to have magnitudes that
are consistent with this theoretical value.
6) If you fit a non-quasi error (binomial or poisson) in a glm model,
the summary output has a column labeled "z value". If you fit a quasi
error, the corresponding column is labeled "t value". In the glmer
output, the label 'z value' is in my view almost always inappropriate.
To the extent that the description carries across, it is the counterpart
of the "t value" column in the glm output with the quasi error term.
(Actually, in the case where the denominator is entirely composed
from the theoretical variance, Z values that are as near as maybe
identical can almost always [always?] be derived using an
appropriate glm model with a non-quasi error term.)
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 12/02/2011, at 12:58 PM, Colin Wahl wrote:
In anticipation of the weekend:
In my various readings(crawley, zuur, bolker's ecological models book, and
the GLMM_TREE article, reworked supplementary material and R help
discussion of overdispersion for glmm is quite convoluted by different
interpretations, different ways to test for it, and different solutions to
deal with it. In many cases differences seem to stem from the type of data
being analyzed (e.g. binomial vs. poisson) and somewhat subjective options
for which type of residuals to use for which models.
The most consistent definition I have found is overdispersion is defined by
a ratio of residual scaled deviance to the residual degrees of freedom > 1.
Which seems simple enough.
modelB<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip), data=ept,
family=binomial(link="logit"))
rdev <- sum(residuals(modelBQ)^2)
mdf <- length(fixef(modelBQ))
rdf <- nrow(ept)-mdf
rdev/rdf #9.7 >>1
So I conclude my model is overdispersed. The recent consensus
to be to create and add a individual level random variable to the model.
ept$obs <- 1:nrow(ept) #create individual level random variable 1:72
modelBQ<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs),
data=ept, family=binomial(link="logit"))
I take a look at the residuals which are now much smaller but
too... good... for my ecological (glmm free) experience to be comfortable
with. Additionally, they fit better for intermediate data, which, with
binomial errors is the opposite of what I would expect. Feel free
them in the attached image (if attachments work via mail list... if not, I
can send it directly to whomever is interested).
Because it looks too good... I test overdispersion again for the new model:
rdev/rdf #0.37
Which is terrifically underdispersed, for which the consensus is to ignore
it (Zuur et al. 2009).
So, for my questions:
1. Is there anything relevant to add to/adjust in my approach thus far?
2. Is overdispersion an issue I should be concerned with for binomial
errors? Most sources think so, but I did find a post from Jerrod Hadfield
back in august where he states that overdispersion does not exist with a
binary response variable:
http://web.archiveorange.com/archive/v/rOz2zS8BHYFloUr9F0Ut (though in
subsequent posts he recommends the approach I have taken by using an
individual level random variable).
3. Another approach (from Bolker's TREE_GLMM article) is to use Wald t or F
tests instead of Z or X^2 tests to get p values because they "account for
the uncertainty in the estimates of overdispersion." That seems like a nice
simple option, I have not seen this come up in any other
Here are the glmer model outputs:
ModelB
Generalized linear mixed model fit by the Laplace approximation
Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip)
Data: ept
AIC BIC logLik deviance
754.3 777 -367.2 734.3
Random effects:
Groups Name Variance Std.Dev.
stream:rip (Intercept) 0.48908 0.69934
stream (Intercept) 0.18187 0.42647
Number of obs: 72, groups: stream:rip, 24; stream, 12
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.28529 0.50575 -8.473 < 2e-16 ***
wshd -2.06605 0.77357 -2.671 0.00757 **
wshf 3.36248 0.65118 5.164 2.42e-07 ***
wshg 3.30175 0.76962 4.290 1.79e-05 ***
ripN 0.07063 0.61930 0.114 0.90920
wshd:ripN 0.60510 0.94778 0.638 0.52319
wshf:ripN -0.80043 0.79416 -1.008 0.31350
wshg:ripN -2.78964 0.94336 -2.957 0.00311 **
ModelBQ
Generalized linear mixed model fit by the Laplace approximation
Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip) + (1 | obs)
Data: ept
AIC BIC logLik deviance
284.4 309.5 -131.2 262.4
Random effects:
Groups Name Variance Std.Dev.
obs (Intercept) 0.30186 0.54942
stream:rip (Intercept) 0.40229 0.63427
stream (Intercept) 0.12788 0.35760
Number of obs: 72, groups: obs, 72; stream:rip, 24; stream, 12
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.2906 0.4935 -8.694 < 2e-16 ***
wshd -2.0557 0.7601 -2.705 0.00684 **
wshf 3.3575 0.6339 5.297 1.18e-07 ***
wshg 3.3923 0.7486 4.531 5.86e-06 ***
ripN 0.1425 0.6323 0.225 0.82165
wshd:ripN 0.3708 0.9682 0.383 0.70170
wshf:ripN -0.8665 0.8087 -1.071 0.28400
wshg:ripN -3.1530 0.9601 -3.284 0.00102 **
Cheers,
--
Colin Wahl
Department of Biology
Western Washington University
Bellingham WA, 98225
ph: 360-391-9881
<ModelComp2.png>_______________________________________________
R-sig-mixed-models at r-project.org mailing list