overdispersion with binomial data?
Although the idea that binary data cannot, by definition, be overdispersed sounds reasonable, in fact it means little. Consider a grouped-data study in which each group has an n and an x, corresponding to the trials and successes in that group. Such data are typically overdispersed because of positive correlation within groups. Now "explode" the groups into individual binary data, with n rows for each group: x success rows and n-x failure rows. The resulting binary data cannot "by definition" be overdispersed. This is, however, just a shell game. The overdispersion in the first dataset is now clustering in the second dataset, and the cluster variable is "group". The same effect is there, just as a different term in the model. Including an "observation" variable to deal with overdispersion in the grouped data is equivalent to adding the same clustering variable to the binary dataset. "What's in a name? That which we call a rose by any other name would smell as sweet." "There is no such thing as a free lunch."
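A minimal base-R sketch of the "explode" step may make the point concrete. The data frame and its columns (group, n, x) are hypothetical, invented for illustration; for each group the function writes x rows with y = 1 and n - x rows with y = 0.

```r
# Hypothetical grouped data: one row per group, n trials and x successes.
grouped <- data.frame(group = c("a", "b", "c"),
                      n     = c(5, 4, 6),
                      x     = c(2, 4, 1))

# Expand each group into n binary rows: x rows with y = 1, n - x rows with y = 0.
explode_binomial <- function(d) {
  rows <- lapply(seq_len(nrow(d)), function(i) {
    data.frame(group = d$group[i],
               y     = rep(c(1L, 0L), times = c(d$x[i], d$n[i] - d$x[i])))
  })
  do.call(rbind, rows)
}

binary <- explode_binomial(grouped)
nrow(binary)                          # total trials: 15
tapply(binary$y, binary$group, sum)   # recovers x for each group
```

In the binary layout, group is exactly the clustering variable that would enter a mixed model as (1|group), playing the role that an observation-level term (1|obs) plays in the grouped binomial layout.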
At 08:00 AM 2/12/2011, Jarrod Hadfield wrote:
Hi Colin,

I have little to add to what John Maindonald said, but I see your second question, regarding my suggestions for binary/binomial data, was not answered. In most studies I think binomial data will be over-dispersed, and adding an observation-level random effect can be a good way of modeling this. You can think of the n trials of a binomial observation as a group of n correlated binary variables. The variance associated with the observation-level term essentially estimates how strong this correlation is (after accounting for other fixed/random effects in the model). If the original data are already binary, then n=1 and there can be no correlation, so over-dispersion with binary data cannot exist.

Cheers,
Jarrod

Quoting Colin Wahl <biowahl at gmail.com>:
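A small base-R simulation sketch of this argument follows. It assumes, purely for illustration, a fixed logit-scale linear predictor of -1 and an observation-level random effect with SD 0.8 (neither value comes from the thread). With 10 trials per observation the trials share one random draw, so the count variance exceeds the binomial variance n p(1-p); with one binary trial per observation the same mechanism leaves the variance exactly Bernoulli.

```r
set.seed(1)
n_obs <- 20000  # number of observations (rows)
eta   <- -1     # assumed fixed linear predictor on the logit scale
sigma <- 0.8    # assumed SD of the observation-level random effect

# One random effect per observation: all trials in a row share the same p_i,
# which induces positive correlation among those trials.
p_i <- plogis(eta + rnorm(n_obs, mean = 0, sd = sigma))

# Observed variance divided by the variance a plain binomial would have
# at the same overall success probability.
dispersion_vs_binomial <- function(y, size) {
  p_hat <- mean(y) / size
  var(y) / (size * p_hat * (1 - p_hat))
}

y10 <- rbinom(n_obs, size = 10, prob = p_i)  # binomial rows, n = 10
y1  <- rbinom(n_obs, size = 1,  prob = p_i)  # binary rows, n = 1

ratio10 <- dispersion_vs_binomial(y10, 10)  # clearly above 1
ratio1  <- dispersion_vs_binomial(y1, 1)    # n_obs / (n_obs - 1), i.e. essentially 1
```

The binary ratio is not close to 1 by luck: for 0/1 data var(y) is fully determined by mean(y), so the ratio equals n_obs/(n_obs - 1) whatever the value of sigma.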
In anticipation of the weekend: in my various readings (Crawley, Zuur, Bolker's ecological models book, the GLMM TREE article, the reworked supplementary material, and R-help posts), the discussion of overdispersion for GLMMs is made quite convoluted by different interpretations, different ways to test for it, and different solutions for dealing with it. In many cases the differences seem to stem from the type of data being analyzed (e.g. binomial vs. Poisson) and from somewhat subjective choices about which type of residuals to use for which models. The most consistent definition I have found is that a model is overdispersed when the ratio of the residual scaled deviance to the residual degrees of freedom is greater than 1, which seems simple enough.
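library(lme4)

modelB <- glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip), data=ept,
                family=binomial(link="logit"))
rdev <- sum(residuals(modelB)^2)   # sum of squared residuals (default residual type)
mdf <- length(fixef(modelB))       # number of fixed-effect parameters
rdf <- nrow(ept) - mdf             # residual degrees of freedom
rdev/rdf #9.7 >>1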
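For comparison, here is a base-R sketch of the same kind of check on an ordinary glm, where the pieces are easy to inspect. It uses Pearson residuals (sum of squared Pearson residuals over df.residual), a common alternative to the residual-deviance version quoted above; deviance(fit)/df.residual(fit) would give the deviance-based ratio. The data are simulated and hypothetical: 200 groups of 20 trials, once without and once with an observation-level logit-normal effect (SD 1, an arbitrary choice).

```r
# Pearson-based dispersion ratio for a fitted glm:
# sum of squared Pearson residuals divided by residual degrees of freedom.
dispersion_ratio <- function(fit) {
  rp <- residuals(fit, type = "pearson")
  sum(rp^2) / df.residual(fit)
}

set.seed(2)
n_groups <- 200
trials   <- rep(20, n_groups)
x1       <- rnorm(n_groups)  # a hypothetical covariate

# Plain binomial counts: no extra-binomial variation
y_plain   <- rbinom(n_groups, size = trials, prob = plogis(-0.5 + 0.7 * x1))
fit_plain <- glm(cbind(y_plain, trials - y_plain) ~ x1, family = binomial)

# Counts with an observation-level logit-normal effect (SD 1): overdispersed
y_od   <- rbinom(n_groups, size = trials,
                 prob = plogis(-0.5 + 0.7 * x1 + rnorm(n_groups, sd = 1)))
fit_od <- glm(cbind(y_od, trials - y_od) ~ x1, family = binomial)

dispersion_ratio(fit_plain)  # expected to be near 1
dispersion_ratio(fit_od)     # expected to be well above 1
```

With glmer fits, a denominator such as nrow(ept) - length(fixef(...)) counts only the fixed effects and ignores the random-effect variance parameters, so the ratio is best read as a rough guide rather than a formal test.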
So I conclude my model is overdispersed. The recent consensus solution seems to be to create an individual-level random variable and add it to the model:

ept$obs <- 1:nrow(ept) #create individual level random variable 1:72
modelBQ <- glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs), data=ept,
                 family=binomial(link="logit"))

I take a look at the residuals, which are now much smaller but are... just... too... good... for my ecological (GLMM-free) experience to be comfortable with. Additionally, they fit better for intermediate data, which, with binomial errors, is the opposite of what I would expect. Feel free to inspect them in the attached image (if attachments work via the mailing list... if not, I can send it directly to whoever is interested).

Because it looks too good... I test overdispersion again for the new model:

rdev <- sum(residuals(modelBQ)^2)
rdev/rdf #0.37

which is terrifically underdispersed, for which the consensus is to ignore it (Zuur et al. 2009).

So, for my questions:

1. Is there anything relevant to add to/adjust in my approach thus far?

2. Is overdispersion an issue I should be concerned with for binomial errors? Most sources think so, but I did find a post from Jarrod Hadfield back in August where he states that overdispersion does not exist with a binary response variable: http://web.archiveorange.com/archive/v/rOz2zS8BHYFloUr9F0Ut (though in subsequent posts he recommends the approach I have taken, using an individual-level random variable).

3. Another approach (from Bolker's TREE GLMM article) is to use Wald t or F tests instead of Wald Z or chi-squared tests to get p values, because they "account for the uncertainty in the estimates of overdispersion." That seems like a nice simple option, but I have not seen it come up in any other readings. Thoughts?
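On question 3, the mechanics are simple: the same Wald statistic (estimate divided by standard error) is referred to a t distribution instead of the standard normal. A minimal base-R sketch, using the wshg:ripN row of the ModelBQ output that follows; the 64 residual degrees of freedom (72 observations minus 8 fixed effects, matching the rdf computed earlier) are an assumption, since the choice of denominator degrees of freedom for GLMMs is itself unsettled.

```r
# Two-sided Wald p-value from an estimate and its standard error.
# df = Inf gives the standard-normal (z) reference used in the glmer summary.
wald_p <- function(est, se, df = Inf) {
  stat <- est / se
  2 * pt(-abs(stat), df = df)
}

# wshg:ripN row of the ModelBQ fixed-effects table
est <- -3.1530
se  <- 0.9601

p_z <- wald_p(est, se)           # about 0.00102, as reported by glmer
p_t <- wald_p(est, se, df = 64)  # assumed df = 72 - 8; somewhat larger
```

This only changes the reference distribution; in a quasi-likelihood fit the standard errors would typically also be inflated by the square root of the estimated dispersion before the test is applied.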
Here are the glmer model outputs:

ModelB
Generalized linear mixed model fit by the Laplace approximation
Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip)
   Data: ept
   AIC BIC logLik deviance
 754.3 777 -367.2    734.3
Random effects:
 Groups     Name        Variance Std.Dev.
 stream:rip (Intercept) 0.48908  0.69934
 stream     (Intercept) 0.18187  0.42647
Number of obs: 72, groups: stream:rip, 24; stream, 12

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.28529    0.50575  -8.473  < 2e-16 ***
wshd        -2.06605    0.77357  -2.671  0.00757 **
wshf         3.36248    0.65118   5.164 2.42e-07 ***
wshg         3.30175    0.76962   4.290 1.79e-05 ***
ripN         0.07063    0.61930   0.114  0.90920
wshd:ripN    0.60510    0.94778   0.638  0.52319
wshf:ripN   -0.80043    0.79416  -1.008  0.31350
wshg:ripN   -2.78964    0.94336  -2.957  0.00311 **

ModelBQ
Generalized linear mixed model fit by the Laplace approximation
Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip) + (1 | obs)
   Data: ept
   AIC   BIC logLik deviance
 284.4 309.5 -131.2    262.4
Random effects:
 Groups     Name        Variance Std.Dev.
 obs        (Intercept) 0.30186  0.54942
 stream:rip (Intercept) 0.40229  0.63427
 stream     (Intercept) 0.12788  0.35760
Number of obs: 72, groups: obs, 72; stream:rip, 24; stream, 12

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -4.2906     0.4935  -8.694  < 2e-16 ***
wshd         -2.0557     0.7601  -2.705  0.00684 **
wshf          3.3575     0.6339   5.297 1.18e-07 ***
wshg          3.3923     0.7486   4.531 5.86e-06 ***
ripN          0.1425     0.6323   0.225  0.82165
wshd:ripN     0.3708     0.9682   0.383  0.70170
wshf:ripN    -0.8665     0.8087  -1.071  0.28400
wshg:ripN    -3.1530     0.9601  -3.284  0.00102 **

Cheers,
--
Colin Wahl
Department of Biology
Western Washington University
Bellingham WA, 98225
ph: 360-391-9881
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
================================================================ Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral at lcfltd.com Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 "Vere scire est per causas scire"