glm with binomial errors - problem with overdispersion - R-help

Mon, Jun 13, 2011 1:33 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110613/de22ede3/attachment.pl>

Brian Ripley

Mon, Jun 13, 2011 11:13 PM #

I presume you intended 'type' and 'fragment' to be factors (see
below).  Such a model (with the interaction) would fit exactly.  The
additive model is only modestly over-dispersed, and shows that
'fragment' has zero effect.  Not 'a negligible effect', but no effect.
So something really odd is going on: is this an exercise with
artificial data?  Otherwise you need to explain the exact balance
between the two 'fragments' (each fragment has exactly 1/4 success),
and your assumption of independent binomial sampling cannot be true.
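
The point about factors can be checked with a short sketch. It reuses the data from the original post and assumes that 'fragment' and 'type' are meant as category labels; the object names `saturated` and `additive` are illustrative and do not come from the thread.

```r
# Data as posted in the thread.
success <- c(14, 43, 44, 1, 13, 28, 56, 8)
failure <- c(88, 59, 58, 101, 92, 77, 49, 97)

# Assumption: both predictors are category labels, so code them as factors.
fragment <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))
type     <- factor(c(1, 2, 3, 4, 1, 2, 3, 4))
y <- cbind(success, failure)

# Interaction model: 8 parameters for 8 observations, so it is saturated.
saturated <- glm(y ~ fragment * type, family = binomial)
df.residual(saturated)                       # residual degrees of freedom

# Additive model: main effects only.
additive <- glm(y ~ fragment + type, family = binomial)
coef(additive)["fragment2"]                  # estimated fragment effect
deviance(additive) / df.residual(additive)   # rough over-dispersion ratio
```

With factors, the interaction model has no residual degrees of freedom, and the fragment coefficient of the additive model should come out as zero up to numerical tolerance.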

Using a quasibinomial model does not change the deviance (see e.g.
McCullagh and Nelder for the definitions, including that of 'scaled
deviance'), but it does change the standard errors.
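
As a minimal sketch of that statement, the following refits both models from the original post (with the predictors numeric, as posted) and compares them. The names `m_bin`, `m_quasi` and `phi` are illustrative; `phi` is the Pearson chi-square divided by the residual degrees of freedom, which is how `summary.glm` estimates the dispersion for quasi families.

```r
# Data and models as in the original post (predictors left numeric).
success  <- c(14, 43, 44, 1, 13, 28, 56, 8)
failure  <- c(88, 59, 58, 101, 92, 77, 49, 97)
fragment <- c(1, 1, 1, 1, 2, 2, 2, 2)
type     <- c(1, 2, 3, 4, 1, 2, 3, 4)
y <- cbind(success, failure)

m_bin   <- glm(y ~ fragment * type, family = binomial)
m_quasi <- glm(y ~ fragment * type, family = quasibinomial)

# Dispersion estimate: Pearson chi-square over residual degrees of freedom.
phi <- sum(residuals(m_bin, type = "pearson")^2) / df.residual(m_bin)

se_bin   <- coef(summary(m_bin))[, "Std. Error"]
se_quasi <- coef(summary(m_quasi))[, "Std. Error"]

all.equal(deviance(m_bin), deviance(m_quasi))   # deviance is unchanged
all.equal(se_quasi, se_bin * sqrt(phi))         # SEs are scaled by sqrt(phi)
```

The coefficient estimates are identical in the two fits; only the standard errors, and hence the test statistics, are rescaled.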

On Mon, 13 Jun 2011, Anna Mill wrote:

You have types and fragments but no species and no sites.  At least 
'sites' should be a factor, as should 'categories of seed sizes'.

In the model summary the residual deviance is much higher than the degrees
of freedom (Residual deviance: 153.74  on 4  degrees of freedom), and even
after correcting for overdispersion by using a quasibinomial error structure
instead of binomial, the residual deviance does not change. Is this a data
problem, so that I cannot use this statistic, or is it because I am doing
something wrong in R (see models attached)?

Thanks a lot for your help!
Anna


first model with binomial error structure:

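# counts of successes and failures for the 8 fragment/type combinations
success  <- c(14, 43, 44, 1, 13, 28, 56, 8)
failure  <- c(88, 59, 58, 101, 92, 77, 49, 97)
# predictors, entered here as numeric codes (not factors)
fragment <- c(1, 1, 1, 1, 2, 2, 2, 2)
type     <- c(1, 2, 3, 4, 1, 2, 3, 4)
# two-column response: successes, failures
y <- cbind(success, failure)
model <- glm(y ~ fragment * type, binomial)
summary(model)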

Call:
glm(formula = y ~ fragment * type, family = binomial)

Deviance Residuals:
     1        2        3        4        5        6        7        8
-4.0175   3.3716   4.5052  -6.0071  -2.8063   0.5449   6.0414  -5.0184

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)    0.04433    0.61072   0.073   0.9421
fragment      -0.65477    0.39001  -1.679   0.0932 .
type          -0.46664    0.23027  -2.027   0.0427 *
fragment:type  0.26636    0.14455   1.843   0.0654 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

   Null deviance: 157.96  on 7  degrees of freedom
Residual deviance: 153.74  on 4  degrees of freedom
AIC: 196.31

Number of Fisher Scoring iterations: 5

second model with quasibinomial error structure:

model2 <- glm(y ~ fragment * type, quasibinomial)
summary(model2)

Call:
glm(formula = y ~ fragment * type, family = quasibinomial)

Deviance Residuals:
     1        2        3        4        5        6        7        8
-4.0175   3.3716   4.5052  -6.0071  -2.8063   0.5449   6.0414  -5.0184

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.04433    3.63550   0.012    0.991
fragment      -0.65477    2.32169  -0.282    0.792
type          -0.46664    1.37073  -0.340    0.751
fragment:type  0.26636    0.86048   0.310    0.772

(Dispersion parameter for quasibinomial family taken to be 35.43628)

   Null deviance: 157.96  on 7  degrees of freedom
Residual deviance: 153.74  on 4  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5


Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Anna Mill

Mon, Jun 13, 2011 11:23 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110614/e10e9232/attachment.pl>

Peter Dalgaard

Tue, Jun 14, 2011 12:07 AM #

On Jun 14, 2011, at 08:13 , Prof Brian Ripley wrote:

Also note that success+failure is exactly 102 in every row of fragment 1 and 105 in every row of fragment 2, and that the successes within each fragment sum to that same number (as they must, to give exactly 1/4). It is rather easy to suspect that this is actually a 0/1 coding of the type (as in "tick exactly one box"), and not independent binomial data.
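
The balance described above can be verified directly from the posted vectors. This is only a sketch; the names `trials`, `trials_by_fragment` and `successes_by_fragment` are illustrative.

```r
# Data as posted in the thread.
success  <- c(14, 43, 44, 1, 13, 28, 56, 8)
failure  <- c(88, 59, 58, 101, 92, 77, 49, 97)
fragment <- c(1, 1, 1, 1, 2, 2, 2, 2)

# Number of trials in each row.
trials <- success + failure

# Distinct row totals within each fragment (a single value per fragment
# means every row of that fragment has the same number of trials).
trials_by_fragment <- tapply(trials, fragment, unique)

# Total number of successes within each fragment.
successes_by_fragment <- tapply(success, fragment, sum)

trials_by_fragment      # 102 for fragment 1, 105 for fragment 2
successes_by_fragment   # 102 for fragment 1, 105 for fragment 2

# Overall success proportion per fragment.
successes_by_fragment / tapply(trials, fragment, sum)   # 0.25 for both
```

If each of the 102 (or 105) units falls into exactly one of the four types, the four "binomial" counts in a fragment are really one multinomial observation, which is what this pattern suggests.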

Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Anna Mill

Tue, Jun 14, 2011 12:53 AM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110614/5e063f57/attachment.pl>

Peter Dalgaard

Tue, Jun 14, 2011 3:21 AM #

On Jun 14, 2011, at 09:53 , Anna Mill wrote:

Well, it's your data, and only you can tell what the original data look like. We can only _suspect_ that they might be generated to be mutually exclusive.

If you do not have independent binomial data, then a glm(..., binomial) will be seriously inappropriate (and a simple chi-square on the table of "successes" by type and fragment will be the obvious thing to do).
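
A minimal sketch of that chi-square test, assuming the "successes" really are counts of units classified into exactly one type within each fragment. The object names `successes` and `test` are illustrative.

```r
# 2 x 4 table of "successes": rows are fragments, columns are types.
successes <- matrix(
  c(14, 43, 44, 1,
    13, 28, 56, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(fragment = c("1", "2"), type = c("1", "2", "3", "4"))
)

# Pearson chi-square test of independence between fragment and type.
# Two expected counts are below 5, so R is likely to warn that the
# chi-squared approximation may be inaccurate.
test <- chisq.test(successes)

test$expected    # expected counts under independence
test$parameter   # degrees of freedom: (2 - 1) * (4 - 1)
test             # statistic and p-value
```

Because of the small expected counts for type 4, `chisq.test(successes, simulate.p.value = TRUE)` or `fisher.test(successes)` may be worth comparing against the asymptotic p-value.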

Peter Dalgaard