overdispersion and quasibinomial model

6 messages · djpren, Ben Bolker, David Winsemius +1 more

#
I am looking for the correct commands to do the following things:

1. I have a binomial logistic regression model and I want to test for
overdispersion.
2. If I do indeed have overdispersion, I then need to run a quasi-binomial
model, but I'm not sure of the command.
3. I can get the residuals of the model, but I then need to apply a
Shapiro-Wilk test to them. Does anyone know the command for this?

Any help would be hugely appreciated,

Thanks,

Djp
#
On Nov 24, 2009, at 3:41 PM, djpren wrote:

            
Under the teach a man to fish precept,   ... try:

RSiteSearch("test over dispersion binomial models")
?glm
# and follow the appropriate links
RSiteSearch("shapiro-wilks")   # not that people here recommend this procedure

The overall flavor of these questions is "homework", so I'm  
speculating that you may want to consult your instructors.
#
Thanks for the reply. Naturally I already searched the site and help for the
answers to these questions. I think I've figured out how to run a
quasi-binomial model, but I cannot figure out how to test for
over-dispersion or how to apply a shapiro-wilk test.

This is not homework, neither do I have an instructor who is proficient in
using R. This program was suggested to me by another researcher after he
witnessed my frustration with the inflexibility of SPSS and other such
programs. I am on a very tight schedule and I don't have time to become a
statistician and computer scientist, which is why I wrote 3 very quick
questions asking for commands that I had already tried to find myself.

Testing for over-dispersion is probably something I can eventually get to
grips with, since I just have to get the variance for the real and modelled data.
However, I cannot find a command to do shapiro-wilks on the site or on these
forums. Also, why do you say that most people here wouldn't recommend this
procedure?
#
djpren wrote:
??shapiro
stats::shapiro.test     Shapiro-Wilk Normality Test

(maybe you were searching for "shapiro-wilks" (sic)?)

People often advise against statistical tests of normality because they
have low power for small data sets (so you cannot reliably detect
non-normality when it is present) and high power for large data sets,
even when the degree of non-normality detected is not enough to
invalidate the results of the statistical procedure in question.
Under what circumstances are the residuals from a quasibinomial
GLM expected to be normally distributed ... ?
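
For reference, here is a minimal sketch of how stats::shapiro.test is
called. The data are simulated purely for illustration (the names x, d,
fit and r are hypothetical and not from this thread). The second half
applies the test to deviance residuals from a 0/1-response binomial GLM.
Those residuals are positive for every 1 and negative for every 0, so
they fall into two clusters and there is no reason to expect them to
look normal.

```r
# Simulated data for illustration only; nothing here comes from the thread.
set.seed(1)

# shapiro.test() takes a numeric vector (3 to 5000 non-missing values).
x <- rnorm(40)
sw <- shapiro.test(x)
sw$statistic   # the W statistic
sw$p.value     # p-value for the null hypothesis of normality

# The same call applied to residuals from a binary-response GLM.
d <- data.frame(z = rnorm(200))
d$y <- rbinom(200, size = 1, prob = plogis(0.5 * d$z))
fit <- glm(y ~ z, family = binomial, data = d)
r <- residuals(fit, type = "deviance")
sw_resid <- shapiro.test(r)
sw_resid
```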
#
On Nov 25, 2009, at 7:04 AM, djpren wrote:

            
"Quick questions" are somewhat deprecated here. Have you read the  
Posting Guide? Its overall message is that the list readership expects  
more detail rather than less. Perhaps with a better search method and  
a pointer to the glm()  function, which will do what was requested,   
you might compose a more complete description of the data and the  
problem, and offer code that shows what progress you are making.
I would have thought my original reply would have pointed the way to  
more effective searching. The obvious search strategy using the  
RSiteSearch function would seem to be:

 > RSiteSearch("shapiro wilks")
A search query has been submitted to http://search.r-project.org
The results page should open in your browser shortly

A browser window did open up and there were 8 hits, at least two of
which were to functions that would do what you appear to be determined
to do on a rather dubious basis.
Are you doing this because some reviewer asked you to do so or because  
you are copying a path that someone else laid out for you? Testing for  
normality in a binomial model seems rather puzzling on the face of it.
#
djpren wrote:
The customary way (well, at least to me) to check for overdispersion
is to look at the ratio of the sum of squared Pearson residuals
to the residual degrees of freedom. This is well discussed in
MASS (the book).

Example:

library(MASS)
fm1 <- glm(low ~ age + race, family = binomial, data = birthwt)
phi <- sum(resid(fm1, type = "pearson")^2) / df.residual(fm1)
phi
#[1] 1.011612

For a binomial glm, this value is expected to be near 1.0
as it is here. So there is no indication of overdispersion
in this example.

I don't know of a specific test for overdispersion. Personally,
I start to worry about the adequacy of the model if the data
set is large and phi is greater than about 1.2. For small data
sets I wouldn't be too concerned if phi is less than 1.5.
But this all depends crucially on what you want to do with
your model results. Allowing phi to be greater than 1.0 will
give more conservative inference (larger standard errors).
Note that using family = quasibinomial won't change the
parameter estimates, just their SEs.

fm2 <- glm(low ~ age + race, family = quasibinomial, data = birthwt)

Now you can compare summary(fm1) with summary(fm2).
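
One way to make that comparison concrete is sketched below. It refits
both models so that it runs on its own, then checks three things: the
coefficients agree, the dispersion reported by summary(fm2) matches the
Pearson-based phi, and the quasibinomial standard errors are the binomial
ones scaled by sqrt(phi). The names se1 and se2 are introduced here for
illustration only.

```r
library(MASS)  # provides the birthwt data used above

fm1 <- glm(low ~ age + race, family = binomial, data = birthwt)
fm2 <- glm(low ~ age + race, family = quasibinomial, data = birthwt)

# Pearson-based dispersion estimate from the binomial fit
phi <- sum(resid(fm1, type = "pearson")^2) / df.residual(fm1)

# The point estimates are the same under both families
all.equal(coef(fm1), coef(fm2))

# summary() of the quasibinomial fit reports the same dispersion estimate
summary(fm2)$dispersion
phi

# Standard errors differ by a factor of sqrt(phi)
se1 <- coef(summary(fm1))[, "Std. Error"]
se2 <- coef(summary(fm2))[, "Std. Error"]
se2 / se1
sqrt(phi)
```

Because phi is close to 1 for this data set, the two summaries will look
nearly identical; with overdispersed data the quasibinomial SEs would be
visibly larger.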

What Shapiro-Wilk has to do with this is: Nothing!

  -Peter Ehlers