Hello all: I frequently have glm models in which the residual variance is much lower than the residual degrees of freedom (e.g. Res.Dev=30.5, Res.DF = 82). Is it appropriate for me to use a quasipoisson error distribution and test it with an F distribution? It seems to me that I could stand to gain a much-reduced standard error if I let the procedure estimate my dispersion factor (which is what I assume the quasi- distributions do). Thank you for any input at all. Hank Dr. Martin Henry H. Stevens, Assistant Professor 338 Pearson Hall Botany Department Miami University Oxford, OH 45056 Office: (513) 529-4206 Lab: (513) 529-4262 FAX: (513) 529-4243 http://www.cas.muohio.edu/~stevenmh/ http://www.muohio.edu/ecology/ http://www.muohio.edu/botany/ "E Pluribus Unum"
Under-dispersion - a stats question?
6 messages · Martin Henry H. Stevens, Peter Dalgaard, Kjetil Holuerson +2 more
"Martin Henry H. Stevens" <HStevens at muohio.edu> writes:
Hello all: I frequently have glm models in which the residual variance is much lower than the residual degrees of freedom (e.g. Res.Dev=30.5, Res.DF = 82). Is it appropriate for me to use a quasipoisson error distribution and test it with an F distribution? It seems to me that I could stand to gain a much-reduced standard error if I let the procedure estimate my dispersion factor (which is what I assume the quasi- distributions do). Thank you for any input at all.
I don't think it is safe to say anything general about this without knowledge of the model and the subject matter. Residual deviances can be terribly misleading. Consider for instance this:

    y <- c(0,1); w <- c(50,50)
    summary(glm(y~1, binomial, weights=w))
    y1 <- .5; w1 <- 100
    summary(glm(y1~1, binomial, weights=w1))

Notice that the coefficient and s.e. are exactly the same, but not the residual deviances. Now, in the first case, did the zeros and ones sort themselves into two completely separated groups, or was that just because the data were given pre-tabulated?
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907, (p.dalgaard at biostat.ku.dk)
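The comparison in the example above can also be checked numerically. This is a minimal sketch that refits the same two models and extracts the quantities being compared. The object names `fit_split` and `fit_pooled` are illustrative and do not come from the original message.

```r
# Same data entered two ways: as two weighted rows (all failures, all
# successes) or as one pooled proportion. Object names are illustrative only.
y <- c(0, 1); w <- c(50, 50)
fit_split <- glm(y ~ 1, family = binomial, weights = w)

y1 <- 0.5; w1 <- 100
fit_pooled <- glm(y1 ~ 1, family = binomial, weights = w1)

# Intercept and standard error agree between the two fits ...
est_split  <- coef(summary(fit_split))[1, c("Estimate", "Std. Error")]
est_pooled <- coef(summary(fit_pooled))[1, c("Estimate", "Std. Error")]
print(rbind(split = est_split, pooled = est_pooled))

# ... but the residual deviance and its degrees of freedom do not.
print(c(split = deviance(fit_split), pooled = deviance(fit_pooled)))
print(c(split = df.residual(fit_split), pooled = df.residual(fit_pooled)))
```

The split fit has a residual deviance of 200*log(2) on 1 degree of freedom, while the pooled fit has essentially zero deviance on 0 degrees of freedom, although both describe 50 successes in 100 trials.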
1 day later
Martin Henry H. Stevens wrote:
Hello all: I frequently have glm models in which the residual variance is much lower than the residual degrees of freedom (e.g. Res.Dev=30.5, Res.DF = 82). Is it appropriate for me to use a quasipoisson error distribution and test it with an F distribution? It seems to me that I could stand to gain a much-reduced standard error if I let the procedure estimate my dispersion factor (which is what I assume the quasi- distributions do).
I didn't see an answer to this. Maybe you could treat it as a quasi-model, but first you should ask why there is underdispersion. Underdispersion could arise if you have dependent responses; for instance, competition (say, between plants) could produce underdispersion. Then you would be better off changing to an appropriate model. Maybe you could post more about your experimental setup? Kjetil
On Mon, 10 Oct 2005, Martin Henry H. Stevens wrote:
Hello all: I frequently have glm models in which the residual variance is much lower than the residual degrees of freedom (e.g. Res.Dev=30.5, Res.DF = 82). Is it appropriate for me to use a quasipoisson error distribution and test it with an F distribution? It seems to me that I could stand to gain a much-reduced standard error if I let the procedure estimate my dispersion factor (which is what I assume the quasi- distributions do). Thank you for any input at all.
This usually indicates a deviation from the large-sample theory because of
small counts. See e.g. MASS4 p.208. Then the estimator

    residual variance
    ---------------------------
    residual degrees of freedom

is unreliable. If the better methods discussed there confirm
under-dispersion, then you probably have some form of negative correlation
and need to look at your experimental setup. (But it is usually a false
alarm.)
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
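The small-count effect described above can be seen in a simulation. The sketch below generates genuinely Poisson counts with a small mean and compares two dispersion estimates: residual deviance over residual df, and the Pearson chi-squared statistic over residual df. The mean of 0.2, the sample size, and the number of simulations are all assumptions chosen for illustration. Whether the Pearson estimate is among the "better methods" the reply has in mind is also an assumption.

```r
# Hypothetical simulation: true Poisson counts with a small mean, so any
# apparent under-dispersion is an artefact of the estimator, not of the data.
set.seed(1)
n_obs  <- 84    # arbitrary; close to the residual df of 82 quoted in the question
mu     <- 0.2   # assumed small mean count
n_sims <- 200

ratios <- t(replicate(n_sims, {
  y   <- rpois(n_obs, mu)
  fit <- glm(y ~ 1, family = poisson)
  c(deviance = deviance(fit) / df.residual(fit),
    pearson  = sum(residuals(fit, type = "pearson")^2) / df.residual(fit))
}))

# Average of each dispersion estimate over the simulated data sets
print(colMeans(ratios))
```

Under these assumptions the deviance-based ratio averages well below 1 even though the data are exactly Poisson, while the Pearson-based ratio stays close to 1.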
On Tue, 2005-10-11 at 17:16 -0400, Kjetil Holuerson wrote:
I didn't see an answer to this. Maybe you could treat it as a quasi-model, but first you should ask why there is underdispersion. Underdispersion could arise if you have dependent responses; for instance, competition (say, between plants) could produce underdispersion. Then you would be better off changing to an appropriate model. Maybe you could post more about your experimental setup?
Some ecologists from Bergen, Norway, suggest using quasipoisson with its underdispersed residual error (though I wouldn't do that). However, it would indeed be useful to know a bit more about the setup, like the type of dependent variable. If the dependent variable happens to be the number of species (as it has been in some papers by MHHS), this certainly is *not* Poisson nor quasi-Poisson nor in the exponential family, although it is so often modelled that way. I've often seen that species richness (number of species -- or in R-speak 'tokens' -- in a collection) is underdispersed relative to Poisson, and for a good reason. Even there I'd play safe and use poisson() instead of underdispersed quasipoisson(). cheers, jari oksanen
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061 email jari.oksanen at oulu.fi, homepage http://cc.oulu.fi/~jarioksa/
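One mechanism that can make species counts less variable than a Poisson or binomial model expects is that species differ in their chance of being present. This is offered only as an illustration: the message does not say which reason is meant, the presence probabilities below are invented, and the species are assumed to be independent.

```r
# Invented presence probabilities for six species (illustration only)
p <- c(0.9, 0.8, 0.5, 0.3, 0.2, 0.1)

# Richness = sum of independent presence/absence indicators
mean_richness <- sum(p)
var_richness  <- sum(p * (1 - p))

# Variances implied by common models with the same mean
var_poisson  <- mean_richness                     # Poisson: variance = mean
p_bar        <- mean_richness / length(p)
var_binomial <- length(p) * p_bar * (1 - p_bar)   # binomial(6, p_bar)

print(c(mean = mean_richness, richness = var_richness,
        binomial = var_binomial, poisson = var_poisson))
```

With these made-up probabilities the mean richness is 2.8 and its variance is 0.96, against about 1.49 for a binomial and 2.8 for a Poisson with the same mean, so the count looks underdispersed under either model.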
Hello all:
Thank you for your interest.
The text of this email is in the attached "R-help.r" file.
The R script is in "R-helpscript.r".
The data set is "wk6trial.csv".
-------------- next part --------------
One of my students has performed a laboratory experiment with petri
dishes containing hundreds of species of bacteria, and six species
each of algae and ciliated protozoans. Our goal was to examine the
effects of nutrient concentration and dish size on the number of
species of each group remaining after six weeks.
I attached the data set and some code for the algae analysis.
We had four dish sizes (factor), seven nutrient concentrations
(continuous), and three replicates of each unique treatment
combination, for a total n = 84.
Our response variables were (i) the number of bacterial species
(0-400 species, modeled with quasipoisson), (ii) the proportion of
algae species (out of six initial species - modeled with binomial)
and (iii) the proportion of protozoan species (out of six initial
species - modeled with binomial). For algae and protozoans, we
modeled the proportion of species rather than the raw number because
in each case we were constrained by the design to have between 0 and
6 species. I discussed this with a local statistician, and he thought
it made sense.
Each of these response variables is the combined result of both
unknown species' responses to treatments as well as the unknown
interactions among species. Further, these three responses are
themselves interdependent to some degree. For instance, the number
and identity of protozoan species may influence the number of
bacterial species. Nonetheless, it is common practice in ecology to
model the number of species of a group (or its logarithm) with a
univariate model assuming either a normal or Poisson error
distribution. I would HAPPILY learn better.
While modeling these groups, I consulted a few texts (Neter et al.
1996, Venables and Ripley 2002, Dalgaard 2002, Crawley 2002, Fox
2002) and attempted to follow standard procedures laid out in these
books.
For the algae and the protozoans, I began with a binomial model,
glm(cbind(AS, 6-AS) ~ Nutrients + I(Nutrients^2) + Size +
Nutrients:Size + I(Nutrients^2):Size, data=dat,
family=binomial)
where AS is the number of algae species in a dish. I retained this
family upon observation that the residual dev. / residual DF was (for
algae) = 0.19. I minimized the model by hand based on the F tests
(not the treatment contrast coefficients, after V&R p. 197 - Hauck
and Donner 1977) and using step() and found that the only significant
treatment was a linear effect of nutrient concentration. I examined
the qq plot, the resid ~ fitted plot, and Cook's distances and
everything looked fine.
When I repeated this with quasibinomial, which estimated the dispersion
parameter as 0.19, I found that both Size and Nutrients were
significant (no interaction).
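The change in significance follows from how the quasi- families rescale standard errors: the coefficients stay the same, and each standard error is multiplied by the square root of the estimated dispersion. With a dispersion of 0.19, that factor is about 0.44. The sketch below shows the relationship on simulated data, since the real data set is not reproduced here. The nutrient levels 1 to 7, the per-species probabilities, and the reduced model formula are all assumptions.

```r
# Simulated stand-in for the real data (wk6trial.csv is not reproduced here).
# Design follows the post: 4 dish sizes x 7 nutrient levels x 3 replicates = 84.
set.seed(42)
dat <- expand.grid(Size = factor(1:4), Nutrients = 1:7, Rep = 1:3)

# Invented per-species persistence probabilities; unequal probabilities make
# the count of surviving species less variable than binomial(6, p).
p_species <- c(0.95, 0.9, 0.7, 0.3, 0.1, 0.05)
dat$AS <- replicate(nrow(dat), sum(rbinom(6, 1, p_species)))

form  <- cbind(AS, 6 - AS) ~ Nutrients + Size
fit_b <- glm(form, family = binomial,      data = dat)
fit_q <- glm(form, family = quasibinomial, data = dat)

# Pearson-based dispersion estimate, as reported by summary() for quasi families
phi <- sum(residuals(fit_b, type = "pearson")^2) / df.residual(fit_b)

se_b <- coef(summary(fit_b))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]

print(phi)
print(cbind(binomial = se_b, quasibinomial = se_q, ratio = se_q / se_b))
```

Every entry in the `ratio` column equals `sqrt(phi)`. When the estimated dispersion is below 1, terms that were not significant under the binomial fit can therefore become significant under quasibinomial without any change in the fitted effects.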
So, my original question to the list was: is it appropriate to
model and fit the error distribution with quasi- functions if
the dispersion seems much less than 1.0?
Now I am unclear how to evaluate under-dispersion (even after
consulting V&R 2002, p. 208-209).
Upon reading through this, if you made it this far, you may have lots
of other comments as well, and I truly hope to become better educated
as a result!
BTW, I modeled the bacteria with a quasipoisson (dispersion = 91!).
Perhaps a negative binomial would have been better?
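For comparison, the two options can be fitted side by side. This is a sketch on invented data, not on the bacterial counts from the experiment. The response name `BS`, the nutrient levels, and the simulation parameters are assumptions, and `glm.nb()` comes from the MASS package. The two models differ in how the variance grows with the mean: quasi-Poisson assumes variance = phi * mu, while the negative binomial assumes variance = mu + mu^2 / theta.

```r
library(MASS)  # provides glm.nb(); MASS is a recommended package in most R installs

# Invented over-dispersed counts: negative binomial with mean near 180 and
# theta = 2, so that variance/mean = 1 + mu/theta is roughly 90.
set.seed(7)
sim <- data.frame(Nutrients = rep(1:7, each = 12))
sim$BS <- rnbinom(nrow(sim), size = 2, mu = exp(5.1 + 0.02 * sim$Nutrients))

fit_qp <- glm(BS ~ Nutrients, family = quasipoisson, data = sim)
fit_nb <- glm.nb(BS ~ Nutrients, data = sim)

# Quasi-Poisson: variance = phi * mu. Negative binomial: variance = mu + mu^2 / theta.
print(summary(fit_qp)$dispersion)
print(fit_nb$theta)
print(cbind(quasipoisson = coef(fit_qp), negbin = coef(fit_nb)))
```

Which variance function suits the real counts better is a question about the data that this sketch cannot answer. Plotting squared residuals against fitted values would be one way to look at it.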
Many thanks for your inputs,
Hank Stevens