
Under-dispersion - a stats question?

Hello all:
Thank you for your interest.

The text of this email is in the attached "R-help.r" file.
The R script is in "R-helpscript.r".
The data set is "wk6trial.csv".
-------------- next part --------------

One of my students has performed a laboratory experiment with petri  
dishes containing hundreds of species of bacteria, and six species
each of algae and ciliated protozoans. Our goal was to examine the  
effects of nutrient concentration and dish size on the number of  
species of each group remaining after six weeks.

I attached the data set and some code for the algae analysis.

We had four dish sizes (factor), seven nutrient concentrations  
(continuous), and three replicates of each unique treatment  
combination, for a total n = 84.
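
A minimal sketch of that layout, to make the arithmetic explicit. The size labels and nutrient values are placeholders (assumptions), not the values in "wk6trial.csv":

```r
# Hypothetical reconstruction of the design; level labels and nutrient
# values are placeholders, not the ones in wk6trial.csv.
design <- expand.grid(
  Size      = factor(paste0("S", 1:4)),  # four dish sizes (factor)
  Nutrients = 1:7,                       # seven concentrations (continuous)
  Rep       = 1:3                        # three replicates per combination
)

nrow(design)                          # 4 * 7 * 3 rows
table(design$Size, design$Nutrients)  # replicates per treatment cell
```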

Our response variables were (i) the number of bacterial species  
(0-400 species, modeled with quasipoisson), (ii) the proportion of  
algae species (out of six initial species - modeled with binomial)  
and (iii) the proportion of protozoan species (out of six initial  
species - modeled with binomial). For algae and protozoans, we  
modeled the proportion of species rather than the raw number because  
in each case we were constrained by the design to have between 0 and  
6 species. I discussed this with a local statistician, and he thought  
it made sense.

Each of these response variables is the combined result of both  
unknown species' responses to treatments as well as the unknown  
interactions among species. Further, these three responses are  
themselves interdependent to some degree. For instance, the number  
and identity of protozoan species may influence the number of  
bacterial species. Nonetheless, it is common practice in ecology to  
model the number of species of a group (or its logarithm) with a
univariate model assuming either a normal or Poisson error  
distribution. I would HAPPILY learn better.

While modeling these groups, I consulted a few texts (Neter et al.  
1996, Venables and Ripley 2002, Dalgaard 2002, Crawley 2002, Fox  
2002) and attempted to follow standard procedures laid out in these  
books.

For the algae and the protozoans, I began with a binomial model,

glm(cbind(AS, 6 - AS) ~ Nutrients + I(Nutrients^2) + Size +
        Nutrients:Size + I(Nutrients^2):Size,
    data = dat, family = binomial)

where AS is the number of algae species in a dish. I retained this
family after observing that the ratio of residual deviance to residual
degrees of freedom was 0.19 for algae. I simplified the model by hand
based on the F tests (not the treatment contrast coefficients, after
V&R p. 197 - Hauck and Donner 1977) and also with step(), and found
that the only significant treatment was a linear effect of nutrient
concentration. I examined the Q-Q plot, the residuals vs. fitted plot,
and Cook's distances, and everything looked fine.
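
The workflow just described might look roughly like the sketch below. The data are simulated stand-ins (an assumption, since "wk6trial.csv" is not reproduced here), generated as genuinely binomial, so the dispersion ratios should come out near 1 rather than 0.19. The sketch uses chi-square tests, the usual choice in anova() for a glm when the dispersion is fixed at 1.

```r
# Simulated stand-in for wk6trial.csv: column names follow the email,
# but the nutrient values and effect sizes are assumptions.
set.seed(1)
dat <- expand.grid(Size = factor(paste0("S", 1:4)), Nutrients = 1:7, Rep = 1:3)
p <- plogis(-0.5 + 0.3 * dat$Nutrients)           # assumed linear nutrient effect
dat$AS <- rbinom(nrow(dat), size = 6, prob = p)   # algae species out of 6

full <- glm(cbind(AS, 6 - AS) ~ Nutrients + I(Nutrients^2) + Size +
              Nutrients:Size + I(Nutrients^2):Size,
            data = dat, family = binomial)

# Two common dispersion summaries
dev_ratio     <- deviance(full) / df.residual(full)
pearson_ratio <- sum(residuals(full, type = "pearson")^2) / df.residual(full)
c(deviance = dev_ratio, pearson = pearson_ratio)

anova(full, test = "Chisq")       # sequential analysis of deviance
reduced <- step(full, trace = 0)  # AIC-based simplification
formula(reduced)
```

Diagnostics such as plot(reduced) and cooks.distance(reduced) can then be run on the simplified fit.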

When I repeated this with quasibinomial, which estimated the dispersion
parameter as 0.19, I found that both Size and Nutrients were
significant (no interaction).
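
A sketch of why the two families can disagree on significance: the quasi- fit leaves the coefficients unchanged but multiplies each standard error by the square root of the estimated dispersion, so a dispersion of 0.19 shrinks standard errors to about 44% of their binomial values. The data below are simulated placeholders (an assumption), so the estimated dispersion will not equal 0.19.

```r
# Simulated stand-in data; values and effect sizes are assumptions.
set.seed(2)
dat <- expand.grid(Size = factor(paste0("S", 1:4)), Nutrients = 1:7, Rep = 1:3)
dat$AS <- rbinom(nrow(dat), size = 6, prob = plogis(-0.5 + 0.3 * dat$Nutrients))

fit_b  <- glm(cbind(AS, 6 - AS) ~ Nutrients + Size, data = dat, family = binomial)
fit_qb <- update(fit_b, family = quasibinomial)

phi   <- summary(fit_qb)$dispersion   # Pearson chi-square / residual df
se_b  <- coef(summary(fit_b))[, "Std. Error"]
se_qb <- coef(summary(fit_qb))[, "Std. Error"]

all.equal(coef(fit_b), coef(fit_qb))  # same point estimates
all.equal(se_qb, se_b * sqrt(phi))    # SEs rescaled by sqrt(dispersion)

anova(fit_qb, test = "F")             # F tests use the estimated dispersion
```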

So, my original question to the list was: is it appropriate to
model and fit the error distribution with quasi- functions if
dispersion seems much less than 1.0?

Now I am unclear how to evaluate under-dispersion (even after  
consulting V&R 2002, p. 208-209).
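
One possible way to gauge whether a dispersion well below 1 is more than chance is a simulation check: generate responses from the fitted binomial model, refit, and see where the observed Pearson ratio falls among the simulated ones. This is only a sketch of one option, not a procedure taken from V&R; the toy response below is deliberately constructed to be under-dispersed, and `pearson_disp` is a helper name introduced here for illustration.

```r
# Toy data, deliberately under-dispersed (assumption): expected count
# plus small noise, rounded, instead of a true binomial draw.
set.seed(3)
dat <- expand.grid(Size = factor(paste0("S", 1:4)), Nutrients = 1:7, Rep = 1:3)
p <- plogis(-0.5 + 0.3 * dat$Nutrients)
dat$AS <- pmin(6, pmax(0, round(6 * p + runif(nrow(dat), -0.7, 0.7))))

fit <- glm(cbind(AS, 6 - AS) ~ Nutrients + Size, data = dat, family = binomial)

# Hypothetical helper: Pearson chi-square divided by residual df
pearson_disp <- function(m) {
  sum(residuals(m, type = "pearson")^2) / df.residual(m)
}
obs <- pearson_disp(fit)

# Reference distribution under the fitted binomial model
sims <- replicate(200, {
  d <- dat
  d$AS <- rbinom(nrow(d), size = 6, prob = fitted(fit))
  pearson_disp(glm(formula(fit), data = d, family = binomial))
})

obs
mean(sims <= obs)  # share of simulated ratios at or below the observed one
```

A very small share suggests the data vary less than a binomial model with these fitted probabilities would produce.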

Upon reading through this, if you made it this far, you may have lots  
of other comments as well, and I truly hope to become better educated  
as a result!

BTW, I modeled the bacteria with a quasipoisson (dispersion = 91!).  
Perhaps a negative binomial would have been better?
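
A sketch of how the two options could be compared, assuming the MASS package (which provides glm.nb()) is installed. The count column `BS` and the simulated values are hypothetical. The two models differ in their variance assumption: quasipoisson takes the variance as dispersion * mean, while the negative binomial takes it as mean + mean^2 / theta.

```r
library(MASS)  # provides glm.nb()

# Simulated over-dispersed counts; BS is a hypothetical column name and
# the parameter values are assumptions, not estimates from the real data.
set.seed(4)
dat <- expand.grid(Size = factor(paste0("S", 1:4)), Nutrients = 1:7, Rep = 1:3)
mu <- exp(3 + 0.2 * dat$Nutrients)
dat$BS <- rnbinom(nrow(dat), mu = mu, size = 2)

fit_qp <- glm(BS ~ Nutrients + Size, data = dat, family = quasipoisson)
fit_nb <- glm.nb(BS ~ Nutrients + Size, data = dat)

summary(fit_qp)$dispersion  # variance modeled as dispersion * mean
fit_nb$theta                # variance modeled as mean + mean^2 / theta

# Compare coefficient estimates side by side
cbind(quasipoisson = coef(fit_qp), negbin = coef(fit_nb))
```

Plotting squared Pearson residuals against fitted values from each fit is one way to judge which variance function describes the counts better.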

Many thanks for your inputs,
Hank Stevens
On Oct 12, 2005, at 1:10 AM, Jari Oksanen wrote:

            
Dr. Martin Henry H. Stevens, Assistant Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056

Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
"E Pluribus Unum"