hurdle model
On Thu, 2010-08-19 at 14:54 +0300, Gavin Simpson wrote:
On Thu, 2010-08-19 at 13:20 +0200, Yingjie Zhang wrote:
They fit several models and compare them:
I. Poisson
II. Negative Binomial
III. Quasi-likelihood
IV. Hurdle model
V. zero-inflated model
III should be a quasi-Poisson model, i.e. you fit the Poisson GLM using
quasi-likelihood and model the dispersion parameter \phi alongside the
usual Poisson GLM parameters.
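
A minimal sketch of what the quasi-Poisson fit amounts to in R, assuming made-up simulated data (the variables x and y and all coefficients are hypothetical, not taken from the paper). The point estimates match the Poisson GLM; only the dispersion, and with it the standard errors, change.

```r
# Simulated overdispersed counts; data and variable names are invented.
set.seed(1)
x <- runif(200)
y <- rnbinom(200, mu = exp(0.5 + 1.2 * x), size = 1)

fit_pois  <- glm(y ~ x, family = poisson())
fit_quasi <- glm(y ~ x, family = quasipoisson())

# Same point estimates: both families lead to the same IRLS fit.
all.equal(coef(fit_pois), coef(fit_quasi))

# Dispersion estimate: Pearson chi-square over residual degrees of freedom.
phi <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
all.equal(phi, summary(fit_quasi)$dispersion)

# Quasi-Poisson standard errors are the Poisson ones times sqrt(phi).
se_pois  <- coef(summary(fit_pois))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]
all.equal(se_quasi, se_pois * sqrt(phi))
```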
Section 2.3 of their paper on the hurdle model doesn't even mention
"quasi", though they do mention it in Table 2.
Reading this, I think they cooked this model themselves - you can fit a
binomial model yourself for the presence/absence and then fit a count
model for the samples predicted to be present from the binomial part. To
make things simple I suspect they fitted the count part as quasi-Poisson,
but nowhere does it say exactly what they did.
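
The two-stage fit described above could be sketched as follows. This is only a guess at the general approach, not the paper's actual procedure: the data are simulated, the names dat, present, pos, fit_zero and fit_count are hypothetical, and the count stage here uses the samples with observed non-zero counts. Note that the quasi-Poisson count stage ignores the fact that the counts it sees can never be zero.

```r
# Invented data: a presence process and strictly positive counts where present.
set.seed(2)
n <- 300
x <- runif(n)
occupied <- rbinom(n, 1, plogis(-0.5 + 2 * x))
count <- occupied * (1 + rpois(n, exp(0.3 + x)))
dat <- data.frame(x = x, count = count, present = as.integer(count > 0))

# Stage 1: binomial GLM for presence/absence.
fit_zero <- glm(present ~ x, family = binomial(), data = dat)

# Stage 2: quasi-Poisson GLM fitted to the non-zero samples only.
pos <- subset(dat, count > 0)
fit_count <- glm(count ~ x, family = quasipoisson(), data = pos)

summary(fit_zero)$coefficients
summary(fit_count)$coefficients
summary(fit_count)$dispersion   # estimated scale of the count stage
```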
I know that at least Jane Elith has an email address (I used it years ago), so you could ask her. However, it may be that their hurdle model uses just Poisson, and there is a minor mistake in their Table 2.

You can use quasipoisson() or poisson() in glm() in a very natural way: the fitting happens via iteratively reweighted least squares, and all you need to define is the relationship between fitted values and variance. If you look at the poisson() and quasipoisson() functions in R (these provide the backbone of glm(..., family=)), you see that the differences are that quasipoisson()$aic() always returns NA, and quasipoisson() lacks the item simulate(). Otherwise they work in a similar way, except that in poisson() you take the scale (\phi) to be 1, and in quasipoisson() you estimate the scale from the fitted model. Then you just multiply the standard errors by the square root of the scale, use F tests instead of Chisq in anova(), etc.

I am not sure (or actually, I don't think) that this fitting parallelism extends to the *truncated* Poisson that is used in pscl::hurdle(). Although you can do the fitting by stages, and fit a quasipoisson() glm to the above-zero values, I don't think this is the correct thing to do when you are not allowed to have new zeros. However, the truncated Poisson likelihood model is a huge improvement over hand-fitting a glm with iteratively reweighted least squares and assuming a constant variance/fit relationship. If you are worried about overdispersion of the above-zero count data, use the truncated negative binomial model offered by pscl::hurdle(). It is designed for the purpose (and has a more exciting narrative for ecologists).

Cheers, Jari Oksanen
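
A short sketch of the points above, under stated assumptions: the data are simulated, rtrunc_nb() is a hypothetical helper written only for this example, and the pscl call runs only if that package happens to be installed.

```r
# What poisson() has that quasipoisson() lacks (includes "simulate").
setdiff(names(poisson()), names(quasipoisson()))

# The quasi family's aic() component returns NA whatever it is given.
quasi_aic <- quasipoisson()$aic(y = 1, n = 1, mu = 1, wt = 1, dev = 0)
quasi_aic

# Hypothetical helper: zero-truncated negative binomial draws by resampling zeros.
rtrunc_nb <- function(n, mu, size) {
  out <- rnbinom(n, mu = mu, size = size)
  while (any(z <- out == 0)) {
    out[z] <- rnbinom(sum(z), mu = mu[z], size = size)
  }
  out
}

# Invented hurdle data: binomial presence, truncated counts where present.
set.seed(3)
n <- 300
x <- runif(n)
present <- rbinom(n, 1, plogis(-0.5 + 2 * x))
count <- present * rtrunc_nb(n, mu = exp(0.3 + x), size = 1.5)
dat <- data.frame(x = x, count = count)

# Truncated negative binomial hurdle model, if pscl is available.
if (requireNamespace("pscl", quietly = TRUE)) {
  fit_hnb <- pscl::hurdle(count ~ x, data = dat,
                          dist = "negbin", zero.dist = "binomial")
  print(summary(fit_hnb))
}
```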