Poisson Regression: questions about tests of assumptions

Achim Zeileis · 2012-10-15T06:04:53Z

On Sun, 14 Oct 2012, Eiko Fried wrote:

> Thank you for the detailed answer, that was really helpful. I did some
> excessive reading and calculating in the last hours since your reply,
> and have a few (hopefully much more informed) follow up questions.
>
> 1) In the vignette("countreg", package = "pscl"), LLH, AIC and BIC
> values are listed for the models Negative-binomial (NB), Zero-Inflated
> (ZI), ZI NB, Hurdle NB, and Poisson (Standard). And although I found a
> way to determine LLH [...]


The code underlying the vignette can always be inspected:
edit(vignette("countreg", package = "pscl"))
The JSS also hosts a cleaned-up version of the replication code.

Most likelihood-based models provide a logLik() method which extracts the
fitted log-likelihood along with the corresponding degrees of freedom.
Then the default AIC() method can compute the AIC, and AIC(..., k = log(n))
computes the corresponding BIC. This is hinted at in Table 3 of the
vignette. If "n", the number of observations, can be extracted by nobs(),
then BIC() also works. This is not yet the case for zeroinfl/hurdle,
though.
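
A minimal sketch of this, using simulated data and a plain Poisson glm() so
that it runs in base R (the data and the object names are made up for
illustration). The same logLik()/AIC() calls should carry over to zeroinfl()
and hurdle() fits, with n supplied by hand:

```r
## simulated Poisson data (illustrative only)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = poisson)

## log-likelihood and its degrees of freedom
ll <- logLik(fit)
df <- attr(ll, "df")

## AIC and BIC computed by hand from the log-likelihood
aic_manual <- -2 * as.numeric(ll) + 2 * df
bic_manual <- -2 * as.numeric(ll) + log(n) * df

## the same quantities via the default AIC() method
c(AIC = AIC(fit), manual = aic_manual)
c(BIC = AIC(fit, k = log(n)), manual = bic_manual)

## BIC() works here because nobs() is available for glm objects
BIC(fit)
```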

No, no, yes. Hurdle and zero-inflation models have two linear predictors, 
one for the zero hurdle/inflation and one for the truncated/un-inflated 
count component. Both are typically of interest for different aspects of 
the data.

All models considered have a predictor for the mean of the count 
component, hence this can be compared across all models.
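
To make the two linear predictors concrete, here is a sketch with simulated
hurdle data. All variable names and coefficients are invented for
illustration, and the model fit is only run if pscl is installed. In the
formula y ~ x | z, x enters the count component and z the zero hurdle:

```r
## simulate hurdle data: z drives the zero hurdle, x drives the positive counts
set.seed(7)
n <- 500
x <- rnorm(n)
z <- rnorm(n)

## zero hurdle: probability of a positive count depends on z
cross <- rbinom(n, size = 1, prob = plogis(0.5 + 1 * z))

## count component: zero-truncated Poisson, drawn by inverting the CDF
## on the interval (P(Y = 0), 1) so that every draw is at least 1
lambda <- exp(0.3 + 0.5 * x)
u <- runif(n, min = dpois(0, lambda), max = 1)
pos <- qpois(u, lambda)

d <- data.frame(y = cross * pos, x = x, z = z)

if (requireNamespace("pscl", quietly = TRUE)) {
  m_hurdle <- pscl::hurdle(y ~ x | z, data = d, dist = "poisson")
  ## coefficients of the (truncated) count component
  print(coef(m_hurdle, model = "count"))
  ## coefficients of the zero hurdle component
  print(coef(m_hurdle, model = "zero"))
}
```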

That's not really a count response. I guess an ordered response model (see 
e.g. polr() in MASS or package ordinal) would probably be more 
appropriate.

The variance function of the NB2 model is mu + 1/theta * mu^2. Hence for 
finite theta, the variance is always larger than the mean.
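
A small numerical illustration of the variance function (nb2_var is a
hypothetical helper name; in rnbinom() the "size" argument plays the role of
theta when the "mu" parameterization is used):

```r
## NB2 variance function: Var(y) = mu + mu^2 / theta
nb2_var <- function(mu, theta) mu + mu^2 / theta

## variance for a fixed mean and increasing theta; theta = Inf is the Poisson case
mu <- 2
theta <- c(0.5, 2, 20, 30, Inf)
data.frame(theta = theta, mean = mu, variance = nb2_var(mu, theta))

## check against simulated draws with theta = 2
set.seed(123)
y <- rnbinom(1e5, mu = mu, size = 2)
c(mean = mean(y), var = var(y), expected_var = nb2_var(mu, 2))
```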

As I said before: A theta around 20 or 30 is already so large that the fit is
virtually identical to a Poisson fit (theta = Inf). These values clearly
indicate that theta is effectively infinite, i.e., there is no overdispersion
relative to the Poisson model.
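
As a rough check of how close the two distributions are, one can compare the
probability functions directly. The mean of 1.5 below is an arbitrary choice
for illustration, not a value from your data:

```r
## largest absolute difference between the NB2 and Poisson probabilities
## over the counts 0, ..., 20, for several values of theta
mu <- 1.5
counts <- 0:20
theta <- c(1, 5, 20, 30)
max_diff <- sapply(theta, function(th) {
  max(abs(dnbinom(counts, mu = mu, size = th) - dpois(counts, lambda = mu)))
})
data.frame(theta = theta, max_diff = max_diff)
```

For theta = 20 or 30 the differences are already tiny compared to theta = 1.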

However, this almost certainly stems from your response which is not 
really count data. As I said above: An ordered response model will 
probably work better here. If you have mainly variation between 0 vs. 
larger but not much among the levels 1/2/3, a binary response (0 vs. 
larger) may be best.
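
A sketch of both alternatives on simulated data with four ordered levels
0/1/2/3. The data-generating process and all object names are assumptions for
illustration only:

```r
library(MASS)

## simulate an ordered response with levels 0 < 1 < 2 < 3 from a latent variable
set.seed(42)
n <- 300
x <- rnorm(n)
latent <- 0.8 * x + rlogis(n)
y <- cut(latent, breaks = c(-Inf, 0, 1, 2, Inf), labels = 0:3,
         ordered_result = TRUE)

## ordered (proportional odds) logit model
m_ord <- polr(y ~ x, Hess = TRUE)
summary(m_ord)

## binary alternative: 0 vs. larger
m_bin <- glm(I(y != "0") ~ x, family = binomial)
summary(m_bin)
```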

hth,
Z
