Skip to content

Poisson Regression: questions about tests of assumptions

6 messages · Eiko Fried, Achim Zeileis, Joshua Wiley +1 more

#
On Sun, 14 Oct 2012, Eiko Fried wrote:

            
There are various formal tests for this, e.g., dispersiontest() in package 
"AER". Alternatively, you can use a simple likelihood-ratio test (e.g., by 
means of lrtest() in "lmtest") between a poisson and negative binomial 
(NB) fit. The p-value can even be halved because the Poisson is on the 
border of the NB theta parameter range (theta = infty).

However, overdispersion can already matter before this is detected by a 
significance test. Hence, if in doubt, I would simply use an NB model and 
you're on the safe side. And if the NB's estimated theta parameter turns 
out to be extremely large (say beyond 20 or 30), then you can still switch 
back to Poisson if you want.
quasipoisson yields the same parameter estimates as the poisson, only the 
inference is adjusted appropriately.
glm.nb() in "MASS" is one of standard options.
The NB is a likelihood-based model while the quasipoisson is not 
associated with a likelihood (but has the same conditional mean equation).
It's one of the possibilities.
I recommend you read the associated documentation. See 
vignette("countreg", package = "pscl")

For glm.nb() I recommend its accompanying documentation, namely the MASS 
book.

hth,
Z
#
Hi Eiko,

This is not a direct response to your question, but I thought you
might find these pages helpful:

On negative binomial:

http://www.ats.ucla.edu/stat/R/dae/nbreg.htm

and zero inflated poisson:

http://www.ats.ucla.edu/stat/R/dae/zipoisson.htm

In general this page lists a variety of different models with examples:

http://www.ats.ucla.edu/stat/dae/

I wrote the code for most of those pages. Some of your questions seema
 little more specific than would be addressed on those general
introductory pages, but if there are particular things you would like
to see done, I am open to hearing about it.

One thing I do want to mention is that for publication or presentation
purposes, graphs of predicted counts can be quite nice for readers---I
tried to show how to do that in all of those pages, and you can even
use bootstrapping (which I gave examples for at least in the zero
inflated models) to get robust confidence intervals.

Cheers,

Josh
On Sun, Oct 14, 2012 at 3:14 PM, Eiko Fried <torvon at gmail.com> wrote:

  
    
#
just a side note for your 4th question.

for a small sample, clarke test instead of vuong test might be more
appropriate and the calculation is so simple that even excel can
handle it :-)
On Sun, Oct 14, 2012 at 12:00 PM, Eiko Fried <torvon at gmail.com> wrote:

  
    
#
On Sun, 14 Oct 2012, Eiko Fried wrote:

            
The code underlying the vignette can always be inspected:
edit(vignette("countreg", package = "pscl"))
The JSS also hosts a cleaned-up version of the replication code.

Most likelihood-based models provide a logLik() method which extracts the 
fitted log-likelihood along with the corresponding degrees of freedom. 
Then the default AIC() method can compute the AIC and AIC(..., k = log(n)) 
computes the corresponding BIC. This is hinted at in Table 3 of the 
vignette. If "n", the number of observations, can be extracted by nobs(), 
then also BIC() works. This is not yet the case for zeroinfl/hurdle, 
though.
No, no, yes. Hurdle and zero-inflation models have two linear predictors, 
one for the zero hurdle/inflation and one for the truncated/un-inflated 
count component. Both are typically of interest for different aspects of 
the data.
All models considered have a predictor for the mean of the count 
component, hence this can be compared across all models.
That's not really a count response. I guess an ordered response model (see 
e.g. polr() in MASS or package ordinal) would probably be more 
appropriate.
The variance function of the NB2 model is mu + 1/theta * mu^2. Hence for 
finite theta, the variance is always larger than the mean.
As I said before: A theta around 20 or 30 is already so large that it is 
virtually identical to a Poisson fit (theta = Inf). These values clearly 
indicate that theta is not finite.

However, this almost certainly stems from your response which is not 
really count data. As I said above: An ordered response model will 
probably work better here. If you have mainly variation between 0 vs. 
larger but not much among the levels 1/2/3, a binary response (0 vs. 
larger) may be best.

hth,
Z