Poisson Regression: questions about tests of assumptions
Hi Eiko, This is not a direct response to your question, but I thought you might find these pages helpful: On negative binomial: http://www.ats.ucla.edu/stat/R/dae/nbreg.htm and zero inflated poisson: http://www.ats.ucla.edu/stat/R/dae/zipoisson.htm In general this page lists a variety of different models with examples: http://www.ats.ucla.edu/stat/dae/ I wrote the code for most of those pages. Some of your questions seema little more specific than would be addressed on those general introductory pages, but if there are particular things you would like to see done, I am open to hearing about it. One thing I do want to mention is that for publication or presentation purposes, graphs of predicted counts can be quite nice for readers---I tried to show how to do that in all of those pages, and you can even use bootstrapping (which I gave examples for at least in the zero inflated models) to get robust confidence intervals. Cheers, Josh
On Sun, Oct 14, 2012 at 3:14 PM, Eiko Fried <torvon at gmail.com> wrote:
Thank you for the detailed answer, that was really helpful.
I did some excessive reading and calculating in the last hours since your
reply, and have a few (hopefully much more informed) follow up questions.
1) In the vignette("countreg", package = "pscl"), LLH, AIC and BIC values
are listed for the models Negative-binomial (NB), Zero-Inflated (ZI), ZI
NB, Hurdle NB, and Poisson (Standard). And although I found a way to
determine LLH and DF for all models, BIC & AIC values are not displayed by
default, neither using the code given in the vignette. How do I calculate
these? (AIC is given as per default only in 2 models, BIC in none).
2) For the zero-inflated models, the first block of count model
coefficients is only in the output in order to compare the changes,
correct? That is, in my results in a paper I would only report the second
block of (zero-inflation) model coefficients? Or do I misunderstand
something? I am confused because in their large table to compare
coefficients, they primarily compare the first block of coefficients (p. 18)
R> fm <- list("ML-Pois" = fm_pois, "Quasi-Pois" = fm_qpois, "NB" = fm_nbin,
+ "Hurdle-NB" = fm_hurdle, "ZINB" = fm_zinb)
R> sapply(fm, function(x) coef(x)[1:8])
3)
There are various formal tests for this, e.g., dispersiontest() in package "AER".
I have to run 9 models - I am testing the influence of several predictors on different individual symptoms of a mental disorder, as "counted" in the last week (0=never in the last week, to 3 = on all day within the last week). So I'm regressing the same predictors onto 9 different symptoms in 9 models. Dispersiontest() for these 9 models result in 3-4 overdispersed models (depending if testing one- or two-sided on p=.05 level), 2 underdispersed models, and 4 non-significant models. The by far largest dispersion value is 2.1 in a model is not overdispersed according to the test, but that's the symptom with 80% zeros, maybe that plays a role here. Does BN also make sense in underdispersed models? However, overdispersion can already matter before this is detected by a
significance test. Hence, if in doubt, I would simply use an NB model and you're on the safe side. And if the NB's estimated theta parameter turns out to be extremely large (say beyond 20 or 30), then you can still switch back to Poisson if you want.
Out of the 9 models, the 3 overdispersed NB models "worked" with theta
estimates between 0.4 and 8.6. The results look just fine, so I guess NB is
appropriate here.
4 other models (including the 2 underdispersed ones) gave warnings (theta
iteration / alternation limit reached), and although the other values look
ok (estimates, LLH, AIC), theta estimates are between 1.000 and 11.000.
Could you recommend me what to do here?
Thanks
T
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/