On Sat, 5 May 2012, Christopher Desjardins wrote:
Hi,

I am a little confused by the output from predict() for a zeroinfl object. Here's my confusion:

## From the zeroinfl() examples (pscl package)
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")

## The raw zero-inflated overdispersed data
table(bioChemists$art)
  0   1   2   3   4   5   6   7   8   9  10  11  12  16  19
275 246 178  84  67  27  17  12   1   2   1   1   2   1   1

## The default output from predict. It looks like it is doing a horrible job. Does it really predict 7 zeros?
No, see also this R-help post on "Zero-inflated regression models: predicting no 0s":
https://stat.ethz.ch/pipermail/r-help/2011-June/279765.html

The predicted _mean_ of a negative binomial distribution is not the most likely outcome (i.e., the _mode_) of the distribution, so tabulating rounded means does not show how many zeros the model expects. The post above presents some hands-on examples.
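To make the mean-versus-mode point concrete, here is a minimal base R sketch. The parameter values (mu, theta, pi0) are hypothetical and chosen only for illustration; they are not estimates from the bioChemists model.

```r
# Hypothetical parameters for a single observation (illustration only)
mu    <- 1.7   # mean of the negative binomial count component
theta <- 2.5   # dispersion ("size") parameter of the count component
pi0   <- 0.1   # probability of a structural zero

y <- 0:10

# Zero-inflated NB probabilities: a point mass at zero mixed with the NB
p_count   <- dnbinom(y, mu = mu, size = theta)
p_zinb    <- (1 - pi0) * p_count
p_zinb[1] <- p_zinb[1] + pi0

mean_zinb <- (1 - pi0) * mu        # E(Y) of the zero-inflated distribution
mode_zinb <- y[which.max(p_zinb)]  # single most likely outcome

round(mean_zinb)  # the rounded mean is 2 ...
mode_zinb         # ... although the most likely outcome is 0
p_zinb[1]         # P(Y = 0) is roughly a third
```

With these values the rounded mean is never 0, even though 0 is the most probable count. Rounding the fitted means of many such observations therefore produces almost no zeros.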
table(round(predict(fm_zinb2)))
  0   1   2   3   4   5   6  10
  7 354 487  45  12   6   3   1

## The output from predict using "count"
table(round(predict(fm_zinb2,type="count")))
  1   2   3   4   5   6  10
312 536  45  12   6   3   1

## The output from predict using "zero", but here it predicts 24 "structural" zeros?
table(round(predict(fm_zinb2,type="zero")))
  0   1
891  24

So my question is: how do I interpret these different outputs from the zeroinfl object? What are the differences? The help page just left me confused. I would expect that table(round(predict(fm_zinb2))) would be E(Y) and would most accurately track table(bioChemists$art), but I am wrong. How can I find the E(Y) that would most closely track the raw data?

Thanks,
Chris
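One way to get frequencies that are comparable to table(bioChemists$art) is to sum the predicted probabilities over all observations rather than rounding the predicted means. The sketch below assumes the pscl package is installed and uses predict() with type = "prob"; the object names prob, expected and observed are only illustrative.

```r
library(pscl)  # provides zeroinfl() and the bioChemists data
data("bioChemists", package = "pscl")

fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")

# Matrix of predicted probabilities P(Y_i = k), one row per observation
# and one column per count k = 0, ..., max(art)
prob <- predict(fm_zinb2, type = "prob")

# Expected frequency of each count: sum the probabilities over observations
expected <- colSums(prob)
observed <- table(factor(bioChemists$art, levels = 0:max(bioChemists$art)))

# Side-by-side comparison for the counts 0 to 7
round(cbind(observed = as.vector(observed), expected = expected)[1:8, ], 1)

# How the other prediction types relate to each other:
#   "count"    mean of the count component (before zero inflation)
#   "zero"     probability of a structural zero
#   "response" overall mean, (1 - zero) * count
mu  <- predict(fm_zinb2, type = "count")
pi0 <- predict(fm_zinb2, type = "zero")
all.equal(predict(fm_zinb2, type = "response"), (1 - pi0) * mu)
```

Read this way, type = "zero" returns a probability per observation, so tabulating its rounded values counts how many of those probabilities exceed 0.5 rather than how many structural zeros the model expects.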