
Zero-inflated regression models: predicting no 0s

On Tue, 31 May 2011, Jean-Simon Michaud wrote:

            
Yes, both the fitted() method and the default of the predict() method 
compute fitted _means_. And the mean of these count distributions is 
always strictly greater than zero.
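To see why the fitted means can never be exactly zero, here is a minimal sketch assuming a zero-inflated Poisson (ZIP) model with a log link. The helper names `zip_mean()` and `zip_prob_zero()`, the zero-inflation probability `p0` and the linear predictor values are hypothetical and used only for illustration. The mean is `(1 - p0) * exp(eta)`, which stays positive even when most of the probability mass sits at zero.

```r
# Hypothetical helpers: mean and P(Y = 0) of a zero-inflated Poisson
# distribution with zero-inflation probability p0 and Poisson mean lambda
zip_mean <- function(p0, lambda) (1 - p0) * lambda
zip_prob_zero <- function(p0, lambda) p0 + (1 - p0) * dpois(0, lambda)

# Made-up linear predictor values on the log scale; exp() is always > 0
eta <- c(-5, -1, 0, 2)
lambda <- exp(eta)

# Even with 70% structural zeros, every mean stays above zero while the
# probability of observing a zero exceeds 0.7 for every observation
m <- zip_mean(p0 = 0.7, lambda = lambda)
p_zero <- zip_prob_zero(p0 = 0.7, lambda = lambda)
print(round(cbind(lambda, mean = m, P0 = p_zero), 4))
```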

Moreover, even when you round the fitted values to integers, this does 
_not_ necessarily lead to the most likely count. Consider the following 
simple examples:

Probabilities for a Poisson distribution with mean 0.8:

R> dpois(0:5, lambda = 0.8)
[1] 0.449329 0.359463 0.143785 0.038343 0.007669 0.001227

i.e., zero is still the most likely outcome, with a probability of 45%, even 
though the mean is 0.8 (rounding the mean would give 1, which has a 
probability of only 36%). For negative binomial distributions, this can be 
even more extreme. Here are the probabilities for a geometric distribution 
(negative binomial with size = 1) with mean 2:

R> dnbinom(0:5, mu = 2, size = 1)
[1] 0.33333 0.22222 0.14815 0.09877 0.06584 0.04390

i.e., despite the mean of 2, zero is still the most likely outcome.
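The two examples above can be checked in one go by comparing the rounded mean with the mode (the count with the highest probability). This is a minimal sketch using only base R; the support is truncated at 200, which is an assumption that leaves a negligible tail for these parameter values, and `mode_count()` is a hypothetical helper.

```r
# Support truncated at 200; the remaining tail mass is negligible here
k <- 0:200

# Hypothetical helper: count with the highest probability on the support k
mode_count <- function(p) k[which.max(p)]

p_pois <- dpois(k, lambda = 0.8)
p_geom <- dnbinom(k, mu = 2, size = 1)

# Means recovered from the probabilities agree with the parameters
mean_pois <- sum(k * p_pois)
mean_geom <- sum(k * p_geom)

cat("Poisson:   rounded mean =", round(mean_pois),
    " mode =", mode_count(p_pois), "\n")
cat("Geometric: rounded mean =", round(mean_geom),
    " mode =", mode_count(p_geom), "\n")
```

In both cases the rounded mean (1 and 2) differs from the most likely count (0).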

You can obtain the predicted probabilities for all observations via 
predict(hurdle1, type = "prob") and predict(zip1A, type = "prob") for your 
two models. Given the results for your negative binomial hurdle model, I 
suspect that the zero-inflated Poisson model fits the zeros poorly, even 
though the predicted means of the two models are similar.
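Once you have such a probability matrix, the most likely count per observation and the expected number of zeros follow directly. The sketch below assumes a matrix with one row per observation and one column per count 0, 1, 2, ..., which is how I understand the `type = "prob"` output to be laid out; here it is a hypothetical stand-in built from Poisson probabilities with made-up means `mu`, so that the example runs in base R without any fitted model.

```r
# Hypothetical stand-in for predict(model, type = "prob"): one row per
# observation, one column per count 0, 1, ..., 10 (truncated at 10)
mu <- c(0.3, 0.8, 1.4, 2.6, 4.1)
counts <- 0:10
prob <- t(sapply(mu, function(m) dpois(counts, lambda = m)))
colnames(prob) <- counts

# Most likely count per observation: max.col() returns the column index,
# and counts[] maps it back to the count (column 1 is a count of 0)
modal <- counts[max.col(prob, ties.method = "first")]

# Expected number of zeros implied by the model; compare it with
# sum(y == 0) in the observed data to judge how well the zeros are fitted
expected_zeros <- sum(prob[, "0"])

print(data.frame(mu = mu, rounded = round(mu), modal = modal))
print(expected_zeros)
```

For means of 0.8 and 2.6 the rounded mean and the modal count disagree, which is exactly the effect described above.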

See also the paper accompanying the count regression functions in "pscl" 
(not "lpsc"):

   http://www.jstatsoft.org/v27/i08/

Finally, a short comment on your model formula. Specifying it as

   hurdle(TOT ~ LC80 + LC231 + DEM, data = mydata_purge, ...)

will be easier to read and less confusing (after all, the "data = food" 
argument in your call does not appear to be used at all).
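The same principle applies to any R modeling function that takes a formula and a data argument. As a minimal sketch, here it is with base R's glm() instead of hurdle() (which lives in pscl); the data frame `d` and its simulated columns are hypothetical stand-ins for your data. Referring to variables as `d$...` in the formula makes the data argument redundant and clutters the coefficient names, while naming the columns and passing the data frame gives the same fit.

```r
# Hypothetical data frame standing in for the real data
set.seed(1)
d <- data.frame(TOT = rpois(50, 3), LC80 = runif(50), DEM = rnorm(50))

# Columns referenced via d$...: the data argument adds nothing here
fit_dollar <- glm(d$TOT ~ d$LC80 + d$DEM, family = poisson, data = d)

# Plain variable names plus data = d: same estimates, cleaner names
fit_data <- glm(TOT ~ LC80 + DEM, family = poisson, data = d)

print(names(coef(fit_dollar)))
print(names(coef(fit_data)))
```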

hth,
Z