analysis of count data with many zero values
4 messages · Steve Hong, ONKELINX, Thierry, Christopher David Desjardins
Hi Steve,

In addition to Chris's comments, I would like to add that a high number of zeros does not imply a zero-inflated distribution. Have a look at the example below.

HTH,
Thierry
```r
set.seed(123)

# ordinary Poisson with 91% zeros
counts <- rpois(10000, lambda = 0.1)
mean(counts == 0)
## [1] 0.9105
table(counts)
## counts
##    0    1    2    3
## 9105  855   39    1

# ordinary Poisson with 99% zeros
counts <- rpois(10000, lambda = 0.01)
mean(counts == 0)
## [1] 0.9912
table(counts)
## counts
##    0    1
## 9912   88

# ordinary Poisson without zeros
counts <- rpois(10000, lambda = 10)
mean(counts == 0)
## [1] 0
table(counts)
## counts
##    1    2    3    4    5    6    7    8    9   10
##    4   21   86  203  389  606  887 1144 1277 1243
##   11   12   13   14   15   16   17   18   19   20
## 1147  904  707  553  367  218  108   75   36   10
##   21   22   24   27
##    9    4    1    1

# zero-inflated Poisson with 50% zeros:
# 20% zeros from the inflation
# 30% zeros from the Poisson
# 50% non-zeros from the Poisson
zi <- rbinom(10000, prob = 0.2, size = 1)
counts <- rpois(10000, lambda = 1)
counts[zi == 1] <- 0
mean(counts == 0)
## [1] 0.4961
table(counts)
## counts
##    0    1    2    3    4    5    6    7
## 4961 2946 1463  472  129   25    3    1
```

-- 
ir. Thierry Onkelinx
Research Institute for Nature and Forest (Instituut voor natuur- en bosonderzoek)
team Biometrics & Quality Assurance
Gaverstraat 4, 9500 Geraardsbergen, Belgium
www.inbo.be
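Following Thierry's point that many zeros alone do not signal zero inflation, one quick check is to compare the observed share of zeros with the share a Poisson model with the same mean would predict, P(Y = 0) = exp(-mu). The sketch below is an illustration under stated assumptions: `zero_check` is a hypothetical helper (not from any package), and the simulated data only mimic Thierry's first and last examples, so the printed numbers will not match his output.

```r
# Hypothetical helper: observed share of zeros versus the share a Poisson
# model with means `mu` would predict, P(Y = 0) = exp(-mu).
zero_check <- function(y, mu = rep(mean(y), length(y))) {
  c(observed = mean(y == 0),
    poisson_expected = mean(dpois(0, lambda = mu)))
}

set.seed(123)
# Ordinary Poisson with many zeros: the two shares should be close.
ordinary <- rpois(10000, lambda = 0.1)
# Zero-inflated Poisson: 20% structural zeros added to Poisson(1) counts.
zi <- rbinom(10000, size = 1, prob = 0.2)
inflated <- rpois(10000, lambda = 1)
inflated[zi == 1] <- 0

round(zero_check(ordinary), 3)
round(zero_check(inflated), 3)
```

For the ordinary Poisson the two shares agree even though about 90% of the values are zero; only the zero-inflated sample shows clearly more zeros than its mean implies. With covariates, the fair comparison uses the fitted means of a Poisson GLM, e.g. `zero_check(d$count, fitted(glm(count ~ x1 + x2, family = poisson, data = d)))`, where `d`, `count`, `x1` and `x2` are hypothetical names.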
-----Original message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Christopher Desjardins
Sent: Friday 29 October 2010 16:47
To: Steve Hong
CC: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] analysis of count data with many zero values

Hi Steve,

The MCMCglmm package has several different models that you could fit to zero-inflated count data: zero-inflated Poisson (ZIP) models, hurdle models, and zero-altered and zero-truncated models. I don't believe you can fit zero-inflated negative binomial (ZINB) models with that package, but I could be wrong. I also believe that ZINB models work well when your data are zero-inflated and the non-zero counts are overdispersed. You could also roll your own using rjags or R2WinBUGS, etc.

There are many publications examining zero inflation, especially with MCMC-based approaches (do a quick Google Scholar search for zero-inflated multilevel models). In addition, Jarrod Hadfield's CourseNotes (they come with MCMCglmm) are quite informative and provide some examples of how you might fit such a model.

In my experience with highly zero-inflated count data (86% of all values were zeros), the ZIP model worked well but converged very slowly and required about 60,000 MCMC iterations. If you'd like to see the code, I can share it as well.

I believe this topic has come up several times, and I would encourage you to search the R-sig-mixed-models archives.

HTH,
Chris

On Fri, Oct 29, 2010 at 9:32 AM, Steve Hong <emptican at gmail.com> wrote:
Dear list,

This is the first time I have worked with this type of data. I have count data collected repeatedly from the same plot over multiple years (14 years), and the proportion of zero values is very high (average about 92%, minimum 53%, maximum 100%). Only one year has 53% zeros; every other year has more than 86% zeros. The objective of the study is to develop predictive models and validate them, for example using cross-validation. The variables collected are year, insect count, longitude, latitude, and soil properties (x1...x4).

Since the data have so many zero observations, I am thinking about fitting a zero-inflated model. However, I am very new to this method. My questions are:

1. Is it possible to use a zero-inflated model for data with about 90% zeros? I am wondering whether the proportion of zeros is too high to make any inference with statistical methods.
2. If I can use zero-inflated models, can I use a Poisson distribution, a negative binomial distribution, or both?
3. Do you have any good references (papers and/or websites) for a good and 'easy' tutorial to study?

I am not sure whether I have provided enough information or sent this to the correct mailing list. Please let me know if you have any comments or suggestions. I would greatly appreciate it.

Thank you very much in advance!

Steve
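On Steve's first two questions: a zero-inflated model can in principle be fitted when about 90% of the values are zero; what matters is whether enough non-zero counts remain to estimate the count part. The sketch below is a minimal illustration under several assumptions: the data are simulated and hypothetical, there are no covariates and no random effects for plot or year, and the model is fitted by maximum likelihood with base R's `optim` rather than with the MCMC approach Chris describes. `zip_nll` is a hypothetical helper written for this example.

```r
# Negative log-likelihood of an intercept-only zero-inflated Poisson.
# par[1] = log(lambda) for the count part, par[2] = logit(pi0) for the
# probability of a structural zero.
zip_nll <- function(par, y) {
  lambda <- exp(par[1])
  pi0 <- plogis(par[2])
  log_1m_pi0 <- plogis(par[2], lower.tail = FALSE, log.p = TRUE)  # log(1 - pi0)
  ll <- ifelse(y == 0,
               log(pi0 + (1 - pi0) * exp(-lambda)),
               log_1m_pi0 + dpois(y, lambda, log = TRUE))
  -sum(ll)
}

set.seed(42)
n <- 2000
# Hypothetical data: 85% structural zeros, Poisson(2) counts otherwise,
# which gives roughly 87% zeros overall.
structural <- rbinom(n, size = 1, prob = 0.85)
y <- ifelse(structural == 1, 0, rpois(n, lambda = 2))
mean(y == 0)

fit_zip <- optim(c(0, 0), zip_nll, y = y, method = "BFGS")
lambda_hat <- exp(fit_zip$par[1])
pi_hat <- plogis(fit_zip$par[2])

# Ordinary Poisson for comparison: the MLE of lambda is the sample mean.
pois_loglik <- sum(dpois(y, lambda = mean(y), log = TRUE))

# AIC = -2 * log-likelihood + 2 * number of parameters
aic <- c(poisson = -2 * pois_loglik + 2 * 1,
         zip     = 2 * fit_zip$value + 2 * 2)
round(c(lambda = lambda_hat, pi = pi_hat), 3)
round(aic, 1)
```

The same structure answers the Poisson-versus-negative-binomial question: replacing `dpois` with `dnbinom` and adding a dispersion (`size`) parameter gives a ZINB, and the two fits can then be compared by AIC or, given Steve's predictive aim, by cross-validated log-likelihood. The repeated measurements on the same plot are ignored here; that is what the mixed-model approaches discussed in this thread would add.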
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- 
Christopher David Desjardins
Ph.D. student, Quantitative Methods in Education
M.S. student, Statistics
University of Minnesota
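Chris lists zero-inflated, hurdle (zero-altered) and negative binomial variants. They differ mainly in where the zeros come from, which the textbook probability mass functions make explicit. The sketch below writes them out in base R; `dzip`, `dhurdle_pois` and `dzinb` are hypothetical helper names, and this is the standard textbook parameterization, not necessarily the one MCMCglmm uses internally.

```r
# Zero-inflated Poisson: a zero can come from the structural process
# (probability pi0) or from the Poisson count process itself.
dzip <- function(y, lambda, pi0) {
  ifelse(y == 0,
         pi0 + (1 - pi0) * dpois(0, lambda),
         (1 - pi0) * dpois(y, lambda))
}

# Hurdle (zero-altered) Poisson: all zeros come from the binary part
# (probability p0); positive counts follow a zero-truncated Poisson.
dhurdle_pois <- function(y, lambda, p0) {
  ifelse(y == 0,
         p0,
         (1 - p0) * dpois(y, lambda) / (1 - dpois(0, lambda)))
}

# Zero-inflated negative binomial: like the ZIP, but the count part is
# overdispersed (mu = mean, size = dispersion; variance = mu + mu^2 / size).
dzinb <- function(y, mu, size, pi0) {
  ifelse(y == 0,
         pi0 + (1 - pi0) * dnbinom(0, size = size, mu = mu),
         (1 - pi0) * dnbinom(y, size = size, mu = mu))
}

y <- 0:5
round(cbind(y,
            zip    = dzip(y, lambda = 1, pi0 = 0.2),
            hurdle = dhurdle_pois(y, lambda = 1, p0 = 0.5),
            zinb   = dzinb(y, mu = 1, size = 0.5, pi0 = 0.2)), 3)
```

In the ZIP and ZINB, zeros arise from two sources, so the zero probability depends on both parts of the model; in the hurdle model, the zero probability is set entirely by the binary part, and the count part only describes the positive values.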