Thanks Thierry - that makes total sense. Is there some way of formally
checking that, other than reasoning about the setting and the
underlying processes?
On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp,
Do you just have lots of zeros, or more zeros than the Poisson
distribution can explain? Those are two different things. The first
example below generates data from a Poisson distribution and has 99%
zeros but no zero-inflation. The second example has only 1% zeros but
is clearly zero-inflated.
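set.seed(1)
n <- 1e8  # large sample; reduce (e.g. to 1e6) if memory is limited
# Pure Poisson with a small mean: about 99% zeros, no zero-inflation
sim <- rpois(n, lambda = 0.01)
mean(sim == 0)
hist(sim)
# Poisson with mean 1000, forced to zero with probability 0.01:
# about 1% zeros, all of them in excess of what the Poisson predicts
sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)
mean(sim.infl == 0)
hist(sim.infl)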
So before looking for zero-inflated models, try to model the zeros.
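A rough way to put numbers on the distinction, as a sketch only: compare
the observed share of zeros with the share a Poisson distribution with
the same mean would produce, which is exp(-mean). The helper name
zero.check is made up for this illustration, and n is reduced to keep
the run short.

```r
set.seed(1)
n <- 1e5  # smaller than in the example above, to keep it quick

sim      <- rpois(n, lambda = 0.01)
sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)

# Observed share of zeros versus the share implied by a Poisson
# distribution with the same mean (hypothetical helper, not from any package)
zero.check <- function(y) {
  c(observed = mean(y == 0), expected = exp(-mean(y)))
}

zero.check(sim)       # observed and expected nearly coincide
zero.check(sim.infl)  # observed is far above expected
```

With covariates in the model the same idea should carry over by
replacing exp(-mean(y)) with mean(dpois(0, fitted(fit))), where fit is
a placeholder name for the fitted Poisson model.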
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
Dear group - I am currently fitting a Poisson glmer where I have
an excess of zero outcomes (>95%). I am now debating how to
proceed and came up with three options:
1.) Just fit a regular glmer to the complete data. I am not fully
sure how to interpret the coefficients then: do they mostly reflect
the distinction between zero and non-zero outcomes, or do they also
capture the differences among the non-zero outcomes?
2.) Leave all zeros out of the data and fit a glmer to only those
outcomes that are non-zero. Then, I would only learn about
differences in the non-zero outcomes though.
3.) Use a zero-inflated Poisson model. My data is quite
large-scale, so I am currently playing around with the EM
implementation of Bolker et al. that alternates between fitting a
glmer with data that are weighted according to their zero
probability, and fitting a logistic regression for the probability
that a data point is zero. The method is elaborated for the OWL
data in:
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients of the glmer in the same way as I would for option
2)? And then, on top of that, interpret the coefficients of the
logistic regression as describing whether something is in the
perfect or imperfect state? I am also not quite sure what the
common approach for the zformula is here. The OWL elaborations
only use zformula=z~1, so no random effect; I would use the same
formula as for the glmer.
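To make the alternation described in option 3 concrete, here is a
minimal sketch of an EM loop for a zero-inflated Poisson model on
simulated data. It is not the implementation of Bolker et al.: plain
glm stands in for glmer (so there are no random effects), the data and
all variable names are invented for the illustration, and the zero
part uses an intercept only, mirroring zformula = z ~ 1.

```r
set.seed(1)
n <- 5000
x <- rnorm(n)
structural <- rbinom(n, size = 1, prob = 0.3)  # 1 = structural zero
y <- (1 - structural) * rpois(n, lambda = exp(0.5 + 0.8 * x))

# z = current probability that an observation is a structural zero.
# Non-zero counts can never be structural zeros; zeros start at 0.5.
z <- ifelse(y == 0, 0.5, 0)

for (i in 1:100) {
  # M-step, zero part: logistic regression on the fractional z.
  # quasibinomial gives the same estimates as binomial here and
  # does not warn about non-integer responses.
  zfit <- glm(z ~ 1, family = quasibinomial)
  # M-step, count part: Poisson regression in which each point is
  # weighted by its probability of coming from the count process.
  cfit <- glm(y ~ x, family = poisson, weights = 1 - z)
  # E-step: update the structural-zero probability of each zero.
  p  <- fitted(zfit)
  mu <- fitted(cfit)
  z  <- ifelse(y == 0, p / (p + (1 - p) * exp(-mu)), 0)
}

coef(cfit)          # count-part coefficients (log scale)
plogis(coef(zfit))  # estimated probability of a structural zero
```

In this setup the count-part coefficients describe the Poisson process
for all observations that are not structural zeros, including the
zeros that the Poisson process itself produces, so they are not the
same as the coefficients from a fit to the non-zero outcomes only.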
I would appreciate some help and pointers.
Thanks!
Philipp