On 23 Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
Thanks, great information, that is really helpful.
I agree that those are different things. However, when using a random
effect for overdispersion, I can simulate the same number of zero
outcomes (~95%).
On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those
are two different things.
I've put some information together in
http://rpubs.com/INBOstats/zeroinflation
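A minimal R sketch of the distinction, under assumed toy parameters
(mu, sigma and p_zero are illustrative values, not taken from this
thread). It compares the share of zeros under a plain Poisson, a
Poisson with an observation-level lognormal effect (overdispersion),
and a zero-inflated Poisson, all constructed to have the same mean:

```r
set.seed(1)
n <- 1e5
mu <- 0.5      # assumed common mean of all three variants

# Plain Poisson: the expected zero share is exp(-mu).
y_pois <- rpois(n, lambda = mu)

# Overdispersed: lognormal observation-level effect, centred so that
# the mean of lambda stays at mu.
sigma <- 2     # assumed standard deviation on the log scale
y_od <- rpois(n, lambda = mu * exp(rnorm(n, mean = -sigma^2 / 2, sd = sigma)))

# Zero-inflated: structural zeros with probability p_zero, Poisson
# otherwise, with lambda rescaled so that the overall mean stays at mu.
p_zero <- 0.5  # assumed share of structural zeros
y_zi <- rbinom(n, size = 1, prob = 1 - p_zero) *
  rpois(n, lambda = mu / (1 - p_zero))

zero_share <- c(poisson       = mean(y_pois == 0),
                overdispersed = mean(y_od == 0),
                zero_inflated = mean(y_zi == 0))
print(round(zero_share, 3))
```

Both mechanisms raise the share of zeros above the plain Poisson
value, which is why matching the zero count alone does not tell them
apart.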
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks! Accounting for overdispersion seems to be very important:
once it is included, the zeros are captured well.
On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp,
1. Fit a Poisson model to the data.
2. Simulate a new response vector for the dataset according to
the model.
3. Count the number of zeros in the simulated response vector.
4. Repeat steps 2 and 3 a decent number of times and plot a
histogram of the number of zeros in the simulations. If the
number of zeros in the original dataset is larger than those in
the simulations, then the model can't capture all zeros. In that
case, first try to update the model and repeat the procedure. If
that fails, look for zero-inflated models.
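The steps above can be sketched in base R as follows. This is only an
illustration under assumptions: the data frame dat, its columns y and
x, and the plain glm() fit are hypothetical stand-ins for the real
data and model. lme4 also provides a simulate() method for fitted
glmer models, so the same pattern should carry over.

```r
# Hypothetical toy data: a Poisson response with one covariate.
set.seed(42)
n_obs <- 500
dat <- data.frame(x = rnorm(n_obs))
dat$y <- rpois(n_obs, lambda = exp(-2 + 0.5 * dat$x))

# Step 1: fit a Poisson model to the data.
fit <- glm(y ~ x, family = poisson, data = dat)

# Steps 2 to 4: simulate many new response vectors from the fitted
# model and count the zeros in each of them.
n_sim <- 1000
sims <- simulate(fit, nsim = n_sim)   # data frame, one column per simulation
zeros_sim <- colSums(sims == 0)
zeros_obs <- sum(dat$y == 0)

# Histogram of the simulated zero counts, observed count marked in red.
hist(zeros_sim, main = "Zeros in simulated responses",
     xlab = "Number of zeros")
abline(v = zeros_obs, col = "red", lwd = 2)

# Share of simulations with at least as many zeros as the data.
p_more_zeros <- mean(zeros_sim >= zeros_obs)
```

If the red line sits far to the right of the histogram (p_more_zeros
close to 0), the model produces too few zeros.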
Best regards,
2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks Thierry - That totally makes sense. Is there a formal way
of checking that, apart from thinking about the setting and
underlying processes?
On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp,
Do you have just lots of zeros, or more zeros than the
distribution can explain? Those are two different things.
The example below generates data from a Poisson distribution and has
about 99% zeros but no zero-inflation. The second example has only 1%
zeros but is clearly zero-inflated.
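set.seed(1)
n <- 1e8
# Plain Poisson with a small mean: about 99% zeros, no zero-inflation.
sim <- rpois(n, lambda = 0.01)
mean(sim == 0)
hist(sim)
# Zero-inflated: 1% structural zeros on top of a Poisson with a large mean.
# The lambda value was cut off in the source; 1000 is an assumed placeholder.
sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)
mean(sim.infl == 0)
hist(sim.infl)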
So before looking for zero-inflated models, try to model the zeros.
Best regards,
2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
Dear group - I am currently fitting a Poisson glmer where I have
an excess of outcomes that are zero (>95%). I am now wondering
how to proceed and came up with three options:
1.) Just fit a regular glmer to the complete data. I am not
sure how to interpret the coefficients then: are they geared more
towards distinguishing zero and non-zero, or also towards
differences in those outcomes that are non-zero?
2.) Leave all zeros out of the data and fit a glmer to only those
outcomes that are non-zero. Then, I would only learn about
differences in the non-zero outcomes though.
3.) Use a zero-inflated Poisson model. My data is quite
large-scale, so I am currently playing around with the EM
implementation of Bolker et al. that alternates between fitting a
glmer with data that are weighted according to their zero
probability, and fitting a logistic regression for the probability
that a data point is zero. The method is elaborated for [...]
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similarly to what I described
for my idea of 2)? And then on top of that interpret the
coefficients for the logistic regression regarding whether
something is in the perfect or imperfect state? I am also not
quite sure what the common approach for the zformula is. The
OWL elaborations only use zformula=z~1, so no random effect; I
would use the same formula as for the glmer.
I would appreciate some help and pointers.
Thanks!
Philipp
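To make the alternating scheme in option 3 concrete, here is a minimal
EM sketch for a zero-inflated Poisson model. It is an illustration
under stated assumptions, not the Bolker et al. implementation: it
uses plain glm() without random effects, an intercept-only zero part
(analogous to zformula = z ~ 1), and hypothetical toy data with the
made-up names dat, y, x and z.

```r
# Hypothetical toy data: zero-inflated Poisson with one covariate.
set.seed(123)
n_obs <- 2000
x <- rnorm(n_obs)
structural_zero <- rbinom(n_obs, size = 1, prob = 0.4)  # assumed true share
y <- (1 - structural_zero) * rpois(n_obs, lambda = exp(0.5 + 0.3 * x))
dat <- data.frame(y = y, x = x)

# Initial guess: every observed zero is structural with probability 0.5.
dat$z <- ifelse(dat$y == 0, 0.5, 0)

for (iter in 1:50) {
  # M-step, count part: Poisson fit that down-weights likely structural zeros.
  fit_count <- glm(y ~ x, family = poisson, data = dat, weights = 1 - z)
  # M-step, zero part: logistic regression for the structural-zero
  # probability (quasibinomial accepts the fractional response z).
  fit_zero <- glm(z ~ 1, family = quasibinomial, data = dat)

  # E-step: posterior probability that each observed zero is structural.
  lambda <- fitted(fit_count)
  p_zero <- fitted(fit_zero)
  z_new <- ifelse(dat$y == 0,
                  p_zero / (p_zero + (1 - p_zero) * exp(-lambda)),
                  0)

  converged <- max(abs(z_new - dat$z)) < 1e-6
  dat$z <- z_new
  if (converged) break
}

coef(fit_count)            # log-scale coefficients of the count part
plogis(coef(fit_zero))     # estimated structural-zero probability
```

In this formulation the count-part coefficients describe the Poisson
process for observations that are not structural zeros. That process
can still produce zeros, so the interpretation is close to, but not
the same as, fitting only the non-zero outcomes as in option 2.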