Dear group - I am currently fitting a Poisson glmer where I have an excess of outcomes that are zero (>95%). I am now debating on how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am not fully sure how interpret the coefficients then, are they more optimizing towards distinguishing zero and non-zero, or also capturing the differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to only those outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between fitting a glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the probability that a data point is zero. The method is elaborated for the OWL data in: https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf I am not fully sure how to interpret the results for the zero-inflated version though. Would I need to interpret the coefficients for the result of the glmer similar to as I would do for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am also not quite sure what the common approach for the zformula is here. The OWL elaborations only use zformula=z~1, so no random effect; I would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
Question about zero-inflated Poisson glmer
29 messages · Thierry Onkelinx, Mollie Brooks, Ben Bolker +2 more
Messages 1–25 of 29
Dear Philipp, Do you have just lots of zero's, or more zero's than the Poisson distribution can explain? Those are two different things. The example below generates data from a Poisson distribution and has 99% zero's but no zero-inflation. The second example has only 1% zero's but is clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000) mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model the zero's. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
Dear group - I am currently fitting a Poisson glmer where I have an excess of outcomes that are zero (>95%). I am now debating on how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am not fully sure how interpret the coefficients then, are they more optimizing towards distinguishing zero and non-zero, or also capturing the differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to only those outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between fitting a glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the probability that a data point is zero. The method is elaborated for the OWL data in: https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf I am not fully sure how to interpret the results for the zero-inflated version though. Would I need to interpret the coefficients for the result of the glmer similar to as I would do for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am also not quite sure what the common approach for the zformula is here. The OWL elaborations only use zformula=z~1, so no random effect; I would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes?
On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp,
Do you have just lots of zero's, or more zero's than the Poisson
distribution can explain? Those are two different things. The example
below generates data from a Poisson distribution and has 99% zero's
but no zero-inflation. The second example has only 1% zero's but is
clearly zero-inflated.
set.seed(1)
n <- 1e8
sim <- rpois(n, lambda = 0.01)
mean(sim == 0)
hist(sim)
sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)
mean(sim.infl == 0)
hist(sim.infl)
So before looking for zero-inflated models, try to model the zero's.
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com>>:
Dear group - I am currently fitting a Poisson glmer where I have
an excess of outcomes that are zero (>95%). I am now debating on
how to proceed and came up with three options:
1.) Just fit a regular glmer to the complete data. I am not fully
sure how interpret the coefficients then, are they more optimizing
towards distinguishing zero and non-zero, or also capturing the
differences in those outcomes that are non-zero?
2.) Leave all zeros out of the data and fit a glmer to only those
outcomes that are non-zero. Then, I would only learn about
differences in the non-zero outcomes though.
3.) Use a zero-inflated Poisson model. My data is quite
large-scale, so I am currently playing around with the EM
implementation of Bolker et al. that alternates between fitting a
glmer with data that are weighted according to their zero
probability, and fitting a logistic regression for the probability
that a data point is zero. The method is elaborated for the OWL
data in:
https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similar to as I would do
for my idea of 2)? And then on top of that interpret the
coefficients for the logistic regression regarding whether
something is in the perfect or imperfect state? I am also not
quite sure what the common approach for the zformula is here. The
OWL elaborations only use zformula=z~1, so no random effect; I
would use the same formula as for the glmer.
I am appreciating some help and pointers.
Thanks!
Philipp
_______________________________________________
R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of zero's in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp,
Do you have just lots of zero's, or more zero's than the Poisson
distribution can explain? Those are two different things. The example
below generates data from a Poisson distribution and has 99% zero's
but no zero-inflation. The second example has only 1% zero's but is
clearly zero-inflated.
set.seed(1)
n <- 1e8
sim <- rpois(n, lambda = 0.01)
mean(sim == 0)
hist(sim)
sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000)
mean(sim.infl == 0)
hist(sim.infl)
So before looking for zero-inflated models, try to model the zero's.
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com>>:
Dear group - I am currently fitting a Poisson glmer where I have
an excess of outcomes that are zero (>95%). I am now debating on
how to proceed and came up with three options:
1.) Just fit a regular glmer to the complete data. I am not fully
sure how interpret the coefficients then, are they more optimizing
towards distinguishing zero and non-zero, or also capturing the
differences in those outcomes that are non-zero?
2.) Leave all zeros out of the data and fit a glmer to only those
outcomes that are non-zero. Then, I would only learn about
differences in the non-zero outcomes though.
3.) Use a zero-inflated Poisson model. My data is quite
large-scale, so I am currently playing around with the EM
implementation of Bolker et al. that alternates between fitting a
glmer with data that are weighted according to their zero
probability, and fitting a logistic regression for the probability
that a data point is zero. The method is elaborated for the OWL
data in:
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similar to as I would do
for my idea of 2)? And then on top of that interpret the
coefficients for the logistic regression regarding whether
something is in the perfect or imperfect state? I am also not
quite sure what the common approach for the zformula is here. The
OWL elaborations only use zformula=z~1, so no random effect; I
would use the same formula as for the glmer.
I am appreciating some help and pointers.
Thanks!
Philipp
_______________________________________________
R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well.
On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp,
1. Fit a Poisson model to the data.
2. Simulate a new response vector for the dataset according to the model.
3. Count the number of zero's in the simulated response vector.
4. Repeat step 2 and 3 a decent number of time and plot a histogram of
the number of zero's in the simulation. If the number of zero's in the
original dataset is larger than those in the simulations, then the
model can't capture all zero's. In such case, first try to update the
model and repeat the procedure. If that fails, look for zero-inflated
models.
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com>>:
Thanks Thierry - That totally makes sense. Is there some way of
formally
checking that, except thinking about the setting and underlying
processes?
On 23.06.2016 11:04, Thierry Onkelinx wrote:
> Dear Philipp,
>
> Do you have just lots of zero's, or more zero's than the Poisson
> distribution can explain? Those are two different things. The
example
> below generates data from a Poisson distribution and has 99% zero's
> but no zero-inflation. The second example has only 1% zero's but is
> clearly zero-inflated.
>
> set.seed(1)
> n <- 1e8
> sim <- rpois(n, lambda = 0.01)
> mean(sim == 0)
> hist(sim)
>
> sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda =
1000)
> mean(sim.infl == 0)
> hist(sim.infl)
>
> So before looking for zero-inflated models, try to model the zero's.
>
> Best regards,
>
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for
Nature
> and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may be
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does
> not ensure that a reasonable answer can be extracted from a
given body
> of data. ~ John Tukey
>
> 2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com>
> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>:
>
> Dear group - I am currently fitting a Poisson glmer where I have
> an excess of outcomes that are zero (>95%). I am now debating on
> how to proceed and came up with three options:
>
> 1.) Just fit a regular glmer to the complete data. I am not
fully
> sure how interpret the coefficients then, are they more
optimizing
> towards distinguishing zero and non-zero, or also capturing the
> differences in those outcomes that are non-zero?
>
> 2.) Leave all zeros out of the data and fit a glmer to only
those
> outcomes that are non-zero. Then, I would only learn about
> differences in the non-zero outcomes though.
>
> 3.) Use a zero-inflated Poisson model. My data is quite
> large-scale, so I am currently playing around with the EM
> implementation of Bolker et al. that alternates between
fitting a
> glmer with data that are weighted according to their zero
> probability, and fitting a logistic regression for the
probability
> that a data point is zero. The method is elaborated for the OWL
> data in:
>
>
> I am not fully sure how to interpret the results for the
> zero-inflated version though. Would I need to interpret the
> coefficients for the result of the glmer similar to as I
would do
> for my idea of 2)? And then on top of that interpret the
> coefficients for the logistic regression regarding whether
> something is in the perfect or imperfect state? I am also not
> quite sure what the common approach for the zformula is
here. The
> OWL elaborations only use zformula=z~1, so no random effect; I
> would use the same formula as for the glmer.
>
> I am appreciating some help and pointers.
>
> Thanks!
> Philipp
>
> _______________________________________________
> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>> mailing list
[[alternative HTML version deleted]]
_______________________________________________
R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote: Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of zero's in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the Poisson distribution can explain? Those are two different things. The example below generates data from a Poisson distribution and has 99% zero's but no zero-inflation. The second example has only 1% zero's but is clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 1000) mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model the zero's. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer < <killver at gmail.com>
killver at gmail.com
<mailto:killver at gmail.com>>:
Dear group - I am currently fitting a Poisson glmer where I have
an excess of outcomes that are zero (>95%). I am now debating on
how to proceed and came up with three options:
1.) Just fit a regular glmer to the complete data. I am not fully
sure how interpret the coefficients then, are they more optimizing
towards distinguishing zero and non-zero, or also capturing the
differences in those outcomes that are non-zero?
2.) Leave all zeros out of the data and fit a glmer to only those
outcomes that are non-zero. Then, I would only learn about
differences in the non-zero outcomes though.
3.) Use a zero-inflated Poisson model. My data is quite
large-scale, so I am currently playing around with the EM
implementation of Bolker et al. that alternates between fitting a
glmer with data that are weighted according to their zero
probability, and fitting a logistic regression for the probability
that a data point is zero. The method is elaborated for the OWL
data in:
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similar to as I would do
for my idea of 2)? And then on top of that interpret the
coefficients for the logistic regression regarding whether
something is in the perfect or imperfect state? I am also not
quite sure what the common approach for the zformula is here. The
OWL elaborations only use zformula=z~1, so no random effect; I
would use the same formula as for the glmer.
I am appreciating some help and pointers.
Thanks!
Philipp
_______________________________________________
R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same number of zero outcomes (~95%).
On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp,
1. Fit a Poisson model to the data.
2. Simulate a new response vector for the dataset according to
the model.
3. Count the number of zero's in the simulated response vector.
4. Repeat step 2 and 3 a decent number of time and plot a
histogram of the number of zero's in the simulation. If the
number of zero's in the original dataset is larger than those in
the simulations, then the model can't capture all zero's. In such
case, first try to update the model and repeat the procedure. If
that fails, look for zero-inflated models.
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for
Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be
no more than asking him to perform a post-mortem examination: he
may be able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer
does not ensure that a reasonable answer can be extracted from a
given body of data. ~ John Tukey
2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com>>:
Thanks Thierry - That totally makes sense. Is there some way
of formally
checking that, except thinking about the setting and
underlying processes?
On 23.06.2016 11:04, Thierry Onkelinx wrote:
> Dear Philipp,
>
> Do you have just lots of zero's, or more zero's than the
Poisson
> distribution can explain? Those are two different things.
The example
> below generates data from a Poisson distribution and has
99% zero's
> but no zero-inflation. The second example has only 1%
zero's but is
> clearly zero-inflated.
>
> set.seed(1)
> n <- 1e8
> sim <- rpois(n, lambda = 0.01)
> mean(sim == 0)
> hist(sim)
>
> sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
> mean(sim.infl == 0)
> hist(sim.infl)
>
> So before looking for zero-inflated models, try to model
the zero's.
>
> Best regards,
>
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
> and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done
may be no
> more than asking him to perform a post-mortem examination:
he may be
> able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an
answer does
> not ensure that a reasonable answer can be extracted from a
given body
> of data. ~ John Tukey
>
> 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>
> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>:
>
> Dear group - I am currently fitting a Poisson glmer
where I have
> an excess of outcomes that are zero (>95%). I am now
debating on
> how to proceed and came up with three options:
>
> 1.) Just fit a regular glmer to the complete data. I am
not fully
> sure how interpret the coefficients then, are they more
optimizing
> towards distinguishing zero and non-zero, or also
capturing the
> differences in those outcomes that are non-zero?
>
> 2.) Leave all zeros out of the data and fit a glmer to
only those
> outcomes that are non-zero. Then, I would only learn about
> differences in the non-zero outcomes though.
>
> 3.) Use a zero-inflated Poisson model. My data is quite
> large-scale, so I am currently playing around with the EM
> implementation of Bolker et al. that alternates between
fitting a
> glmer with data that are weighted according to their zero
> probability, and fitting a logistic regression for the
probability
> that a data point is zero. The method is elaborated for
the OWL
> data in:
>
>
> I am not fully sure how to interpret the results for the
> zero-inflated version though. Would I need to interpret the
> coefficients for the result of the glmer similar to as
I would do
> for my idea of 2)? And then on top of that interpret the
> coefficients for the logistic regression regarding whether
> something is in the perfect or imperfect state? I am
also not
> quite sure what the common approach for the zformula is
here. The
> OWL elaborations only use zformula=z~1, so no random
effect; I
> would use the same formula as for the glmer.
>
> I am appreciating some help and pointers.
>
> Thanks!
> Philipp
>
> _______________________________________________
> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>> mailing list
[[alternative HTML version deleted]]
_______________________________________________
R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot of testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral Researcher, Population Ecology Research Group Department of Evolutionary Biology & Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/
On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote: Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same number of zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp,
1. Fit a Poisson model to the data.
2. Simulate a new response vector for the dataset according to
the model.
3. Count the number of zero's in the simulated response vector.
4. Repeat step 2 and 3 a decent number of time and plot a
histogram of the number of zero's in the simulation. If the
number of zero's in the original dataset is larger than those in
the simulations, then the model can't capture all zero's. In such
case, first try to update the model and repeat the procedure. If
that fails, look for zero-inflated models.
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for
Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be
no more than asking him to perform a post-mortem examination: he
may be able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer
does not ensure that a reasonable answer can be extracted from a
given body of data. ~ John Tukey
2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com <mailto:killver at gmail.com>>>:
Thanks Thierry - That totally makes sense. Is there some way
of formally
checking that, except thinking about the setting and
underlying processes?
On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
I am not fully sure how to interpret the results for the zero-inflated version though. Would I need to interpret the coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
[[alternative HTML version deleted]]
_______________________________________________
R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
I would also comment that glmmTMB is likely to be much faster than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot of testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral Researcher, Population Ecology Research Group Department of Evolutionary Biology & Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/
On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote: Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same number of zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of zero's in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Did try glmmADMB but unfortunately it is way too slow for my data. Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I will check the docu and try it though. Thanks for the info.
On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot of testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral Researcher, Population Ecology Research Group Department of Evolutionary Biology & Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/
On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote: Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same number of zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of zero's in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB") and it was 2.3 times faster than glmer for one simple GLMM.
On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote: Did try glmmADMB but unfortunately it is way too slow for my data. Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I will check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot of testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral Researcher, Population Ecology Research Group Department of Evolutionary Biology & Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/
On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote: Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same number of zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of zero's in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
It indeed seems to run quite fast; had some trouble installing, but works now on my 3.3 R setup. One question I have is regarding the specification of dispersion as I need to specify the dispformula. What is the difference here between just specifying fixed effects vs. also the random effects?
On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB")
and it was 2.3 times faster than glmer for one simple GLMM.
On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com <mailto:killver at gmail.com>> wrote: Did try glmmADMB but unfortunately it is way too slow for my data. Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I will check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot of testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral Researcher, Population Ecology Research Group Department of Evolutionary Biology & Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/
On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote: Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same number of zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of zero's in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Update, I tried it like that, but receive an error message. Warning message: In nlminb(start = par, objective = fn, gradient = gr): NA/NaN function evaluation Error in solve.default(hessian.fixed): Lapack routine dgesv: system is exactly singular: U[3,3] = 0 Traceback: 1. glmmTMB(y ~ 1 + x + (1 | b), . data = data, family = "poisson", dispformula = ~1 + x) 2. sdreport(obj) 3. solve(hessian.fixed) 4. solve(hessian.fixed) 5. solve.default(hessian.fixed) Any ideas on that? BTW: Is it fine to post glmmTMB questions here, or should I rather use the github issue page, or is there maybe a dedicated mailing list? Thanks, Philipp
On 24.06.2016 14:35, Philipp Singer wrote:
It indeed seems to run quite fast; had some trouble installing, but works now on my 3.3 R setup. One question I have is regarding the specification of dispersion as I need to specify the dispformula. What is the difference here between just specifying fixed effects vs. also the random effects? On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB")
and it was 2.3 times faster than glmer for one simple GLMM.
On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote: Did try glmmADMB but unfortunately it is way too slow for my data. Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I will check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster
than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot of testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral Researcher, Population Ecology Research Group Department of Evolutionary Biology & Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/
On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote: Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same number of zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of zero's in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Probably for now the glmmTMB issues page is best.
When you go there:
- details on installation problems/hiccups would be useful
- a reproducible example for the problem listed below would be useful
- dispformula is for allowing dispersion/residual variance to vary
with covariates (i.e., modeling heteroscedasticity)
cheers
Ben Bolker
On 16-06-24 09:13 AM, Philipp Singer wrote:
Update, I tried it like that, but receive an error message. Warning message: In nlminb(start = par, objective = fn, gradient = gr): NA/NaN function evaluation Error in solve.default(hessian.fixed): Lapack routine dgesv: system is exactly singular: U[3,3] = 0 Traceback: 1. glmmTMB(y ~ 1 + x + (1 | b), . data = data, family = "poisson", dispformula = ~1 + x) 2. sdreport(obj) 3. solve(hessian.fixed) 4. solve(hessian.fixed) 5. solve.default(hessian.fixed) Any ideas on that? BTW: Is it fine to post glmmTMB questions here, or should I rather use the github issue page, or is there maybe a dedicated mailing list? Thanks, Philipp On 24.06.2016 14:35, Philipp Singer wrote:
It indeed seems to run quite fast; had some trouble installing, but works now on my 3.3 R setup. One question I have is regarding the specification of dispersion as I need to specify the dispformula. What is the difference here between just specifying fixed effects vs. also the random effects? On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB")
and it was 2.3 times faster than glmer for one simple GLMM.
On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote: Did try glmmADMB but unfortunately it is way too slow for my data. Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I will check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster
than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot of testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral Researcher, Population Ecology Research Group Department of Evolutionary Biology & Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/
On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote: Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same number of zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of zero's in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Thanks - I started an issue there to answer some of my questions. Regarding the installation: I was trying to somehow do it in anaconda with a specific R kernel and had some issues. I am trying to resort that with the anaconda guys though, if I have a tutorial on how to properly setup glmmTMB in anaconda, I will let you know. The install worked fine in my standard R environment.
On 24.06.2016 15:40, Ben Bolker wrote:
Probably for now the glmmTMB issues page is best.
When you go there:
- details on installation problems/hiccups would be useful
- a reproducible example for the problem listed below would be useful
- dispformula is for allowing dispersion/residual variance to vary
with covariates (i.e., modeling heteroscedasticity)
cheers
Ben Bolker
On 16-06-24 09:13 AM, Philipp Singer wrote:
Update, I tried it like that, but receive an error message. Warning message: In nlminb(start = par, objective = fn, gradient = gr): NA/NaN function evaluation Error in solve.default(hessian.fixed): Lapack routine dgesv: system is exactly singular: U[3,3] = 0 Traceback: 1. glmmTMB(y ~ 1 + x + (1 | b), . data = data, family = "poisson", dispformula = ~1 + x) 2. sdreport(obj) 3. solve(hessian.fixed) 4. solve(hessian.fixed) 5. solve.default(hessian.fixed) Any ideas on that? BTW: Is it fine to post glmmTMB questions here, or should I rather use the github issue page, or is there maybe a dedicated mailing list? Thanks, Philipp On 24.06.2016 14:35, Philipp Singer wrote:
It indeed seems to run quite fast; had some trouble installing, but works now on my 3.3 R setup. One question I have is regarding the specification of dispersion as I need to specify the dispformula. What is the difference here between just specifying fixed effects vs. also the random effects? On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB")
and it was 2.3 times faster than glmer for one simple GLMM.
On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote: Did try glmmADMB but unfortunately it is way too slow for my data. Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I will check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster
than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot of testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral Researcher, Population Ecology Research Group Department of Evolutionary Biology & Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/
On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote: Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same number of zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of zero's in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf <https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf>
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
2 days later
I have now played around more with the data an the models both using lme4 and glmmTMB. I can report the following: Modeling the data with a zero-inflated Poisson improves the model significantly. However, when calling predict and simulating rpoissons, I end up with nearly no values that are zero (in the original data there are 96% zero). When I model the data with overdisperion by including an observation-level random effect, I can also improve the model (not surprisingly due to the random effect). When I predict outcomes by ignoring the observation-level random effect (in lme4), I receive bad prediction if I compare it to the original data. While many zeros can be captured (of course), the positive outcomes can not be captured well. Combining zero inflation and overdispersion further improves the model, but I can only do that with glmmTMB and then have troubles doing predictions ignoring the observation-level random effect. Another side question: In lme4, when I do: m = glm(x~1,family="poisson") rpois(n=len(data),lambda=predict(m, type='response',re.form=NA) vs. simulate(1,m,re.form=NA) I receive different outcomes? Do I understand these function wrongly? Would appreciate some more help/pointers! Thanks, Philipp 2016-06-24 15:52 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks - I started an issue there to answer some of my questions. Regarding the installation: I was trying to somehow do it in anaconda with a specific R kernel and had some issues. I am trying to resort that with the anaconda guys though, if I have a tutorial on how to properly setup glmmTMB in anaconda, I will let you know. The install worked fine in my standard R environment. On 24.06.2016 15:40, Ben Bolker wrote:
Probably for now the glmmTMB issues page is best.
When you go there:
- details on installation problems/hiccups would be useful
- a reproducible example for the problem listed below would be useful
- dispformula is for allowing dispersion/residual variance to vary
with covariates (i.e., modeling heteroscedasticity)
cheers
Ben Bolker
On 16-06-24 09:13 AM, Philipp Singer wrote:
Update, I tried it like that, but receive an error message. Warning message: In nlminb(start = par, objective = fn, gradient = gr): NA/NaN function evaluation Error in solve.default(hessian.fixed): Lapack routine dgesv: system is exactly singular: U[3,3] = 0 Traceback: 1. glmmTMB(y ~ 1 + x + (1 | b), . data = data, family = "poisson", dispformula = ~1 + x) 2. sdreport(obj) 3. solve(hessian.fixed) 4. solve(hessian.fixed) 5. solve.default(hessian.fixed) Any ideas on that? BTW: Is it fine to post glmmTMB questions here, or should I rather use the github issue page, or is there maybe a dedicated mailing list? Thanks, Philipp On 24.06.2016 14:35, Philipp Singer wrote:
It indeed seems to run quite fast; had some trouble installing, but works now on my 3.3 R setup. One question I have is regarding the specification of dispersion as I need to specify the dispformula. What is the difference here between just specifying fixed effects vs. also the random effects? On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB")
and it was 2.3 times faster than glmer for one simple GLMM.
On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote:
Did try glmmADMB but unfortunately it is way too slow for my data. Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I will check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster
than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot of testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral Researcher, Population Ecology Research Group Department of Evolutionary Biology & Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/ On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same number of zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of zero's in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf < https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________
R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dear Philipp, How strong is the variance of the observation level random effect? I would trust a model with large OLRE variance. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 14:59 GMT+02:00 Philipp Singer <killver at gmail.com>:
I have now played around more with the data an the models both using lme4 and glmmTMB. I can report the following: Modeling the data with a zero-inflated Poisson improves the model significantly. However, when calling predict and simulating rpoissons, I end up with nearly no values that are zero (in the original data there are 96% zero). When I model the data with overdisperion by including an observation-level random effect, I can also improve the model (not surprisingly due to the random effect). When I predict outcomes by ignoring the observation-level random effect (in lme4), I receive bad prediction if I compare it to the original data. While many zeros can be captured (of course), the positive outcomes can not be captured well. Combining zero inflation and overdispersion further improves the model, but I can only do that with glmmTMB and then have troubles doing predictions ignoring the observation-level random effect. Another side question: In lme4, when I do: m = glm(x~1,family="poisson") rpois(n=len(data),lambda=predict(m, type='response',re.form=NA) vs. simulate(1,m,re.form=NA) I receive different outcomes? Do I understand these function wrongly? Would appreciate some more help/pointers! Thanks, Philipp 2016-06-24 15:52 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks - I started an issue there to answer some of my questions. Regarding the installation: I was trying to somehow do it in anaconda
with
a specific R kernel and had some issues. I am trying to resort that with the anaconda guys though, if I have a tutorial on how to properly setup glmmTMB in anaconda, I will let you know. The install worked fine in my standard R environment. On 24.06.2016 15:40, Ben Bolker wrote:
Probably for now the glmmTMB issues page is best.
When you go there:
- details on installation problems/hiccups would be useful
- a reproducible example for the problem listed below would be useful
- dispformula is for allowing dispersion/residual variance to vary
with covariates (i.e., modeling heteroscedasticity)
cheers
Ben Bolker
On 16-06-24 09:13 AM, Philipp Singer wrote:
Update, I tried it like that, but receive an error message. Warning message: In nlminb(start = par, objective = fn, gradient = gr): NA/NaN function evaluation Error in solve.default(hessian.fixed): Lapack routine dgesv: system is exactly singular: U[3,3] = 0 Traceback: 1. glmmTMB(y ~ 1 + x + (1 | b), . data = data, family = "poisson", dispformula = ~1 + x) 2. sdreport(obj) 3. solve(hessian.fixed) 4. solve(hessian.fixed) 5. solve.default(hessian.fixed) Any ideas on that? BTW: Is it fine to post glmmTMB questions here, or should I rather use the github issue page, or is there maybe a dedicated mailing list? Thanks, Philipp On 24.06.2016 14:35, Philipp Singer wrote:
It indeed seems to run quite fast; had some trouble installing, but works now on my 3.3 R setup. One question I have is regarding the specification of dispersion as I need to specify the dispformula. What is the difference here between just specifying fixed effects vs. also the random effects? On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB")
and it was 2.3 times faster than glmer for one simple GLMM.
On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote:
Did try glmmADMB but unfortunately it is way too slow for my data. Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I will check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster
than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot of testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral
Researcher,
Population Ecology Research Group Department of Evolutionary
Biology
& Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/ On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same number
of
zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of zero's in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some way of formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
<
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________
R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
The variance is: Conditional model: Groups Name Variance Std.Dev. obs (Intercept) 8.991e+01 9.4823139 2016-06-27 15:06 GMT+02:00 Thierry Onkelinx <thierry.onkelinx at inbo.be>:
Dear Philipp, How strong is the variance of the observation level random effect? I would trust a model with large OLRE variance. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 14:59 GMT+02:00 Philipp Singer <killver at gmail.com>:
I have now played around more with the data an the models both using lme4 and glmmTMB. I can report the following: Modeling the data with a zero-inflated Poisson improves the model significantly. However, when calling predict and simulating rpoissons, I end up with nearly no values that are zero (in the original data there are 96% zero). When I model the data with overdisperion by including an observation-level random effect, I can also improve the model (not surprisingly due to the random effect). When I predict outcomes by ignoring the observation-level random effect (in lme4), I receive bad prediction if I compare it to the original data. While many zeros can be captured (of course), the positive outcomes can not be captured well. Combining zero inflation and overdispersion further improves the model, but I can only do that with glmmTMB and then have troubles doing predictions ignoring the observation-level random effect. Another side question: In lme4, when I do: m = glm(x~1,family="poisson") rpois(n=len(data),lambda=predict(m, type='response',re.form=NA) vs. simulate(1,m,re.form=NA) I receive different outcomes? Do I understand these function wrongly? Would appreciate some more help/pointers! Thanks, Philipp 2016-06-24 15:52 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks - I started an issue there to answer some of my questions. Regarding the installation: I was trying to somehow do it in anaconda
with
a specific R kernel and had some issues. I am trying to resort that with the anaconda guys though, if I have a tutorial on how to properly setup glmmTMB in anaconda, I will let you know. The install worked fine in my standard R environment. On 24.06.2016 15:40, Ben Bolker wrote:
Probably for now the glmmTMB issues page is best. When you go there: - details on installation problems/hiccups would be useful - a reproducible example for the problem listed below would be
useful
- dispformula is for allowing dispersion/residual variance to vary
with covariates (i.e., modeling heteroscedasticity)
cheers
Ben Bolker
On 16-06-24 09:13 AM, Philipp Singer wrote:
Update, I tried it like that, but receive an error message. Warning message: In nlminb(start = par, objective = fn, gradient = gr): NA/NaN function evaluation Error in solve.default(hessian.fixed): Lapack routine dgesv: system is exactly singular: U[3,3] = 0 Traceback: 1. glmmTMB(y ~ 1 + x + (1 | b), . data = data, family = "poisson", dispformula = ~1 + x) 2. sdreport(obj) 3. solve(hessian.fixed) 4. solve(hessian.fixed) 5. solve.default(hessian.fixed) Any ideas on that? BTW: Is it fine to post glmmTMB questions here, or should I rather use the github issue page, or is there maybe a dedicated mailing list? Thanks, Philipp On 24.06.2016 14:35, Philipp Singer wrote:
It indeed seems to run quite fast; had some trouble installing, but works now on my 3.3 R setup. One question I have is regarding the specification of dispersion as I need to specify the dispformula. What is the difference here between just specifying fixed effects vs. also the random effects? On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in vignette("glmmTMB")
and it was 2.3 times faster than glmer for one simple GLMM.
On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote:
Did try glmmADMB but unfortunately it is way too slow for my data. Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I will check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster
than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot
of
testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral
Researcher,
Population Ecology Research Group Department of Evolutionary
Biology
& Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/ On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com>
wrote:
Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same
number of
zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination:
he
may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for
an
answer does not ensure that a reasonable answer can be
extracted
from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new response vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of
zero's
in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died
of.
~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some way
of
formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
<
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________
R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Here is the fitted vs. residual plot for the observation-level poisson model where the observation level has been removed as taken from: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html So basically the prediction is always close to zero. Note that this is just on a very small sample (1000 data points). If I fit a nbinom2 to this smalle sample, I get predictions that are always around ~20 (but never zero). Both plots are attached. What I am wondering is whether I can do inference on a fixed parameter in my model which is my main task of this study. The effect is similar in the different models and in general I am only itnerested in whether it is positive/negative and "significant" which it is. However, as can be seen, the prediction looks not too good here. 2016-06-27 15:18 GMT+02:00 Philipp Singer <killver at gmail.com>:
The variance is: Conditional model: Groups Name Variance Std.Dev. obs (Intercept) 8.991e+01 9.4823139 2016-06-27 15:06 GMT+02:00 Thierry Onkelinx <thierry.onkelinx at inbo.be>:
Dear Philipp, How strong is the variance of the observation level random effect? I would trust a model with large OLRE variance. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 14:59 GMT+02:00 Philipp Singer <killver at gmail.com>:
I have now played around more with the data an the models both using lme4 and glmmTMB. I can report the following: Modeling the data with a zero-inflated Poisson improves the model significantly. However, when calling predict and simulating rpoissons, I end up with nearly no values that are zero (in the original data there are 96% zero). When I model the data with overdisperion by including an observation-level random effect, I can also improve the model (not surprisingly due to the random effect). When I predict outcomes by ignoring the observation-level random effect (in lme4), I receive bad prediction if I compare it to the original data. While many zeros can be captured (of course), the positive outcomes can not be captured well. Combining zero inflation and overdispersion further improves the model, but I can only do that with glmmTMB and then have troubles doing predictions ignoring the observation-level random effect. Another side question: In lme4, when I do: m = glm(x~1,family="poisson") rpois(n=len(data),lambda=predict(m, type='response',re.form=NA) vs. simulate(1,m,re.form=NA) I receive different outcomes? Do I understand these function wrongly? Would appreciate some more help/pointers! Thanks, Philipp 2016-06-24 15:52 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks - I started an issue there to answer some of my questions. Regarding the installation: I was trying to somehow do it in anaconda
with
a specific R kernel and had some issues. I am trying to resort that
with
the anaconda guys though, if I have a tutorial on how to properly setup glmmTMB in anaconda, I will let you know. The install worked fine in my standard R environment. On 24.06.2016 15:40, Ben Bolker wrote:
Probably for now the glmmTMB issues page is best. When you go there: - details on installation problems/hiccups would be useful - a reproducible example for the problem listed below would be
useful
- dispformula is for allowing dispersion/residual variance to vary
with covariates (i.e., modeling heteroscedasticity)
cheers
Ben Bolker
On 16-06-24 09:13 AM, Philipp Singer wrote:
Update, I tried it like that, but receive an error message. Warning message: In nlminb(start = par, objective = fn, gradient = gr): NA/NaN
function
evaluation Error in solve.default(hessian.fixed): Lapack routine dgesv: system
is
exactly singular: U[3,3] = 0 Traceback: 1. glmmTMB(y ~ 1 + x + (1 | b), . data = data, family = "poisson", dispformula = ~1 + x) 2. sdreport(obj) 3. solve(hessian.fixed) 4. solve(hessian.fixed) 5. solve.default(hessian.fixed) Any ideas on that? BTW: Is it fine to post glmmTMB questions here, or should I rather
use
the github issue page, or is there maybe a dedicated mailing list? Thanks, Philipp On 24.06.2016 14:35, Philipp Singer wrote:
It indeed seems to run quite fast; had some trouble installing, but works now on my 3.3 R setup. One question I have is regarding the specification of dispersion as
I
need to specify the dispformula. What is the difference here between just specifying fixed effects vs. also the random effects? On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in
vignette("glmmTMB")
and it was 2.3 times faster than glmer for one simple GLMM. On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote:
Did try glmmADMB but unfortunately it is way too slow for my data. Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I
will
check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster
than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to recommended more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a lot
of
testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral
Researcher,
Population Ecology Research Group Department of Evolutionary
Biology
& Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/ On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com>
wrote:
Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same
number of
zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may
be
no more than asking him to perform a post-mortem examination:
he
may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for
an
answer does not ensure that a reasonable answer can be
extracted
from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new
response
vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of
zero's
in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In such case, first try to update the model and repeat the procedure. If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died
of.
~ Sir Ronald Aylmer Fisher The plural of anecdote is not
data.
~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some way
of
formally checking that, except thinking about the setting and underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01) mean(sim == 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek
/
Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
<
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________
R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-------------- next part -------------- A non-text attachment was scrubbed... Name: fitted.png Type: image/png Size: 26021 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20160627/a817d3eb/attachment-0002.png> -------------- next part -------------- A non-text attachment was scrubbed... Name: nbinom_fitted.png Type: image/png Size: 26194 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20160627/a817d3eb/attachment-0003.png>
Dear Philipp, You've been bitten by observation level random effects. I've put together a document about it on http://rpubs.com/INBOstats/OLRE. Bottomline you're OKish when the standard devation of the OLRE smaller than 1. You're in trouble when it's above 3. In between you need to check the model carefully. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 16:17 GMT+02:00 Philipp Singer <killver at gmail.com>:
Here is the fitted vs. residual plot for the observation-level poisson model where the observation level has been removed as taken from: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html So basically the prediction is always close to zero. Note that this is just on a very small sample (1000 data points). If I fit a nbinom2 to this smalle sample, I get predictions that are always around ~20 (but never zero). Both plots are attached. What I am wondering is whether I can do inference on a fixed parameter in my model which is my main task of this study. The effect is similar in the different models and in general I am only itnerested in whether it is positive/negative and "significant" which it is. However, as can be seen, the prediction looks not too good here. 2016-06-27 15:18 GMT+02:00 Philipp Singer <killver at gmail.com>:
The variance is: Conditional model: Groups Name Variance Std.Dev. obs (Intercept) 8.991e+01 9.4823139 2016-06-27 15:06 GMT+02:00 Thierry Onkelinx <thierry.onkelinx at inbo.be>:
Dear Philipp, How strong is the variance of the observation level random effect? I would trust a model with large OLRE variance. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 14:59 GMT+02:00 Philipp Singer <killver at gmail.com>:
I have now played around more with the data an the models both using lme4 and glmmTMB. I can report the following: Modeling the data with a zero-inflated Poisson improves the model significantly. However, when calling predict and simulating rpoissons, I end up with nearly no values that are zero (in the original data there are 96% zero). When I model the data with overdisperion by including an observation-level random effect, I can also improve the model (not surprisingly due to the random effect). When I predict outcomes by ignoring the observation-level random effect (in lme4), I receive bad prediction if I compare it to the original data. While many zeros can be captured (of course), the positive outcomes can not be captured well. Combining zero inflation and overdispersion further improves the model, but I can only do that with glmmTMB and then have troubles doing predictions ignoring the observation-level random effect. Another side question: In lme4, when I do: m = glm(x~1,family="poisson") rpois(n=len(data),lambda=predict(m, type='response',re.form=NA) vs. simulate(1,m,re.form=NA) I receive different outcomes? Do I understand these function wrongly? Would appreciate some more help/pointers! Thanks, Philipp 2016-06-24 15:52 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks - I started an issue there to answer some of my questions. Regarding the installation: I was trying to somehow do it in anaconda
with
a specific R kernel and had some issues. I am trying to resort that
with
the anaconda guys though, if I have a tutorial on how to properly
setup
glmmTMB in anaconda, I will let you know. The install worked fine in
my
standard R environment. On 24.06.2016 15:40, Ben Bolker wrote:
Probably for now the glmmTMB issues page is best. When you go there: - details on installation problems/hiccups would be useful - a reproducible example for the problem listed below would be
useful
- dispformula is for allowing dispersion/residual variance to vary
with covariates (i.e., modeling heteroscedasticity)
cheers
Ben Bolker
On 16-06-24 09:13 AM, Philipp Singer wrote:
Update, I tried it like that, but receive an error message. Warning message: In nlminb(start = par, objective = fn, gradient = gr): NA/NaN
function
evaluation Error in solve.default(hessian.fixed): Lapack routine dgesv: system
is
exactly singular: U[3,3] = 0 Traceback: 1. glmmTMB(y ~ 1 + x + (1 | b), . data = data, family = "poisson", dispformula = ~1 + x) 2. sdreport(obj) 3. solve(hessian.fixed) 4. solve(hessian.fixed) 5. solve.default(hessian.fixed) Any ideas on that? BTW: Is it fine to post glmmTMB questions here, or should I rather
use
the github issue page, or is there maybe a dedicated mailing list? Thanks, Philipp On 24.06.2016 14:35, Philipp Singer wrote:
It indeed seems to run quite fast; had some trouble installing, but works now on my 3.3 R setup. One question I have is regarding the specification of dispersion
as I
need to specify the dispformula. What is the difference here
between
just specifying fixed effects vs. also the random effects? On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in
vignette("glmmTMB")
and it was 2.3 times faster than glmer for one simple GLMM. On 23Jun 2016, at 10:44, Philipp Singer <killver at gmail.com>
wrote:
Did try glmmADMB but unfortunately it is way too slow for my
data.
Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I
will
check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster
than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to
recommended
more often. https://github.com/glmmTMB/glmmTMB http://glmmadmb.r-forge.r-project.org glmmTMB is still in the development phase, but we?ve done a
lot of
testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral
Researcher,
Population Ecology Research Group Department of Evolutionary
Biology
& Environmental Studies, University of Z?rich http://www.popecol.org/team/mollie-brooks/ On 23Jun 2016, at 8:22, Philipp Singer <killver at gmail.com>
wrote:
Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same
number of
zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model zero-inflation. Those are two different things. I've put some information together in http://rpubs.com/INBOstats/zeroinflation ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may
be
no more than asking him to perform a post-mortem
examination: he
may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire
for an
answer does not ensure that a reasonable answer can be
extracted
from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new
response
vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4. Repeat step 2 and 3 a decent number of time and plot a histogram of the number of zero's in the simulation. If the number of
zero's
in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In
such
case, first try to update the model and repeat the
procedure.
If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek
/
Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died
of.
~ Sir Ronald Aylmer Fisher The plural of anecdote is not
data.
~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable
answer
can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <
killver at gmail.com
<mailto:killver at gmail.com <mailto:killver at gmail.com>>>: Thanks Thierry - That totally makes sense. Is there some
way of
formally checking that, except thinking about the setting
and
underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01)
mean(sim
== 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en
bosonderzoek /
Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team
Biometrics
& Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>
<mailto:killver at gmail.com <mailto:killver at gmail.com> <mailto:killver at gmail.com <mailto:killver at gmail.com>>>>: Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
<
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>
<mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>>> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org>> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________
R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list
_______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Well, as posted beforehand the std dev is 9.5 ... so does not seem too good then :/ Any other idea?
On 27.06.2016 17:31, Thierry Onkelinx wrote:
Dear Philipp, You've been bitten by observation level random effects. I've put together a document about it on http://rpubs.com/INBOstats/OLRE. Bottomline you're OKish when the standard devation of the OLRE smaller than 1. You're in trouble when it's above 3. In between you need to check the model carefully. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 16:17 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com>>: Here is the fitted vs. residual plot for the observation-level poisson model where the observation level has been removed as taken from: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html So basically the prediction is always close to zero. Note that this is just on a very small sample (1000 data points). If I fit a nbinom2 to this smalle sample, I get predictions that are always around ~20 (but never zero). Both plots are attached. What I am wondering is whether I can do inference on a fixed parameter in my model which is my main task of this study. The effect is similar in the different models and in general I am only itnerested in whether it is positive/negative and "significant" which it is. However, as can be seen, the prediction looks not too good here. 2016-06-27 15:18 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com>>: The variance is: Conditional model: Groups Name Variance Std.Dev. obs (Intercept) 8.991e+01 9.4823139 2016-06-27 15:06 GMT+02:00 Thierry Onkelinx <thierry.onkelinx at inbo.be <mailto:thierry.onkelinx at inbo.be>>: Dear Philipp, How strong is the variance of the observation level random effect? I would trust a model with large OLRE variance. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 14:59 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com>>: I have now played around more with the data an the models both using lme4 and glmmTMB. I can report the following: Modeling the data with a zero-inflated Poisson improves the model significantly. However, when calling predict and simulating rpoissons, I end up with nearly no values that are zero (in the original data there are 96% zero). When I model the data with overdisperion by including an observation-level random effect, I can also improve the model (not surprisingly due to the random effect). When I predict outcomes by ignoring the observation-level random effect (in lme4), I receive bad prediction if I compare it to the original data. While many zeros can be captured (of course), the positive outcomes can not be captured well. Combining zero inflation and overdispersion further improves the model, but I can only do that with glmmTMB and then have troubles doing predictions ignoring the observation-level random effect. Another side question: In lme4, when I do: m = glm(x~1,family="poisson") rpois(n=len(data),lambda=predict(m, type='response',re.form=NA) vs. simulate(1,m,re.form=NA) I receive different outcomes? Do I understand these function wrongly? Would appreciate some more help/pointers! Thanks, Philipp 2016-06-24 15:52 GMT+02:00 Philipp Singer <killver at gmail.com <mailto:killver at gmail.com>>:
> Thanks - I started an issue there to answer some of
my questions.
>
> Regarding the installation: I was trying to somehow
do it in anaconda with
> a specific R kernel and had some issues. I am trying
to resort that with
> the anaconda guys though, if I have a tutorial on
how to properly setup
> glmmTMB in anaconda, I will let you know. The
install worked fine in my
> standard R environment.
>
>
> On 24.06.2016 15 <tel:24.06.2016%2015>:40, Ben
Bolker wrote:
>
>> Probably for now the glmmTMB issues page is best.
>>
>> When you go there:
>>
>> - details on installation problems/hiccups would
be useful
>> - a reproducible example for the problem listed
below would be useful
>> - dispformula is for allowing
dispersion/residual variance to vary
>> with covariates (i.e., modeling heteroscedasticity)
>>
>> cheers
>> Ben Bolker
>>
>>
>> On 16-06-24 09:13 AM, Philipp Singer wrote:
>>
>>> Update, I tried it like that, but receive an error
message.
>>>
>>> Warning message:
>>> In nlminb(start = par, objective = fn, gradient =
gr): NA/NaN function
>>> evaluation
>>>
>>> Error in solve.default(hessian.fixed): Lapack
routine dgesv: system is
>>> exactly singular: U[3,3] = 0
>>> Traceback:
>>>
>>> 1. glmmTMB(y ~ 1 + x + (1 | b),
>>> . data = data, family = "poisson", dispformula
= ~1 + x)
>>> 2. sdreport(obj)
>>> 3. solve(hessian.fixed)
>>> 4. solve(hessian.fixed)
>>> 5. solve.default(hessian.fixed)
>>>
>>> Any ideas on that?
>>>
>>> BTW: Is it fine to post glmmTMB questions here, or
should I rather use
>>> the github issue page, or is there maybe a
dedicated mailing list?
>>>
>>> Thanks,
>>> Philipp
>>>
>>> On 24.06.2016 14:35, Philipp Singer wrote:
>>>
>>>> It indeed seems to run quite fast; had some
trouble installing, but
>>>> works now on my 3.3 R setup.
>>>>
>>>> One question I have is regarding the
specification of dispersion as I
>>>> need to specify the dispformula. What is the
difference here between
>>>> just specifying fixed effects vs. also the random
effects?
>>>>
>>>> On 23.06.2016 23:07, Mollie Brooks wrote:
>>>>
>>>>> glmmTMB does crossed RE. Ben did some timings in
vignette("glmmTMB")
>>>>> and it was 2.3 times faster than glmer for one
simple GLMM.
>>>>>
>>>>>
>>>>> On 23Jun 2016, at 10:44, Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>> wrote:
>>>>>>
>>>>>> Did try glmmADMB but unfortunately it is way
too slow for my data.
>>>>>>
>>>>>> Did not know about glmmTMB, will try it out.
Does it work with
>>>>>> crossed random effects and how does it scale
with more data? I will
>>>>>> check the docu and try it though. Thanks for
the info.
>>>>>>
>>>>>> On 23.06.2016 19:14, Ben Bolker wrote:
>>>>>>
>>>>>>> I would also comment that glmmTMB is likely
to be much faster
>>>>>>> than the
>>>>>>> lme4-based EM approach ...
>>>>>>>
>>>>>>> cheers
>>>>>>> Ben B.
>>>>>>>
>>>>>>> On 16-06-23 12:47 PM, Mollie Brooks wrote:
>>>>>>>
>>>>>>>> Hi Philipp,
>>>>>>>>
>>>>>>>> You could also try fitting the model with and
without ZI using
>>>>>>>> either
>>>>>>>> glmmADMB or glmmTMB. Then compare the AICs. I
believe model
>>>>>>>> selection
>>>>>>>> is useful for this, but I could be missing
something since the
>>>>>>>> simulation procedure that Thierry described
seems to recommended
>>>>>>>> more
>>>>>>>> often.
>>>>>>>>
>>>>>>>> https://github.com/glmmTMB/glmmTMB
>>>>>>>> http://glmmadmb.r-forge.r-project.org
>>>>>>>>
>>>>>>>> glmmTMB is still in the development phase,
but we?ve done a lot of
>>>>>>>> testing.
>>>>>>>>
>>>>>>>> cheers, Mollie
>>>>>>>>
>>>>>>>> ------------------------ Mollie Brooks, PhD
Postdoctoral Researcher,
>>>>>>>> Population Ecology Research Group Department
of Evolutionary Biology
>>>>>>>> & Environmental Studies, University of Z?rich
>>>>>>>> http://www.popecol.org/team/mollie-brooks/
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23Jun 2016, at 8:22, Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>> wrote:
>>>>>>>>>
>>>>>>>>> Thanks, great information, that is really
helpful.
>>>>>>>>>
>>>>>>>>> I agree that those are different things,
however when using a
>>>>>>>>> random effect for overdispersion, I can
simulate the same number of
>>>>>>>>> zero outcomes (~95%).
>>>>>>>>>
>>>>>>>>> On 23.06.2016 15:50, Thierry Onkelinx wrote:
>>>>>>>>>
>>>>>>>>>> Be careful when using overdispersion to
model zero-inflation.
>>>>>>>>>> Those are two different things.
>>>>>>>>>>
>>>>>>>>>> I've put some information together in
>>>>>>>>>> http://rpubs.com/INBOstats/zeroinflation
>>>>>>>>>>
>>>>>>>>>> ir. Thierry Onkelinx Instituut voor natuur-
en bosonderzoek /
>>>>>>>>>> Research Institute for Nature and Forest
team Biometrie &
>>>>>>>>>> Kwaliteitszorg / team Biometrics & Quality
Assurance
>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>
>>>>>>>>>> To call in the statistician after the
experiment is done may be
>>>>>>>>>> no more than asking him to perform a
post-mortem examination: he
>>>>>>>>>> may be able to say what the experiment died
of. ~ Sir Ronald
>>>>>>>>>> Aylmer Fisher The plural of anecdote is not
data. ~ Roger
>>>>>>>>>> Brinner The combination of some data and an
aching desire for an
>>>>>>>>>> answer does not ensure that a reasonable
answer can be extracted
>>>>>>>>>> from a given body of data. ~ John Tukey
>>>>>>>>>>
>>>>>>>>>> 2016-06-23 12:42 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com> <mailto:killver at gmail.com
<mailto:killver at gmail.com>>>>:
>>>>>>>>>>
>>>>>>>>>> Thanks! Actually, accounting for
overdispersion is super
>>>>>>>>>> important as it seems, then the zeros can
be captured well.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 23.06.2016 11:50, Thierry Onkelinx wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear Philipp,
>>>>>>>>>>>
>>>>>>>>>>> 1. Fit a Poisson model to the data. 2.
Simulate a new response
>>>>>>>>>>> vector for the dataset according to the
model. 3. Count the
>>>>>>>>>>> number of zero's in the simulated response
vector. 4. Repeat
>>>>>>>>>>> step 2 and 3 a decent number of time and
plot a histogram of
>>>>>>>>>>> the number of zero's in the simulation. If
the number of zero's
>>>>>>>>>>> in the original dataset is larger than
those in the
>>>>>>>>>>> simulations, then the model can't capture
all zero's. In such
>>>>>>>>>>> case, first try to update the model and
repeat the procedure.
>>>>>>>>>>> If that fails, look for zero-inflated models.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> ir. Thierry Onkelinx Instituut voor
natuur- en bosonderzoek /
>>>>>>>>>>> Research Institute for Nature and Forest
team Biometrie &
>>>>>>>>>>> Kwaliteitszorg / team Biometrics & Quality
Assurance
>>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>>
>>>>>>>>>>> To call in the statistician after the
experiment is done may
>>>>>>>>>>> be no more than asking him to perform a
post-mortem
>>>>>>>>>>> examination: he may be able to say what
the experiment died of.
>>>>>>>>>>> ~ Sir Ronald Aylmer Fisher The plural of
anecdote is not data.
>>>>>>>>>>> ~ Roger Brinner The combination of some
data and an aching
>>>>>>>>>>> desire for an answer does not ensure that
a reasonable answer
>>>>>>>>>>> can be extracted from a given body of
data. ~ John Tukey
>>>>>>>>>>>
>>>>>>>>>>> 2016-06-23 11:27 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com> <mailto:killver at gmail.com
<mailto:killver at gmail.com>>>>:
>>>>>>>>>>>
>>>>>>>>>>> Thanks Thierry - That totally makes sense.
Is there some way of
>>>>>>>>>>> formally checking that, except thinking
about the setting and
>>>>>>>>>>> underlying processes?
>>>>>>>>>>>
>>>>>>>>>>> On 23.06.2016 11:04, Thierry Onkelinx wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear Philipp,
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have just lots of zero's, or more
zero's than the
>>>>>>>>>>>>
>>>>>>>>>>> Poisson
>>>>>>>>>>>
>>>>>>>>>>>> distribution can explain? Those are two
different things.
>>>>>>>>>>>>
>>>>>>>>>>> The example
>>>>>>>>>>>
>>>>>>>>>>>> below generates data from a Poisson
distribution and has
>>>>>>>>>>>>
>>>>>>>>>>> 99% zero's
>>>>>>>>>>>
>>>>>>>>>>>> but no zero-inflation. The second example
has only 1%
>>>>>>>>>>>>
>>>>>>>>>>> zero's but is
>>>>>>>>>>>
>>>>>>>>>>>> clearly zero-inflated.
>>>>>>>>>>>>
>>>>>>>>>>>> set.seed(1) n <- 1e8 sim <- rpois(n,
lambda = 0.01) mean(sim
>>>>>>>>>>>> == 0) hist(sim)
>>>>>>>>>>>>
>>>>>>>>>>>> sim.infl <- rbinom(n, size = 1, prob =
0.99) * rpois(n,
>>>>>>>>>>>>
>>>>>>>>>>> lambda = 1000)
>>>>>>>>>>>
>>>>>>>>>>>> mean(sim.infl == 0) hist(sim.infl)
>>>>>>>>>>>>
>>>>>>>>>>>> So before looking for zero-inflated
models, try to model
>>>>>>>>>>>>
>>>>>>>>>>> the zero's.
>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ir. Thierry Onkelinx Instituut voor
natuur- en bosonderzoek /
>>>>>>>>>>>> Research Institute
>>>>>>>>>>>>
>>>>>>>>>>> for Nature
>>>>>>>>>>>
>>>>>>>>>>>> and Forest team Biometrie &
Kwaliteitszorg / team Biometrics
>>>>>>>>>>>> & Quality
>>>>>>>>>>>>
>>>>>>>>>>> Assurance
>>>>>>>>>>>
>>>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>>>
>>>>>>>>>>>> To call in the statistician after the
experiment is done
>>>>>>>>>>>>
>>>>>>>>>>> may be no
>>>>>>>>>>>
>>>>>>>>>>>> more than asking him to perform a
post-mortem examination:
>>>>>>>>>>>>
>>>>>>>>>>> he may be
>>>>>>>>>>>
>>>>>>>>>>>> able to say what the experiment died of.
~ Sir Ronald
>>>>>>>>>>>>
>>>>>>>>>>> Aylmer Fisher
>>>>>>>>>>>
>>>>>>>>>>>> The plural of anecdote is not data. ~
Roger Brinner The
>>>>>>>>>>>> combination of some data and an aching
desire for an
>>>>>>>>>>>>
>>>>>>>>>>> answer does
>>>>>>>>>>>
>>>>>>>>>>>> not ensure that a reasonable answer can
be extracted from a
>>>>>>>>>>>>
>>>>>>>>>>> given body
>>>>>>>>>>>
>>>>>>>>>>>> of data. ~ John Tukey
>>>>>>>>>>>>
>>>>>>>>>>>> 2016-06-23 10:07 GMT+02:00 Philipp Singer
>>>>>>>>>>>>
>>>>>>>>>>> <killver at gmail.com
<mailto:killver at gmail.com> <mailto:killver at gmail.com
<mailto:killver at gmail.com>>
>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com> <mailto:killver at gmail.com
<mailto:killver at gmail.com>>>
>>>>>>>>>>>
>>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com> <mailto:killver at gmail.com
<mailto:killver at gmail.com>>
>>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com> <mailto:killver at gmail.com
<mailto:killver at gmail.com>>>>>:
>>>>>>>>>>>>
>>>>>>>>>>>> Dear group - I am currently fitting a
Poisson glmer
>>>>>>>>>>>>
>>>>>>>>>>> where I have
>>>>>>>>>>>
>>>>>>>>>>>> an excess of outcomes that are zero
(>95%). I am now
>>>>>>>>>>>>
>>>>>>>>>>> debating on
>>>>>>>>>>>
>>>>>>>>>>>> how to proceed and came up with three
options:
>>>>>>>>>>>>
>>>>>>>>>>>> 1.) Just fit a regular glmer to the
complete data. I am
>>>>>>>>>>>>
>>>>>>>>>>> not fully
>>>>>>>>>>>
>>>>>>>>>>>> sure how interpret the coefficients then,
are they more
>>>>>>>>>>>>
>>>>>>>>>>> optimizing
>>>>>>>>>>>
>>>>>>>>>>>> towards distinguishing zero and non-zero,
or also
>>>>>>>>>>>>
>>>>>>>>>>> capturing the
>>>>>>>>>>>
>>>>>>>>>>>> differences in those outcomes that are
non-zero?
>>>>>>>>>>>>
>>>>>>>>>>>> 2.) Leave all zeros out of the data and
fit a glmer to
>>>>>>>>>>>>
>>>>>>>>>>> only those
>>>>>>>>>>>
>>>>>>>>>>>> outcomes that are non-zero. Then, I would
only learn about
>>>>>>>>>>>> differences in the non-zero outcomes though.
>>>>>>>>>>>>
>>>>>>>>>>>> 3.) Use a zero-inflated Poisson model. My
data is quite
>>>>>>>>>>>> large-scale, so I am currently playing
around with the EM
>>>>>>>>>>>> implementation of Bolker et al. that
alternates between
>>>>>>>>>>>>
>>>>>>>>>>> fitting a
>>>>>>>>>>>
>>>>>>>>>>>> glmer with data that are weighted
according to their zero
>>>>>>>>>>>> probability, and fitting a logistic
regression for the
>>>>>>>>>>>>
>>>>>>>>>>> probability
>>>>>>>>>>>
>>>>>>>>>>>> that a data point is zero. The method is
elaborated for
>>>>>>>>>>>>
>>>>>>>>>>> the OWL
>>>>>>>>>>>
>>>>>>>>>>>> data in:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> <
>>>>>>>>>>>
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>> I am not fully sure how to interpret the
results for the
>>>>>>>
>>>>>>>> zero-inflated version though. Would I need to
interpret the
>>>>>>>>>>>> coefficients for the result of the glmer
similar to as
>>>>>>>>>>>>
>>>>>>>>>>> I would do
>>>>>>>>>>>
>>>>>>>>>>>> for my idea of 2)? And then on top of
that interpret the
>>>>>>>>>>>> coefficients for the logistic regression
regarding whether
>>>>>>>>>>>> something is in the perfect or imperfect
state? I am
>>>>>>>>>>>>
>>>>>>>>>>> also not
>>>>>>>>>>>
>>>>>>>>>>>> quite sure what the common approach for
the zformula is
>>>>>>>>>>>>
>>>>>>>>>>> here. The
>>>>>>>>>>>
>>>>>>>>>>>> OWL elaborations only use zformula=z~1,
so no random
>>>>>>>>>>>>
>>>>>>>>>>> effect; I
>>>>>>>>>>>
>>>>>>>>>>>> would use the same formula as for the glmer.
>>>>>>>>>>>>
>>>>>>>>>>>> I am appreciating some help and pointers.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks! Philipp
>>>>>>>>>>>>
>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>>
>>>>>>>>>>>
>>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>>> mailing list
>>>>>>>>>>>
>>>>>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>>
<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>>>
>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>> mailing list
>>>>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>
<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>> mailing list
>>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>
<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>> mailing list
>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>> mailing list
>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>> mailing list
>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>
>>>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing list
>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>
[[alternative HTML version deleted]]
_______________________________________________
R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
If there is overdispersion, then try a negative binomial model or a zero-inflated negative binomial model. If not try a zero-inflated Poisson. Adding relevant covariates can reduce overdispersion. ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 17:46 GMT+02:00 Philipp Singer <killver at gmail.com>:
Well, as posted beforehand the std dev is 9.5 ... so does not seem too good then :/ Any other idea? On 27.06.2016 17:31, Thierry Onkelinx wrote: Dear Philipp, You've been bitten by observation level random effects. I've put together a document about it on <http://rpubs.com/INBOstats/OLRE> http://rpubs.com/INBOstats/OLRE. Bottomline you're OKish when the standard devation of the OLRE smaller than 1. You're in trouble when it's above 3. In between you need to check the model carefully. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 16:17 GMT+02:00 Philipp Singer <killver at gmail.com>:
Here is the fitted vs. residual plot for the observation-level poisson model where the observation level has been removed as taken from: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html So basically the prediction is always close to zero. Note that this is just on a very small sample (1000 data points). If I fit a nbinom2 to this smalle sample, I get predictions that are always around ~20 (but never zero). Both plots are attached. What I am wondering is whether I can do inference on a fixed parameter in my model which is my main task of this study. The effect is similar in the different models and in general I am only itnerested in whether it is positive/negative and "significant" which it is. However, as can be seen, the prediction looks not too good here. 2016-06-27 15:18 GMT+02:00 Philipp Singer < <killver at gmail.com> killver at gmail.com>:
The variance is: Conditional model: Groups Name Variance Std.Dev. obs (Intercept) 8.991e+01 9.4823139 2016-06-27 15:06 GMT+02:00 Thierry Onkelinx < <thierry.onkelinx at inbo.be> thierry.onkelinx at inbo.be>:
Dear Philipp, How strong is the variance of the observation level random effect? I would trust a model with large OLRE variance. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 14:59 GMT+02:00 Philipp Singer < <killver at gmail.com> killver at gmail.com>:
I have now played around more with the data an the models both using lme4 and glmmTMB. I can report the following: Modeling the data with a zero-inflated Poisson improves the model significantly. However, when calling predict and simulating rpoissons, I end up with nearly no values that are zero (in the original data there are 96% zero). When I model the data with overdisperion by including an observation-level random effect, I can also improve the model (not surprisingly due to the random effect). When I predict outcomes by ignoring the observation-level random effect (in lme4), I receive bad prediction if I compare it to the original data. While many zeros can be captured (of course), the positive outcomes can not be captured well. Combining zero inflation and overdispersion further improves the model, but I can only do that with glmmTMB and then have troubles doing predictions ignoring the observation-level random effect. Another side question: In lme4, when I do: m = glm(x~1,family="poisson") rpois(n=len(data),lambda=predict(m, type='response',re.form=NA) vs. simulate(1,m,re.form=NA) I receive different outcomes? Do I understand these function wrongly? Would appreciate some more help/pointers! Thanks, Philipp 2016-06-24 15:52 GMT+02:00 Philipp Singer < <killver at gmail.com> killver at gmail.com>:
Thanks - I started an issue there to answer some of my questions. Regarding the installation: I was trying to somehow do it in
anaconda with
a specific R kernel and had some issues. I am trying to resort that
with
the anaconda guys though, if I have a tutorial on how to properly
setup
glmmTMB in anaconda, I will let you know. The install worked fine in
my
standard R environment. On 24.06.2016 15 <24.06.2016%2015>:40, Ben Bolker wrote:
Probably for now the glmmTMB issues page is best. When you go there: - details on installation problems/hiccups would be useful - a reproducible example for the problem listed below would be
useful
- dispformula is for allowing dispersion/residual variance to
vary
with covariates (i.e., modeling heteroscedasticity)
cheers
Ben Bolker
On 16-06-24 09:13 AM, Philipp Singer wrote:
Update, I tried it like that, but receive an error message. Warning message: In nlminb(start = par, objective = fn, gradient = gr): NA/NaN
function
evaluation Error in solve.default(hessian.fixed): Lapack routine dgesv:
system is
exactly singular: U[3,3] = 0 Traceback: 1. glmmTMB(y ~ 1 + x + (1 | b), . data = data, family = "poisson", dispformula = ~1 + x) 2. sdreport(obj) 3. solve(hessian.fixed) 4. solve(hessian.fixed) 5. solve.default(hessian.fixed) Any ideas on that? BTW: Is it fine to post glmmTMB questions here, or should I rather
use
the github issue page, or is there maybe a dedicated mailing list? Thanks, Philipp On 24.06.2016 14:35, Philipp Singer wrote:
It indeed seems to run quite fast; had some trouble installing,
but
works now on my 3.3 R setup. One question I have is regarding the specification of dispersion
as I
need to specify the dispformula. What is the difference here
between
just specifying fixed effects vs. also the random effects? On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in
vignette("glmmTMB")
and it was 2.3 times faster than glmer for one simple GLMM. On 23Jun 2016, at 10:44, Philipp Singer < <killver at gmail.com>
killver at gmail.com> wrote:
Did try glmmADMB but unfortunately it is way too slow for my
data.
Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I
will
check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much
faster
than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI using either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since the simulation procedure that Thierry described seems to
recommended
more often. <https://github.com/glmmTMB/glmmTMB>
glmmTMB is still in the development phase, but we?ve done a
lot of
testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral
Researcher,
Population Ecology Research Group Department of Evolutionary
Biology
& Environmental Studies, University of Z?rich <http://www.popecol.org/team/mollie-brooks/>
On 23Jun 2016, at 8:22, Philipp Singer < <killver at gmail.com>
killver at gmail.com> wrote:
Thanks, great information, that is really helpful. I agree that those are different things, however when using a random effect for overdispersion, I can simulate the same
number of
zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model
zero-inflation.
Those are two different things. I've put some information together in <http://rpubs.com/INBOstats/zeroinflation>
ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek
/
Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be
no more than asking him to perform a post-mortem
examination: he
may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire
for an
answer does not ensure that a reasonable answer can be
extracted
from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <
<killver at gmail.com>killver at gmail.com
<mailto: <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>>>:
Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new
response
vector for the dataset according to the model. 3. Count the number of zero's in the simulated response vector. 4.
Repeat
step 2 and 3 a decent number of time and plot a histogram
of
the number of zero's in the simulation. If the number of
zero's
in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In
such
case, first try to update the model and repeat the
procedure.
If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en
bosonderzoek /
Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may
be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment
died of.
~ Sir Ronald Aylmer Fisher The plural of anecdote is not
data.
~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable
answer
can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <
<killver at gmail.com>killver at gmail.com
<mailto: <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>>>:
Thanks Thierry - That totally makes sense. Is there some
way of
formally checking that, except thinking about the setting
and
underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01)
mean(sim
== 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en
bosonderzoek /
Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team
Biometrics
& Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted from
a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
< <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>
<mailto: <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>>
<mailto: <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>
<mailto: <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>>>>:
Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn about differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
<https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
<
<https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding whether something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org>
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org>
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org>>
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org>
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org>>> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models < <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]] _______________________________________________ <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org>> mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models < <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]] _______________________________________________ <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models < <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]] _______________________________________________ <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________ <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________ <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models [[alternative HTML version deleted]] _______________________________________________ <R-sig-mixed-models at r-project.org>R-sig-mixed-models at r-project.org mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models [[alternative HTML version deleted]] _______________________________________________ <R-sig-mixed-models at r-project.org>R-sig-mixed-models at r-project.org mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
NBinom was not really successfull unitl now, but will try to tune. Thanks for your help! One point I forgot to mention was that apart from my excess of zeros, the lowest data outcome is 10, so there is a gap between zeri and 10. Could that be somehow a problem?
On 27.06.2016 21:59, Thierry Onkelinx wrote:
If there is overdispersion, then try a negative binomial model or a
zero-inflated negative binomial model. If not try a zero-inflated
Poisson. Adding relevant covariates can reduce overdispersion.
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
2016-06-27 17:46 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com>>:
Well, as posted beforehand the std dev is 9.5 ... so does not seem
too good then :/
Any other idea?
On 27.06.2016 17:31, Thierry Onkelinx wrote:
Dear Philipp,
You've been bitten by observation level random effects. I've put
together a document about it on http://rpubs.com/INBOstats/OLRE.
Bottomline you're OKish when the standard devation of the OLRE
smaller than 1. You're in trouble when it's above 3. In between
you need to check the model carefully.
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for
Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be
no more than asking him to perform a post-mortem examination: he
may be able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer
does not ensure that a reasonable answer can be extracted from a
given body of data. ~ John Tukey
2016-06-27 16:17 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com>>:
Here is the fitted vs. residual plot for the
observation-level poisson model where the observation level
has been removed as taken from:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html
So basically the prediction is always close to zero.
Note that this is just on a very small sample (1000 data points).
If I fit a nbinom2 to this smalle sample, I get predictions
that are always around ~20 (but never zero). Both plots are
attached.
What I am wondering is whether I can do inference on a fixed
parameter in my model which is my main task of this study.
The effect is similar in the different models and in general
I am only itnerested in whether it is positive/negative and
"significant" which it is. However, as can be seen, the
prediction looks not too good here.
2016-06-27 15:18 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com>>:
The variance is:
Conditional model:
Groups Name Variance Std.Dev.
obs (Intercept) 8.991e+01 9.4823139
2016-06-27 15:06 GMT+02:00 Thierry Onkelinx
<thierry.onkelinx at inbo.be <mailto:thierry.onkelinx at inbo.be>>:
Dear Philipp,
How strong is the variance of the observation level
random effect? I would trust a model with large OLRE
variance.
Best regards,
Thierry
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research
Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics &
Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is
done may be no more than asking him to perform a
post-mortem examination: he may be able to say what
the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for
an answer does not ensure that a reasonable answer
can be extracted from a given body of data. ~ John Tukey
2016-06-27 14:59 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>>:
I have now played around more with the data an
the models both using lme4
and glmmTMB.
I can report the following:
Modeling the data with a zero-inflated Poisson
improves the model
significantly. However, when calling predict and
simulating rpoissons, I
end up with nearly no values that are zero (in
the original data there are
96% zero).
When I model the data with overdisperion by
including an observation-level
random effect, I can also improve the model (not
surprisingly due to the
random effect). When I predict outcomes by
ignoring the observation-level
random effect (in lme4), I receive bad prediction
if I compare it to the
original data. While many zeros can be captured
(of course), the positive
outcomes can not be captured well.
Combining zero inflation and overdispersion
further improves the model, but
I can only do that with glmmTMB and then have
troubles doing predictions
ignoring the observation-level random effect.
Another side question:
In lme4, when I do:
m = glm(x~1,family="poisson")
rpois(n=len(data),lambda=predict(m,
type='response',re.form=NA)
vs.
simulate(1,m,re.form=NA)
I receive different outcomes? Do I understand
these function wrongly?
Would appreciate some more help/pointers!
Thanks,
Philipp
2016-06-24 15:52 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>>:
> Thanks - I started an issue there to answer
some of my questions.
>
> Regarding the installation: I was trying to
somehow do it in anaconda with
> a specific R kernel and had some issues. I am
trying to resort that with
> the anaconda guys though, if I have a tutorial
on how to properly setup
> glmmTMB in anaconda, I will let you know. The
install worked fine in my
> standard R environment.
>
>
> On 24.06.2016 15 <tel:24.06.2016%2015>:40, Ben
Bolker wrote:
>
>> Probably for now the glmmTMB issues page is best.
>>
>> When you go there:
>>
>> - details on installation problems/hiccups
would be useful
>> - a reproducible example for the problem
listed below would be useful
>> - dispformula is for allowing
dispersion/residual variance to vary
>> with covariates (i.e., modeling
heteroscedasticity)
>>
>> cheers
>> Ben Bolker
>>
>>
>> On 16-06-24 09:13 AM, Philipp Singer wrote:
>>
>>> Update, I tried it like that, but receive an
error message.
>>>
>>> Warning message:
>>> In nlminb(start = par, objective = fn,
gradient = gr): NA/NaN function
>>> evaluation
>>>
>>> Error in solve.default(hessian.fixed): Lapack
routine dgesv: system is
>>> exactly singular: U[3,3] = 0
>>> Traceback:
>>>
>>> 1. glmmTMB(y ~ 1 + x + (1 | b),
>>> . data = data, family = "poisson",
dispformula = ~1 + x)
>>> 2. sdreport(obj)
>>> 3. solve(hessian.fixed)
>>> 4. solve(hessian.fixed)
>>> 5. solve.default(hessian.fixed)
>>>
>>> Any ideas on that?
>>>
>>> BTW: Is it fine to post glmmTMB questions
here, or should I rather use
>>> the github issue page, or is there maybe a
dedicated mailing list?
>>>
>>> Thanks,
>>> Philipp
>>>
>>> On 24.06.2016 14:35, Philipp Singer wrote:
>>>
>>>> It indeed seems to run quite fast; had some
trouble installing, but
>>>> works now on my 3.3 R setup.
>>>>
>>>> One question I have is regarding the
specification of dispersion as I
>>>> need to specify the dispformula. What is the
difference here between
>>>> just specifying fixed effects vs. also the
random effects?
>>>>
>>>> On 23.06.2016 23:07, Mollie Brooks wrote:
>>>>
>>>>> glmmTMB does crossed RE. Ben did some
timings in vignette("glmmTMB")
>>>>> and it was 2.3 times faster than glmer for
one simple GLMM.
>>>>>
>>>>>
>>>>> On 23Jun 2016, at 10:44, Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>> wrote:
>>>>>>
>>>>>> Did try glmmADMB but unfortunately it is
way too slow for my data.
>>>>>>
>>>>>> Did not know about glmmTMB, will try it
out. Does it work with
>>>>>> crossed random effects and how does it
scale with more data? I will
>>>>>> check the docu and try it though. Thanks
for the info.
>>>>>>
>>>>>> On 23.06.2016 19:14, Ben Bolker wrote:
>>>>>>
>>>>>>> I would also comment that glmmTMB is
likely to be much faster
>>>>>>> than the
>>>>>>> lme4-based EM approach ...
>>>>>>>
>>>>>>> cheers
>>>>>>> Ben B.
>>>>>>>
>>>>>>> On 16-06-23 12:47 PM, Mollie Brooks wrote:
>>>>>>>
>>>>>>>> Hi Philipp,
>>>>>>>>
>>>>>>>> You could also try fitting the model
with and without ZI using
>>>>>>>> either
>>>>>>>> glmmADMB or glmmTMB. Then compare the
AICs. I believe model
>>>>>>>> selection
>>>>>>>> is useful for this, but I could be
missing something since the
>>>>>>>> simulation procedure that Thierry
described seems to recommended
>>>>>>>> more
>>>>>>>> often.
>>>>>>>>
>>>>>>>> https://github.com/glmmTMB/glmmTMB
>>>>>>>> http://glmmadmb.r-forge.r-project.org
>>>>>>>>
>>>>>>>> glmmTMB is still in the development
phase, but we?ve done a lot of
>>>>>>>> testing.
>>>>>>>>
>>>>>>>> cheers, Mollie
>>>>>>>>
>>>>>>>> ------------------------ Mollie Brooks,
PhD Postdoctoral Researcher,
>>>>>>>> Population Ecology Research Group
Department of Evolutionary Biology
>>>>>>>> & Environmental Studies, University of
Z?rich
>>>>>>>> http://www.popecol.org/team/mollie-brooks/ >>>>>>>> >>>>>>>> >>>>>>>> On 23Jun 2016, at 8:22, Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>> wrote:
>>>>>>>>>
>>>>>>>>> Thanks, great information, that is
really helpful.
>>>>>>>>>
>>>>>>>>> I agree that those are different
things, however when using a
>>>>>>>>> random effect for overdispersion, I can
simulate the same number of
>>>>>>>>> zero outcomes (~95%).
>>>>>>>>>
>>>>>>>>> On 23.06.2016 15:50, Thierry Onkelinx
wrote:
>>>>>>>>>
>>>>>>>>>> Be careful when using overdispersion
to model zero-inflation.
>>>>>>>>>> Those are two different things.
>>>>>>>>>>
>>>>>>>>>> I've put some information together in
>>>>>>>>>> http://rpubs.com/INBOstats/zeroinflation
>>>>>>>>>>
>>>>>>>>>> ir. Thierry Onkelinx Instituut voor
natuur- en bosonderzoek /
>>>>>>>>>> Research Institute for Nature and
Forest team Biometrie &
>>>>>>>>>> Kwaliteitszorg / team Biometrics &
Quality Assurance
>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>
>>>>>>>>>> To call in the statistician after the
experiment is done may be
>>>>>>>>>> no more than asking him to perform a
post-mortem examination: he
>>>>>>>>>> may be able to say what the experiment
died of. ~ Sir Ronald
>>>>>>>>>> Aylmer Fisher The plural of anecdote
is not data. ~ Roger
>>>>>>>>>> Brinner The combination of some data
and an aching desire for an
>>>>>>>>>> answer does not ensure that a
reasonable answer can be extracted
>>>>>>>>>> from a given body of data. ~ John Tukey
>>>>>>>>>>
>>>>>>>>>> 2016-06-23 12:42 GMT+02:00 Philipp
Singer <killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com
<mailto:killver at gmail.com>>>>:
>>>>>>>>>>
>>>>>>>>>> Thanks! Actually, accounting for
overdispersion is super
>>>>>>>>>> important as it seems, then the zeros
can be captured well.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 23.06.2016 11:50, Thierry Onkelinx
wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear Philipp,
>>>>>>>>>>>
>>>>>>>>>>> 1. Fit a Poisson model to the data.
2. Simulate a new response
>>>>>>>>>>> vector for the dataset according to
the model. 3. Count the
>>>>>>>>>>> number of zero's in the simulated
response vector. 4. Repeat
>>>>>>>>>>> step 2 and 3 a decent number of time
and plot a histogram of
>>>>>>>>>>> the number of zero's in the
simulation. If the number of zero's
>>>>>>>>>>> in the original dataset is larger
than those in the
>>>>>>>>>>> simulations, then the model can't
capture all zero's. In such
>>>>>>>>>>> case, first try to update the model
and repeat the procedure.
>>>>>>>>>>> If that fails, look for zero-inflated
models.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> ir. Thierry Onkelinx Instituut voor
natuur- en bosonderzoek /
>>>>>>>>>>> Research Institute for Nature and
Forest team Biometrie &
>>>>>>>>>>> Kwaliteitszorg / team Biometrics &
Quality Assurance
>>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>>
>>>>>>>>>>> To call in the statistician after the
experiment is done may
>>>>>>>>>>> be no more than asking him to perform
a post-mortem
>>>>>>>>>>> examination: he may be able to say
what the experiment died of.
>>>>>>>>>>> ~ Sir Ronald Aylmer Fisher The plural
of anecdote is not data.
>>>>>>>>>>> ~ Roger Brinner The combination of
some data and an aching
>>>>>>>>>>> desire for an answer does not ensure
that a reasonable answer
>>>>>>>>>>> can be extracted from a given body of
data. ~ John Tukey
>>>>>>>>>>>
>>>>>>>>>>> 2016-06-23 11:27 GMT+02:00 Philipp
Singer <killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com
<mailto:killver at gmail.com>>>>:
>>>>>>>>>>>
>>>>>>>>>>> Thanks Thierry - That totally makes
sense. Is there some way of
>>>>>>>>>>> formally checking that, except
thinking about the setting and
>>>>>>>>>>> underlying processes?
>>>>>>>>>>>
>>>>>>>>>>> On 23.06.2016 11:04, Thierry Onkelinx
wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear Philipp,
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have just lots of zero's, or
more zero's than the
>>>>>>>>>>>>
>>>>>>>>>>> Poisson
>>>>>>>>>>>
>>>>>>>>>>>> distribution can explain? Those are
two different things.
>>>>>>>>>>>>
>>>>>>>>>>> The example
>>>>>>>>>>>
>>>>>>>>>>>> below generates data from a Poisson
distribution and has
>>>>>>>>>>>>
>>>>>>>>>>> 99% zero's
>>>>>>>>>>>
>>>>>>>>>>>> but no zero-inflation. The second
example has only 1%
>>>>>>>>>>>>
>>>>>>>>>>> zero's but is
>>>>>>>>>>>
>>>>>>>>>>>> clearly zero-inflated.
>>>>>>>>>>>>
>>>>>>>>>>>> set.seed(1) n <- 1e8 sim <- rpois(n,
lambda = 0.01) mean(sim
>>>>>>>>>>>> == 0) hist(sim)
>>>>>>>>>>>>
>>>>>>>>>>>> sim.infl <- rbinom(n, size = 1, prob
= 0.99) * rpois(n,
>>>>>>>>>>>>
>>>>>>>>>>> lambda = 1000)
>>>>>>>>>>>
>>>>>>>>>>>> mean(sim.infl == 0) hist(sim.infl)
>>>>>>>>>>>>
>>>>>>>>>>>> So before looking for zero-inflated
models, try to model
>>>>>>>>>>>>
>>>>>>>>>>> the zero's.
>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ir. Thierry Onkelinx Instituut voor
natuur- en bosonderzoek /
>>>>>>>>>>>> Research Institute
>>>>>>>>>>>>
>>>>>>>>>>> for Nature
>>>>>>>>>>>
>>>>>>>>>>>> and Forest team Biometrie &
Kwaliteitszorg / team Biometrics
>>>>>>>>>>>> & Quality
>>>>>>>>>>>>
>>>>>>>>>>> Assurance
>>>>>>>>>>>
>>>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>>>
>>>>>>>>>>>> To call in the statistician after
the experiment is done
>>>>>>>>>>>>
>>>>>>>>>>> may be no
>>>>>>>>>>>
>>>>>>>>>>>> more than asking him to perform a
post-mortem examination:
>>>>>>>>>>>>
>>>>>>>>>>> he may be
>>>>>>>>>>>
>>>>>>>>>>>> able to say what the experiment died
of. ~ Sir Ronald
>>>>>>>>>>>>
>>>>>>>>>>> Aylmer Fisher
>>>>>>>>>>>
>>>>>>>>>>>> The plural of anecdote is not data.
~ Roger Brinner The
>>>>>>>>>>>> combination of some data and an
aching desire for an
>>>>>>>>>>>>
>>>>>>>>>>> answer does
>>>>>>>>>>>
>>>>>>>>>>>> not ensure that a reasonable answer
can be extracted from a
>>>>>>>>>>>>
>>>>>>>>>>> given body
>>>>>>>>>>>
>>>>>>>>>>>> of data. ~ John Tukey
>>>>>>>>>>>>
>>>>>>>>>>>> 2016-06-23 10:07 GMT+02:00 Philipp
Singer
>>>>>>>>>>>>
>>>>>>>>>>> <killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com <mailto:killver at gmail.com>>
>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com
<mailto:killver at gmail.com>>>
>>>>>>>>>>>
>>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com <mailto:killver at gmail.com>>
>>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com
<mailto:killver at gmail.com>>>>>:
>>>>>>>>>>>>
>>>>>>>>>>>> Dear group - I am currently fitting
a Poisson glmer
>>>>>>>>>>>>
>>>>>>>>>>> where I have
>>>>>>>>>>>
>>>>>>>>>>>> an excess of outcomes that are zero
(>95%). I am now
>>>>>>>>>>>>
>>>>>>>>>>> debating on
>>>>>>>>>>>
>>>>>>>>>>>> how to proceed and came up with
three options:
>>>>>>>>>>>>
>>>>>>>>>>>> 1.) Just fit a regular glmer to the
complete data. I am
>>>>>>>>>>>>
>>>>>>>>>>> not fully
>>>>>>>>>>>
>>>>>>>>>>>> sure how interpret the coefficients
then, are they more
>>>>>>>>>>>>
>>>>>>>>>>> optimizing
>>>>>>>>>>>
>>>>>>>>>>>> towards distinguishing zero and
non-zero, or also
>>>>>>>>>>>>
>>>>>>>>>>> capturing the
>>>>>>>>>>>
>>>>>>>>>>>> differences in those outcomes that
are non-zero?
>>>>>>>>>>>>
>>>>>>>>>>>> 2.) Leave all zeros out of the data
and fit a glmer to
>>>>>>>>>>>>
>>>>>>>>>>> only those
>>>>>>>>>>>
>>>>>>>>>>>> outcomes that are non-zero. Then, I
would only learn about
>>>>>>>>>>>> differences in the non-zero outcomes
though.
>>>>>>>>>>>>
>>>>>>>>>>>> 3.) Use a zero-inflated Poisson
model. My data is quite
>>>>>>>>>>>> large-scale, so I am currently
playing around with the EM
>>>>>>>>>>>> implementation of Bolker et al. that
alternates between
>>>>>>>>>>>>
>>>>>>>>>>> fitting a
>>>>>>>>>>>
>>>>>>>>>>>> glmer with data that are weighted
according to their zero
>>>>>>>>>>>> probability, and fitting a logistic
regression for the
>>>>>>>>>>>>
>>>>>>>>>>> probability
>>>>>>>>>>>
>>>>>>>>>>>> that a data point is zero. The
method is elaborated for
>>>>>>>>>>>>
>>>>>>>>>>> the OWL
>>>>>>>>>>>
>>>>>>>>>>>> data in:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> <
>>>>>>>>>>>
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>> I am not fully sure how to interpret
the results for the
>>>>>>>
>>>>>>>> zero-inflated version though. Would I
need to interpret the
>>>>>>>>>>>> coefficients for the result of the
glmer similar to as
>>>>>>>>>>>>
>>>>>>>>>>> I would do
>>>>>>>>>>>
>>>>>>>>>>>> for my idea of 2)? And then on top
of that interpret the
>>>>>>>>>>>> coefficients for the logistic
regression regarding whether
>>>>>>>>>>>> something is in the perfect or
imperfect state? I am
>>>>>>>>>>>>
>>>>>>>>>>> also not
>>>>>>>>>>>
>>>>>>>>>>>> quite sure what the common approach
for the zformula is
>>>>>>>>>>>>
>>>>>>>>>>> here. The
>>>>>>>>>>>
>>>>>>>>>>>> OWL elaborations only use
zformula=z~1, so no random
>>>>>>>>>>>>
>>>>>>>>>>> effect; I
>>>>>>>>>>>
>>>>>>>>>>>> would use the same formula as for
the glmer.
>>>>>>>>>>>>
>>>>>>>>>>>> I am appreciating some help and
pointers.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks! Philipp
>>>>>>>>>>>>
>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>>
>>>>>>>>>>>
>>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>>>
mailing list
>>>>>>>>>>>
>>>>>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>>
<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>>>
>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>>
mailing list
>>>>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>
<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>>
_______________________________________________
>>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
mailing list
>>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>
<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>>
_______________________________________________
>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
mailing list
>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>
>>>>>>>>
_______________________________________________
>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
mailing list
>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>
>>>>>>
_______________________________________________
>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
mailing list
>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>
>>>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing
list
>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>
[[alternative HTML version deleted]]
_______________________________________________
R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing
list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
A gap between zero and the lowest non-zero count is normal for zero-inflated data when the Poisson part has a high mean. In that case very few (if any) zero's stem from the Poisson part. Another option is to try a hurdle model. Or approximate a hurdle model by fitting two separate models: a logistic regression (zero or not) and a Poisson regression (count > 0). In case Prob(Poisson(mu) == 0) is small then the approximation should be OK. ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 22:03 GMT+02:00 Philipp Singer <killver at gmail.com>:
NBinom was not really successfull unitl now, but will try to tune. Thanks for your help! One point I forgot to mention was that apart from my excess of zeros, the lowest data outcome is 10, so there is a gap between zeri and 10. Could that be somehow a problem? On 27.06.2016 21:59, Thierry Onkelinx wrote: If there is overdispersion, then try a negative binomial model or a zero-inflated negative binomial model. If not try a zero-inflated Poisson. Adding relevant covariates can reduce overdispersion. ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 17:46 GMT+02:00 Philipp Singer <killver at gmail.com>:
Well, as posted beforehand the std dev is 9.5 ... so does not seem too good then :/ Any other idea? On 27.06.2016 17:31, Thierry Onkelinx wrote: Dear Philipp, You've been bitten by observation level random effects. I've put together a document about it on http://rpubs.com/INBOstats/OLRE. Bottomline you're OKish when the standard devation of the OLRE smaller than 1. You're in trouble when it's above 3. In between you need to check the model carefully. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 16:17 GMT+02:00 Philipp Singer < <killver at gmail.com> killver at gmail.com>:
Here is the fitted vs. residual plot for the observation-level poisson model where the observation level has been removed as taken from: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html So basically the prediction is always close to zero. Note that this is just on a very small sample (1000 data points). If I fit a nbinom2 to this smalle sample, I get predictions that are always around ~20 (but never zero). Both plots are attached. What I am wondering is whether I can do inference on a fixed parameter in my model which is my main task of this study. The effect is similar in the different models and in general I am only itnerested in whether it is positive/negative and "significant" which it is. However, as can be seen, the prediction looks not too good here. 2016-06-27 15:18 GMT+02:00 Philipp Singer < <killver at gmail.com> killver at gmail.com>:
The variance is: Conditional model: Groups Name Variance Std.Dev. obs (Intercept) 8.991e+01 9.4823139 2016-06-27 15:06 GMT+02:00 Thierry Onkelinx < <thierry.onkelinx at inbo.be>thierry.onkelinx at inbo.be>:
Dear Philipp, How strong is the variance of the observation level random effect? I would trust a model with large OLRE variance. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2016-06-27 14:59 GMT+02:00 Philipp Singer < <killver at gmail.com> killver at gmail.com>:
I have now played around more with the data an the models both using lme4 and glmmTMB. I can report the following: Modeling the data with a zero-inflated Poisson improves the model significantly. However, when calling predict and simulating rpoissons, I end up with nearly no values that are zero (in the original data there are 96% zero). When I model the data with overdisperion by including an observation-level random effect, I can also improve the model (not surprisingly due to the random effect). When I predict outcomes by ignoring the observation-level random effect (in lme4), I receive bad prediction if I compare it to the original data. While many zeros can be captured (of course), the positive outcomes can not be captured well. Combining zero inflation and overdispersion further improves the model, but I can only do that with glmmTMB and then have troubles doing predictions ignoring the observation-level random effect. Another side question: In lme4, when I do: m = glm(x~1,family="poisson") rpois(n=len(data),lambda=predict(m, type='response',re.form=NA) vs. simulate(1,m,re.form=NA) I receive different outcomes? Do I understand these function wrongly? Would appreciate some more help/pointers! Thanks, Philipp 2016-06-24 15:52 GMT+02:00 Philipp Singer < <killver at gmail.com> killver at gmail.com>:
Thanks - I started an issue there to answer some of my questions. Regarding the installation: I was trying to somehow do it in
anaconda with
a specific R kernel and had some issues. I am trying to resort that
with
the anaconda guys though, if I have a tutorial on how to properly
setup
glmmTMB in anaconda, I will let you know. The install worked fine
in my
standard R environment. On 24.06.2016 15 <24.06.2016%2015>:40, Ben Bolker wrote:
Probably for now the glmmTMB issues page is best. When you go there: - details on installation problems/hiccups would be useful - a reproducible example for the problem listed below would be
useful
- dispformula is for allowing dispersion/residual variance to
vary
with covariates (i.e., modeling heteroscedasticity)
cheers
Ben Bolker
On 16-06-24 09:13 AM, Philipp Singer wrote:
Update, I tried it like that, but receive an error message. Warning message: In nlminb(start = par, objective = fn, gradient = gr): NA/NaN
function
evaluation Error in solve.default(hessian.fixed): Lapack routine dgesv:
system is
exactly singular: U[3,3] = 0 Traceback: 1. glmmTMB(y ~ 1 + x + (1 | b), . data = data, family = "poisson", dispformula = ~1 + x) 2. sdreport(obj) 3. solve(hessian.fixed) 4. solve(hessian.fixed) 5. solve.default(hessian.fixed) Any ideas on that? BTW: Is it fine to post glmmTMB questions here, or should I
rather use
the github issue page, or is there maybe a dedicated mailing list? Thanks, Philipp On 24.06.2016 14:35, Philipp Singer wrote:
It indeed seems to run quite fast; had some trouble installing,
but
works now on my 3.3 R setup. One question I have is regarding the specification of dispersion
as I
need to specify the dispformula. What is the difference here
between
just specifying fixed effects vs. also the random effects? On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in
vignette("glmmTMB")
and it was 2.3 times faster than glmer for one simple GLMM. On 23Jun 2016, at 10:44, Philipp Singer < <killver at gmail.com>
killver at gmail.com> wrote:
Did try glmmADMB but unfortunately it is way too slow for my
data.
Did not know about glmmTMB, will try it out. Does it work with crossed random effects and how does it scale with more data? I
will
check the docu and try it though. Thanks for the info. On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much
faster
than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp, You could also try fitting the model with and without ZI
using
either glmmADMB or glmmTMB. Then compare the AICs. I believe model selection is useful for this, but I could be missing something since
the
simulation procedure that Thierry described seems to
recommended
more often. <https://github.com/glmmTMB/glmmTMB>
glmmTMB is still in the development phase, but we?ve done a
lot of
testing. cheers, Mollie ------------------------ Mollie Brooks, PhD Postdoctoral
Researcher,
Population Ecology Research Group Department of Evolutionary
Biology
& Environmental Studies, University of Z?rich <http://www.popecol.org/team/mollie-brooks/>
On 23Jun 2016, at 8:22, Philipp Singer < <killver at gmail.com>
killver at gmail.com> wrote:
Thanks, great information, that is really helpful. I agree that those are different things, however when using
a
random effect for overdispersion, I can simulate the same
number of
zero outcomes (~95%). On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model
zero-inflation.
Those are two different things. I've put some information together in <http://rpubs.com/INBOstats/zeroinflation>
ir. Thierry Onkelinx Instituut voor natuur- en
bosonderzoek /
Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be
no more than asking him to perform a post-mortem
examination: he
may be able to say what the experiment died of. ~ Sir
Ronald
Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire
for an
answer does not ensure that a reasonable answer can be
extracted
from a given body of data. ~ John Tukey 2016-06-23 12:42 GMT+02:00 Philipp Singer <
<killver at gmail.com>killver at gmail.com
<mailto: <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>>>:
Thanks! Actually, accounting for overdispersion is super important as it seems, then the zeros can be captured well. On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp, 1. Fit a Poisson model to the data. 2. Simulate a new
response
vector for the dataset according to the model. 3. Count
the
number of zero's in the simulated response vector. 4.
Repeat
step 2 and 3 a decent number of time and plot a histogram
of
the number of zero's in the simulation. If the number of
zero's
in the original dataset is larger than those in the simulations, then the model can't capture all zero's. In
such
case, first try to update the model and repeat the
procedure.
If that fails, look for zero-inflated models. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en
bosonderzoek /
Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may
be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment
died of.
~ Sir Ronald Aylmer Fisher The plural of anecdote is not
data.
~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable
answer
can be extracted from a given body of data. ~ John Tukey 2016-06-23 11:27 GMT+02:00 Philipp Singer <
<killver at gmail.com>killver at gmail.com
<mailto: <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>>>:
Thanks Thierry - That totally makes sense. Is there some
way of
formally checking that, except thinking about the setting
and
underlying processes? On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp, Do you have just lots of zero's, or more zero's than the
Poisson
distribution can explain? Those are two different things.
The example
below generates data from a Poisson distribution and has
99% zero's
but no zero-inflation. The second example has only 1%
zero's but is
clearly zero-inflated. set.seed(1) n <- 1e8 sim <- rpois(n, lambda = 0.01)
mean(sim
== 0) hist(sim) sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
lambda = 1000)
mean(sim.infl == 0) hist(sim.infl) So before looking for zero-inflated models, try to model
the zero's.
Best regards, ir. Thierry Onkelinx Instituut voor natuur- en
bosonderzoek /
Research Institute
for Nature
and Forest team Biometrie & Kwaliteitszorg / team
Biometrics
& Quality
Assurance
Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done
may be no
more than asking him to perform a post-mortem
examination:
he may be
able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an
answer does
not ensure that a reasonable answer can be extracted
from a
given body
of data. ~ John Tukey 2016-06-23 10:07 GMT+02:00 Philipp Singer
< <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>
<mailto: <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>>
<mailto: <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>
<mailto: <killver at gmail.com>killver at gmail.com <mailto:
<killver at gmail.com>killver at gmail.com>>>>:
Dear group - I am currently fitting a Poisson glmer
where I have
an excess of outcomes that are zero (>95%). I am now
debating on
how to proceed and came up with three options: 1.) Just fit a regular glmer to the complete data. I am
not fully
sure how interpret the coefficients then, are they more
optimizing
towards distinguishing zero and non-zero, or also
capturing the
differences in those outcomes that are non-zero? 2.) Leave all zeros out of the data and fit a glmer to
only those
outcomes that are non-zero. Then, I would only learn
about
differences in the non-zero outcomes though. 3.) Use a zero-inflated Poisson model. My data is quite large-scale, so I am currently playing around with the EM implementation of Bolker et al. that alternates between
fitting a
glmer with data that are weighted according to their zero probability, and fitting a logistic regression for the
probability
that a data point is zero. The method is elaborated for
the OWL
data in:
<https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
<
<https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf> https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similar to as
I would do
for my idea of 2)? And then on top of that interpret the coefficients for the logistic regression regarding
whether
something is in the perfect or imperfect state? I am
also not
quite sure what the common approach for the zformula is
here. The
OWL elaborations only use zformula=z~1, so no random
effect; I
would use the same formula as for the glmer. I am appreciating some help and pointers. Thanks! Philipp
_______________________________________________ <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org>
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org>
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org>>
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org>
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org
<mailto: <R-sig-mixed-models at r-project.org>
R-sig-mixed-models at r-project.org>>> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models < <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]] _______________________________________________ <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org>> mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models < <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]] _______________________________________________ <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models < <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> [[alternative HTML version deleted]] _______________________________________________ <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________ <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________ <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org <mailto: <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org> mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models [[alternative HTML version deleted]] _______________________________________________ <R-sig-mixed-models at r-project.org> R-sig-mixed-models at r-project.org mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models [[alternative HTML version deleted]] _______________________________________________ <R-sig-mixed-models at r-project.org>R-sig-mixed-models at r-project.org mailing list <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Unfortunately, if I model the data with a zero-inflated negative binomial model (which appears to be the most appropriate model to me), the fitted values are never zero, but hover around a mean of 20, even though, as said, my data contains around 95% zeros. I thought about hurdle models as well, but zero-inflated definitely fit the process better.
On 27.06.2016 21:59, Thierry Onkelinx wrote:
If there is overdispersion, then try a negative binomial model or a
zero-inflated negative binomial model. If not try a zero-inflated
Poisson. Adding relevant covariates can reduce overdispersion.
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
2016-06-27 17:46 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com>>:
Well, as posted beforehand the std dev is 9.5 ... so does not seem
too good then :/
Any other idea?
On 27.06.2016 17:31, Thierry Onkelinx wrote:
Dear Philipp,
You've been bitten by observation level random effects. I've put
together a document about it on http://rpubs.com/INBOstats/OLRE.
Bottomline you're OKish when the standard devation of the OLRE
smaller than 1. You're in trouble when it's above 3. In between
you need to check the model carefully.
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for
Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality
Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be
no more than asking him to perform a post-mortem examination: he
may be able to say what the experiment died of. ~ Sir Ronald
Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer
does not ensure that a reasonable answer can be extracted from a
given body of data. ~ John Tukey
2016-06-27 16:17 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com>>:
Here is the fitted vs. residual plot for the
observation-level poisson model where the observation level
has been removed as taken from:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html
So basically the prediction is always close to zero.
Note that this is just on a very small sample (1000 data points).
If I fit a nbinom2 to this smalle sample, I get predictions
that are always around ~20 (but never zero). Both plots are
attached.
What I am wondering is whether I can do inference on a fixed
parameter in my model which is my main task of this study.
The effect is similar in the different models and in general
I am only itnerested in whether it is positive/negative and
"significant" which it is. However, as can be seen, the
prediction looks not too good here.
2016-06-27 15:18 GMT+02:00 Philipp Singer <killver at gmail.com
<mailto:killver at gmail.com>>:
The variance is:
Conditional model:
Groups Name Variance Std.Dev.
obs (Intercept) 8.991e+01 9.4823139
2016-06-27 15:06 GMT+02:00 Thierry Onkelinx
<thierry.onkelinx at inbo.be <mailto:thierry.onkelinx at inbo.be>>:
Dear Philipp,
How strong is the variance of the observation level
random effect? I would trust a model with large OLRE
variance.
Best regards,
Thierry
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research
Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics &
Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is
done may be no more than asking him to perform a
post-mortem examination: he may be able to say what
the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for
an answer does not ensure that a reasonable answer
can be extracted from a given body of data. ~ John Tukey
2016-06-27 14:59 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>>:
I have now played around more with the data an
the models both using lme4
and glmmTMB.
I can report the following:
Modeling the data with a zero-inflated Poisson
improves the model
significantly. However, when calling predict and
simulating rpoissons, I
end up with nearly no values that are zero (in
the original data there are
96% zero).
When I model the data with overdisperion by
including an observation-level
random effect, I can also improve the model (not
surprisingly due to the
random effect). When I predict outcomes by
ignoring the observation-level
random effect (in lme4), I receive bad prediction
if I compare it to the
original data. While many zeros can be captured
(of course), the positive
outcomes can not be captured well.
Combining zero inflation and overdispersion
further improves the model, but
I can only do that with glmmTMB and then have
troubles doing predictions
ignoring the observation-level random effect.
Another side question:
In lme4, when I do:
m = glm(x~1,family="poisson")
rpois(n=len(data),lambda=predict(m,
type='response',re.form=NA)
vs.
simulate(1,m,re.form=NA)
I receive different outcomes? Do I understand
these function wrongly?
Would appreciate some more help/pointers!
Thanks,
Philipp
2016-06-24 15:52 GMT+02:00 Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>>:
> Thanks - I started an issue there to answer
some of my questions.
>
> Regarding the installation: I was trying to
somehow do it in anaconda with
> a specific R kernel and had some issues. I am
trying to resort that with
> the anaconda guys though, if I have a tutorial
on how to properly setup
> glmmTMB in anaconda, I will let you know. The
install worked fine in my
> standard R environment.
>
>
> On 24.06.2016 15 <tel:24.06.2016%2015>:40, Ben
Bolker wrote:
>
>> Probably for now the glmmTMB issues page is best.
>>
>> When you go there:
>>
>> - details on installation problems/hiccups
would be useful
>> - a reproducible example for the problem
listed below would be useful
>> - dispformula is for allowing
dispersion/residual variance to vary
>> with covariates (i.e., modeling
heteroscedasticity)
>>
>> cheers
>> Ben Bolker
>>
>>
>> On 16-06-24 09:13 AM, Philipp Singer wrote:
>>
>>> Update, I tried it like that, but receive an
error message.
>>>
>>> Warning message:
>>> In nlminb(start = par, objective = fn,
gradient = gr): NA/NaN function
>>> evaluation
>>>
>>> Error in solve.default(hessian.fixed): Lapack
routine dgesv: system is
>>> exactly singular: U[3,3] = 0
>>> Traceback:
>>>
>>> 1. glmmTMB(y ~ 1 + x + (1 | b),
>>> . data = data, family = "poisson",
dispformula = ~1 + x)
>>> 2. sdreport(obj)
>>> 3. solve(hessian.fixed)
>>> 4. solve(hessian.fixed)
>>> 5. solve.default(hessian.fixed)
>>>
>>> Any ideas on that?
>>>
>>> BTW: Is it fine to post glmmTMB questions
here, or should I rather use
>>> the github issue page, or is there maybe a
dedicated mailing list?
>>>
>>> Thanks,
>>> Philipp
>>>
>>> On 24.06.2016 14:35, Philipp Singer wrote:
>>>
>>>> It indeed seems to run quite fast; had some
trouble installing, but
>>>> works now on my 3.3 R setup.
>>>>
>>>> One question I have is regarding the
specification of dispersion as I
>>>> need to specify the dispformula. What is the
difference here between
>>>> just specifying fixed effects vs. also the
random effects?
>>>>
>>>> On 23.06.2016 23:07, Mollie Brooks wrote:
>>>>
>>>>> glmmTMB does crossed RE. Ben did some
timings in vignette("glmmTMB")
>>>>> and it was 2.3 times faster than glmer for
one simple GLMM.
>>>>>
>>>>>
>>>>> On 23Jun 2016, at 10:44, Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>> wrote:
>>>>>>
>>>>>> Did try glmmADMB but unfortunately it is
way too slow for my data.
>>>>>>
>>>>>> Did not know about glmmTMB, will try it
out. Does it work with
>>>>>> crossed random effects and how does it
scale with more data? I will
>>>>>> check the docu and try it though. Thanks
for the info.
>>>>>>
>>>>>> On 23.06.2016 19:14, Ben Bolker wrote:
>>>>>>
>>>>>>> I would also comment that glmmTMB is
likely to be much faster
>>>>>>> than the
>>>>>>> lme4-based EM approach ...
>>>>>>>
>>>>>>> cheers
>>>>>>> Ben B.
>>>>>>>
>>>>>>> On 16-06-23 12:47 PM, Mollie Brooks wrote:
>>>>>>>
>>>>>>>> Hi Philipp,
>>>>>>>>
>>>>>>>> You could also try fitting the model
with and without ZI using
>>>>>>>> either
>>>>>>>> glmmADMB or glmmTMB. Then compare the
AICs. I believe model
>>>>>>>> selection
>>>>>>>> is useful for this, but I could be
missing something since the
>>>>>>>> simulation procedure that Thierry
described seems to recommended
>>>>>>>> more
>>>>>>>> often.
>>>>>>>>
>>>>>>>> https://github.com/glmmTMB/glmmTMB
>>>>>>>> http://glmmadmb.r-forge.r-project.org
>>>>>>>>
>>>>>>>> glmmTMB is still in the development
phase, but we?ve done a lot of
>>>>>>>> testing.
>>>>>>>>
>>>>>>>> cheers, Mollie
>>>>>>>>
>>>>>>>> ------------------------ Mollie Brooks,
PhD Postdoctoral Researcher,
>>>>>>>> Population Ecology Research Group
Department of Evolutionary Biology
>>>>>>>> & Environmental Studies, University of
Z?rich
>>>>>>>> http://www.popecol.org/team/mollie-brooks/ >>>>>>>> >>>>>>>> >>>>>>>> On 23Jun 2016, at 8:22, Philipp Singer
<killver at gmail.com <mailto:killver at gmail.com>> wrote:
>>>>>>>>>
>>>>>>>>> Thanks, great information, that is
really helpful.
>>>>>>>>>
>>>>>>>>> I agree that those are different
things, however when using a
>>>>>>>>> random effect for overdispersion, I can
simulate the same number of
>>>>>>>>> zero outcomes (~95%).
>>>>>>>>>
>>>>>>>>> On 23.06.2016 15:50, Thierry Onkelinx
wrote:
>>>>>>>>>
>>>>>>>>>> Be careful when using overdispersion
to model zero-inflation.
>>>>>>>>>> Those are two different things.
>>>>>>>>>>
>>>>>>>>>> I've put some information together in
>>>>>>>>>> http://rpubs.com/INBOstats/zeroinflation
>>>>>>>>>>
>>>>>>>>>> ir. Thierry Onkelinx Instituut voor
natuur- en bosonderzoek /
>>>>>>>>>> Research Institute for Nature and
Forest team Biometrie &
>>>>>>>>>> Kwaliteitszorg / team Biometrics &
Quality Assurance
>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>
>>>>>>>>>> To call in the statistician after the
experiment is done may be
>>>>>>>>>> no more than asking him to perform a
post-mortem examination: he
>>>>>>>>>> may be able to say what the experiment
died of. ~ Sir Ronald
>>>>>>>>>> Aylmer Fisher The plural of anecdote
is not data. ~ Roger
>>>>>>>>>> Brinner The combination of some data
and an aching desire for an
>>>>>>>>>> answer does not ensure that a
reasonable answer can be extracted
>>>>>>>>>> from a given body of data. ~ John Tukey
>>>>>>>>>>
>>>>>>>>>> 2016-06-23 12:42 GMT+02:00 Philipp
Singer <killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com
<mailto:killver at gmail.com>>>>:
>>>>>>>>>>
>>>>>>>>>> Thanks! Actually, accounting for
overdispersion is super
>>>>>>>>>> important as it seems, then the zeros
can be captured well.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 23.06.2016 11:50, Thierry Onkelinx
wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear Philipp,
>>>>>>>>>>>
>>>>>>>>>>> 1. Fit a Poisson model to the data.
2. Simulate a new response
>>>>>>>>>>> vector for the dataset according to
the model. 3. Count the
>>>>>>>>>>> number of zero's in the simulated
response vector. 4. Repeat
>>>>>>>>>>> step 2 and 3 a decent number of time
and plot a histogram of
>>>>>>>>>>> the number of zero's in the
simulation. If the number of zero's
>>>>>>>>>>> in the original dataset is larger
than those in the
>>>>>>>>>>> simulations, then the model can't
capture all zero's. In such
>>>>>>>>>>> case, first try to update the model
and repeat the procedure.
>>>>>>>>>>> If that fails, look for zero-inflated
models.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> ir. Thierry Onkelinx Instituut voor
natuur- en bosonderzoek /
>>>>>>>>>>> Research Institute for Nature and
Forest team Biometrie &
>>>>>>>>>>> Kwaliteitszorg / team Biometrics &
Quality Assurance
>>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>>
>>>>>>>>>>> To call in the statistician after the
experiment is done may
>>>>>>>>>>> be no more than asking him to perform
a post-mortem
>>>>>>>>>>> examination: he may be able to say
what the experiment died of.
>>>>>>>>>>> ~ Sir Ronald Aylmer Fisher The plural
of anecdote is not data.
>>>>>>>>>>> ~ Roger Brinner The combination of
some data and an aching
>>>>>>>>>>> desire for an answer does not ensure
that a reasonable answer
>>>>>>>>>>> can be extracted from a given body of
data. ~ John Tukey
>>>>>>>>>>>
>>>>>>>>>>> 2016-06-23 11:27 GMT+02:00 Philipp
Singer <killver at gmail.com <mailto:killver at gmail.com>
>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com
<mailto:killver at gmail.com>>>>:
>>>>>>>>>>>
>>>>>>>>>>> Thanks Thierry - That totally makes
sense. Is there some way of
>>>>>>>>>>> formally checking that, except
thinking about the setting and
>>>>>>>>>>> underlying processes?
>>>>>>>>>>>
>>>>>>>>>>> On 23.06.2016 11:04, Thierry Onkelinx
wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear Philipp,
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have just lots of zero's, or
more zero's than the
>>>>>>>>>>>>
>>>>>>>>>>> Poisson
>>>>>>>>>>>
>>>>>>>>>>>> distribution can explain? Those are
two different things.
>>>>>>>>>>>>
>>>>>>>>>>> The example
>>>>>>>>>>>
>>>>>>>>>>>> below generates data from a Poisson
distribution and has
>>>>>>>>>>>>
>>>>>>>>>>> 99% zero's
>>>>>>>>>>>
>>>>>>>>>>>> but no zero-inflation. The second
example has only 1%
>>>>>>>>>>>>
>>>>>>>>>>> zero's but is
>>>>>>>>>>>
>>>>>>>>>>>> clearly zero-inflated.
>>>>>>>>>>>>
>>>>>>>>>>>> set.seed(1) n <- 1e8 sim <- rpois(n,
lambda = 0.01) mean(sim
>>>>>>>>>>>> == 0) hist(sim)
>>>>>>>>>>>>
>>>>>>>>>>>> sim.infl <- rbinom(n, size = 1, prob
= 0.99) * rpois(n,
>>>>>>>>>>>>
>>>>>>>>>>> lambda = 1000)
>>>>>>>>>>>
>>>>>>>>>>>> mean(sim.infl == 0) hist(sim.infl)
>>>>>>>>>>>>
>>>>>>>>>>>> So before looking for zero-inflated
models, try to model
>>>>>>>>>>>>
>>>>>>>>>>> the zero's.
>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ir. Thierry Onkelinx Instituut voor
natuur- en bosonderzoek /
>>>>>>>>>>>> Research Institute
>>>>>>>>>>>>
>>>>>>>>>>> for Nature
>>>>>>>>>>>
>>>>>>>>>>>> and Forest team Biometrie &
Kwaliteitszorg / team Biometrics
>>>>>>>>>>>> & Quality
>>>>>>>>>>>>
>>>>>>>>>>> Assurance
>>>>>>>>>>>
>>>>>>>>>>>> Kliniekstraat 25 1070 Anderlecht Belgium
>>>>>>>>>>>>
>>>>>>>>>>>> To call in the statistician after
the experiment is done
>>>>>>>>>>>>
>>>>>>>>>>> may be no
>>>>>>>>>>>
>>>>>>>>>>>> more than asking him to perform a
post-mortem examination:
>>>>>>>>>>>>
>>>>>>>>>>> he may be
>>>>>>>>>>>
>>>>>>>>>>>> able to say what the experiment died
of. ~ Sir Ronald
>>>>>>>>>>>>
>>>>>>>>>>> Aylmer Fisher
>>>>>>>>>>>
>>>>>>>>>>>> The plural of anecdote is not data.
~ Roger Brinner The
>>>>>>>>>>>> combination of some data and an
aching desire for an
>>>>>>>>>>>>
>>>>>>>>>>> answer does
>>>>>>>>>>>
>>>>>>>>>>>> not ensure that a reasonable answer
can be extracted from a
>>>>>>>>>>>>
>>>>>>>>>>> given body
>>>>>>>>>>>
>>>>>>>>>>>> of data. ~ John Tukey
>>>>>>>>>>>>
>>>>>>>>>>>> 2016-06-23 10:07 GMT+02:00 Philipp
Singer
>>>>>>>>>>>>
>>>>>>>>>>> <killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com <mailto:killver at gmail.com>>
>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com
<mailto:killver at gmail.com>>>
>>>>>>>>>>>
>>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com <mailto:killver at gmail.com>>
>>>>>>>>>>>> <mailto:killver at gmail.com
<mailto:killver at gmail.com>
<mailto:killver at gmail.com
<mailto:killver at gmail.com>>>>>:
>>>>>>>>>>>>
>>>>>>>>>>>> Dear group - I am currently fitting
a Poisson glmer
>>>>>>>>>>>>
>>>>>>>>>>> where I have
>>>>>>>>>>>
>>>>>>>>>>>> an excess of outcomes that are zero
(>95%). I am now
>>>>>>>>>>>>
>>>>>>>>>>> debating on
>>>>>>>>>>>
>>>>>>>>>>>> how to proceed and came up with
three options:
>>>>>>>>>>>>
>>>>>>>>>>>> 1.) Just fit a regular glmer to the
complete data. I am
>>>>>>>>>>>>
>>>>>>>>>>> not fully
>>>>>>>>>>>
>>>>>>>>>>>> sure how interpret the coefficients
then, are they more
>>>>>>>>>>>>
>>>>>>>>>>> optimizing
>>>>>>>>>>>
>>>>>>>>>>>> towards distinguishing zero and
non-zero, or also
>>>>>>>>>>>>
>>>>>>>>>>> capturing the
>>>>>>>>>>>
>>>>>>>>>>>> differences in those outcomes that
are non-zero?
>>>>>>>>>>>>
>>>>>>>>>>>> 2.) Leave all zeros out of the data
and fit a glmer to
>>>>>>>>>>>>
>>>>>>>>>>> only those
>>>>>>>>>>>
>>>>>>>>>>>> outcomes that are non-zero. Then, I
would only learn about
>>>>>>>>>>>> differences in the non-zero outcomes
though.
>>>>>>>>>>>>
>>>>>>>>>>>> 3.) Use a zero-inflated Poisson
model. My data is quite
>>>>>>>>>>>> large-scale, so I am currently
playing around with the EM
>>>>>>>>>>>> implementation of Bolker et al. that
alternates between
>>>>>>>>>>>>
>>>>>>>>>>> fitting a
>>>>>>>>>>>
>>>>>>>>>>>> glmer with data that are weighted
according to their zero
>>>>>>>>>>>> probability, and fitting a logistic
regression for the
>>>>>>>>>>>>
>>>>>>>>>>> probability
>>>>>>>>>>>
>>>>>>>>>>>> that a data point is zero. The
method is elaborated for
>>>>>>>>>>>>
>>>>>>>>>>> the OWL
>>>>>>>>>>>
>>>>>>>>>>>> data in:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> <
>>>>>>>>>>>
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>> I am not fully sure how to interpret
the results for the
>>>>>>>
>>>>>>>> zero-inflated version though. Would I
need to interpret the
>>>>>>>>>>>> coefficients for the result of the
glmer similar to as
>>>>>>>>>>>>
>>>>>>>>>>> I would do
>>>>>>>>>>>
>>>>>>>>>>>> for my idea of 2)? And then on top
of that interpret the
>>>>>>>>>>>> coefficients for the logistic
regression regarding whether
>>>>>>>>>>>> something is in the perfect or
imperfect state? I am
>>>>>>>>>>>>
>>>>>>>>>>> also not
>>>>>>>>>>>
>>>>>>>>>>>> quite sure what the common approach
for the zformula is
>>>>>>>>>>>>
>>>>>>>>>>> here. The
>>>>>>>>>>>
>>>>>>>>>>>> OWL elaborations only use
zformula=z~1, so no random
>>>>>>>>>>>>
>>>>>>>>>>> effect; I
>>>>>>>>>>>
>>>>>>>>>>>> would use the same formula as for
the glmer.
>>>>>>>>>>>>
>>>>>>>>>>>> I am appreciating some help and
pointers.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks! Philipp
>>>>>>>>>>>>
>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>>
>>>>>>>>>>>
>>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>>>
mailing list
>>>>>>>>>>>
>>>>>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>>
<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>>>
>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>>
mailing list
>>>>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>
<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>>
_______________________________________________
>>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
>>>>>>>>>
<mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
mailing list
>>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>
<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>>
_______________________________________________
>>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
mailing list
>>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>
>>>>>>>>
_______________________________________________
>>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
mailing list
>>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>
>>>>>>
_______________________________________________
>>>>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>
>>>>>> <mailto:R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org>>
mailing list
>>>>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>
>>>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing
list
>>>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>
[[alternative HTML version deleted]]
_______________________________________________
R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing
list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models