NBinom has not really been successful so far, but I will try to tune it.
Thanks for your help!
One point I forgot to mention is that, apart from my excess of zeros, the
lowest data outcome is 10, so there is a gap between zero and 10. Could
that somehow be a problem?
On 27.06.2016 21:59, Thierry Onkelinx wrote:
If there is overdispersion, then try a negative binomial model or a
zero-inflated negative binomial model. If not try a zero-inflated Poisson.
Adding relevant covariates can reduce overdispersion.
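One rough way to check for overdispersion before choosing between these models is the Pearson dispersion statistic of a plain Poisson fit. The sketch below uses base R only and simulated data; the names `y`, `x` and `dat` and the negative binomial response are assumptions for illustration, not Philipp's data, and random effects are ignored.

```r
set.seed(42)
n <- 2000
x <- rnorm(n)
# Hypothetical overdispersed counts: negative binomial with mean exp(0.5 + 0.3 * x)
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 0.5)
dat <- data.frame(y = y, x = x)

# Plain Poisson GLM as the reference model
fit <- glm(y ~ x, family = poisson, data = dat)

# Pearson dispersion: sum of squared Pearson residuals over residual df.
# Values well above 1 suggest overdispersion relative to the Poisson.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
dispersion
```

A value near 1 is what a Poisson model expects; how far above 1 counts as a problem is a judgment call.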
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
2016-06-27 17:46 GMT+02:00 Philipp Singer <killver at gmail.com>:
Well, as posted beforehand the std dev is 9.5 ... so does not seem too
good then :/
Any other idea?
On 27.06.2016 17:31, Thierry Onkelinx wrote:
Dear Philipp,
You've been bitten by observation level random effects. I've put together
a document about it on http://rpubs.com/INBOstats/OLRE. Bottom line:
you're OKish when the standard deviation of the OLRE is smaller than 1.
You're in trouble when it's above 3. In between you need to check the
model carefully.
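To see why a large OLRE standard deviation is worrying, it can help to translate it into the multiplicative spread it implies on the count scale under a log link. This is a minimal base-R sketch; `olre_ratio` is a hypothetical helper name, and the 95% range is just one convenient summary.

```r
# Ratio between the 97.5% and 2.5% quantiles of exp(OLRE),
# where OLRE ~ Normal(0, sd) on the log scale.
olre_ratio <- function(sd) {
  exp(qnorm(0.975, mean = 0, sd = sd) - qnorm(0.025, mean = 0, sd = sd))
}

# Standard deviations around the thresholds mentioned above, plus the
# value reported in this thread (about 9.48)
sds <- c(0.5, 1, 3, 9.48)
data.frame(sd = sds, ratio = signif(sapply(sds, olre_ratio), 3))
```

With a standard deviation of 1 the ratio is about 50; with 9.48 it is on the order of 10^16, so two otherwise identical observations could plausibly differ by that factor in their expected count.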
Best regards,
ir. Thierry Onkelinx
2016-06-27 16:17 GMT+02:00 Philipp Singer <killver at gmail.com>:
Here is the fitted vs. residual plot for the observation-level Poisson
model, where the observation-level effect has been removed as described in:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020817.html
So basically the prediction is always close to zero.
Note that this is just on a very small sample (1000 data points).
If I fit an nbinom2 to this small sample, I get predictions that are
always around ~20 (but never zero). Both plots are attached.
What I am wondering is whether I can do inference on a fixed parameter
in my model, which is the main task of this study. The effect is similar in
the different models, and in general I am only interested in whether it is
positive/negative and "significant", which it is. However, as can be seen,
the prediction does not look too good here.
2016-06-27 15:18 GMT+02:00 Philipp Singer <killver at gmail.com>:
The variance is:
Conditional model:
 Groups Name        Variance  Std.Dev.
 obs    (Intercept) 8.991e+01 9.4823139
2016-06-27 15:06 GMT+02:00 Thierry Onkelinx <thierry.onkelinx at inbo.be>:
Dear Philipp,
How large is the variance of the observation level random effect? I
would not trust a model with a large OLRE variance.
Best regards,
Thierry
ir. Thierry Onkelinx
2016-06-27 14:59 GMT+02:00 Philipp Singer <killver at gmail.com>:
I have now played around more with the data and the models, using both
lme4 and glmmTMB.
I can report the following:
Modeling the data with a zero-inflated Poisson improves the model
significantly. However, when calling predict and simulating with rpois, I
end up with nearly no values that are zero (in the original data 96% are
zero).
When I model overdispersion by including an observation-level random
effect, I can also improve the model (not surprisingly, given the random
effect). When I predict outcomes ignoring the observation-level random
effect (in lme4), the predictions are bad compared to the original data.
While many zeros can be captured (of course), the positive outcomes cannot
be captured well.
Combining zero inflation and overdispersion further improves the model, but
I can only do that with glmmTMB and then have trouble doing predictions
that ignore the observation-level random effect.
Another side question:
In lme4, when I do:
m = glm(x~1,family="poisson")
rpois(n=nrow(data),lambda=predict(m, type='response',re.form=NA))
vs.
simulate(m,nsim=1,re.form=NA)
I receive different outcomes. Do I understand these functions wrongly?
Would appreciate some more help/pointers!
Thanks,
Philipp
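A possible explanation for the side question, going by the lme4 documentation: `predict(..., re.form = NA)` includes no random effects (they are set to zero), while `simulate(..., re.form = NA)` conditions on none of the random effects and draws new values for them, so the two need not agree. The base-R sketch below only mimics that difference for a Poisson model with an observation-level random effect; the intercept `b0` and the standard deviation `sd_obs` are made-up values, and no lme4 call is involved.

```r
set.seed(1)
n <- 10000
b0 <- -1        # hypothetical fixed intercept on the log scale
sd_obs <- 2     # hypothetical OLRE standard deviation

# Analogue of rpois(lambda = predict(re.form = NA)): random effects set to zero
y_zero_re <- rpois(n, lambda = exp(b0))

# Analogue of simulate(re.form = NA): a fresh random effect per observation
y_new_re <- rpois(n, lambda = exp(b0 + rnorm(n, mean = 0, sd = sd_obs)))

# Compare the mean and variance of the two kinds of draws
c(mean_zero_re = mean(y_zero_re), mean_new_re = mean(y_new_re))
c(var_zero_re = var(y_zero_re), var_new_re = var(y_new_re))
```

The draws with fresh random effects have a larger mean and a much larger variance, which would be consistent with the two calls above giving different outcomes.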
2016-06-24 15:52 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks - I started an issue there to answer some of my questions.
Regarding the installation: I was trying to somehow do it in
a specific R kernel and had some issues. I am trying to resort that
the anaconda guys though, if I have a tutorial on how to properly
glmmTMB in anaconda, I will let you know. The install worked fine
standard R environment.
On 24.06.2016 15:40, Ben Bolker wrote:
Probably for now the glmmTMB issues page is best.
When you go there:
- details on installation problems/hiccups would be useful
- a reproducible example for the problem listed below would be
- dispformula is for allowing dispersion/residual variance to
with covariates (i.e., modeling heteroscedasticity)
cheers
Ben Bolker
On 16-06-24 09:13 AM, Philipp Singer wrote:
Update, I tried it like that, but receive an error message.
Warning message:
In nlminb(start = par, objective = fn, gradient = gr): NA/NaN
evaluation
Error in solve.default(hessian.fixed): Lapack routine dgesv:
exactly singular: U[3,3] = 0
Traceback:
1. glmmTMB(y ~ 1 + x + (1 | b),
. data = data, family = "poisson", dispformula = ~1 + x)
2. sdreport(obj)
3. solve(hessian.fixed)
4. solve(hessian.fixed)
5. solve.default(hessian.fixed)
Any ideas on that?
BTW: Is it fine to post glmmTMB questions here, or should I
the github issue page, or is there maybe a dedicated mailing list?
Thanks,
Philipp
On 24.06.2016 14:35, Philipp Singer wrote:
It indeed seems to run quite fast; had some trouble installing,
works now on my 3.3 R setup.
One question I have is regarding the specification of dispersion
need to specify the dispformula. What is the difference here
just specifying fixed effects vs. also the random effects?
On 23.06.2016 23:07, Mollie Brooks wrote:
glmmTMB does crossed RE. Ben did some timings in
and it was 2.3 times faster than glmer for one simple GLMM.
On 23 Jun 2016, at 10:44, Philipp Singer <killver at gmail.com> wrote:
Did try glmmADMB but unfortunately it is way too slow for my
Did not know about glmmTMB, will try it out. Does it work with
crossed random effects and how does it scale with more data? I
check the docu and try it though. Thanks for the info.
On 23.06.2016 19:14, Ben Bolker wrote:
I would also comment that glmmTMB is likely to be much faster than the
lme4-based EM approach ...
cheers
Ben B.
On 16-06-23 12:47 PM, Mollie Brooks wrote:
Hi Philipp,
You could also try fitting the model with and without ZI
either
glmmADMB or glmmTMB. Then compare the AICs. I believe model
selection
is useful for this, but I could be missing something since
simulation procedure that Thierry described seems to
glmmTMB is still in the development phase, but we've done a
testing.
cheers, Mollie
------------------------ Mollie Brooks, PhD Postdoctoral
Population Ecology Research Group Department of Evolutionary
On 23 Jun 2016, at 8:22, Philipp Singer <killver at gmail.com> wrote:
Thanks, great information, that is really helpful.
I agree that those are different things, however when using
random effect for overdispersion, I can simulate the same
zero outcomes (~95%).
On 23.06.2016 15:50, Thierry Onkelinx wrote:
Be careful when using overdispersion to model
ir. Thierry Onkelinx
2016-06-23 12:42 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks! Actually, accounting for overdispersion is super
important as it seems, then the zeros can be captured well.
On 23.06.2016 11:50, Thierry Onkelinx wrote:
Dear Philipp,
1. Fit a Poisson model to the data.
2. Simulate a new response vector for the dataset according to the model.
3. Count the number of zeros in the simulated response vector.
4. Repeat steps 2 and 3 a decent number of times and plot a histogram of
   the number of zeros in the simulations.
If the number of zeros in the original dataset is larger than those in the
simulations, then the model can't capture all zeros. In that case, first
try to update the model and repeat the procedure. If that fails, look for
zero-inflated models.
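The procedure above can be sketched in base R as follows. This is an illustration under assumptions: the data are simulated (a hypothetical zero-inflated response with one covariate `x`), and a plain `glm` stands in for the mixed model.

```r
set.seed(123)
n <- 5000
x <- rnorm(n)
# Hypothetical zero-inflated response: about 60% structural zeros
y <- rbinom(n, size = 1, prob = 0.4) * rpois(n, lambda = exp(1 + 0.5 * x))
dat <- data.frame(y = y, x = x)

# Step 1: fit a Poisson model
fit <- glm(y ~ x, family = poisson, data = dat)

# Steps 2-4: simulate responses, count zeros, repeat
n_sim <- 200
sims <- simulate(fit, nsim = n_sim)   # data frame, one column per simulation
zeros_sim <- colSums(sims == 0)
zeros_obs <- sum(dat$y == 0)

hist(zeros_sim, xlim = range(c(zeros_sim, zeros_obs)),
     main = "Zeros in simulated responses", xlab = "Number of zeros")
abline(v = zeros_obs, col = "red", lwd = 2)   # observed number of zeros

# Share of simulations with at least as many zeros as observed
mean(zeros_sim >= zeros_obs)
```

If the observed count (red line) sits far to the right of the histogram, the model does not reproduce the zeros. For a `glmer` fit, calling `simulate()` on the fitted model should play the same role.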
Best regards,
ir. Thierry Onkelinx
2016-06-23 11:27 GMT+02:00 Philipp Singer <killver at gmail.com>:
Thanks Thierry - That totally makes sense. Is there some
formally checking that, except thinking about the setting
underlying processes?
On 23.06.2016 11:04, Thierry Onkelinx wrote:
Dear Philipp,
Do you have just lots of zeros, or more zeros than the Poisson
distribution can explain? Those are two different things. The example
below generates data from a Poisson distribution and has 99% zeros
but no zero-inflation. The second example has only 1% zeros but is
clearly zero-inflated.
set.seed(1)
n <- 1e8
sim <- rpois(n, lambda = 0.01)
mean(sim == 0)
hist(sim)
sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n,
mean(sim.infl == 0)
hist(sim.infl)
So before looking for zero-inflated models, try to model
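A runnable variant of the two examples, as a sketch: the second `rpois` call in the message is cut off, so `lambda = 100` below is an assumed value, and `n` is reduced to keep the run time short. Comparing the observed share of zeros with `exp(-mean)` shows what a Poisson with the same mean would predict.

```r
set.seed(1)
n <- 1e5   # smaller than in the message, to keep the run time short

# Many zeros, no zero-inflation: plain Poisson with a small mean
sim <- rpois(n, lambda = 0.01)
mean(sim == 0)          # observed share of zeros
exp(-mean(sim))         # share of zeros a Poisson with this mean implies

# Few zeros, but zero-inflated: 1% structural zeros on top of a Poisson
# with a large mean (lambda = 100 is an assumed value)
sim.infl <- rbinom(n, size = 1, prob = 0.99) * rpois(n, lambda = 100)
mean(sim.infl == 0)     # observed share of zeros
exp(-mean(sim.infl))    # share of zeros a Poisson with this mean implies
```

In the first case the two numbers agree; in the second the observed share of zeros is many orders of magnitude larger than the Poisson allows, even though zeros are rare.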
Best regards,
ir. Thierry Onkelinx
2016-06-23 10:07 GMT+02:00 Philipp Singer <killver at gmail.com>:
Dear group - I am currently fitting a Poisson glmer
an excess of outcomes that are zero (>95%). I am now
how to proceed and came up with three options:
1.) Just fit a regular glmer to the complete data. I am
sure how interpret the coefficients then, are they more
towards distinguishing zero and non-zero, or also
differences in those outcomes that are non-zero?
2.) Leave all zeros out of the data and fit a glmer to
outcomes that are non-zero. Then, I would only learn
differences in the non-zero outcomes though.
3.) Use a zero-inflated Poisson model. My data is quite
large-scale, so I am currently playing around with the EM
implementation of Bolker et al. that alternates between
glmer with data that are weighted according to their zero
probability, and fitting a logistic regression for the
that a data point is zero. The method is elaborated for
I am not fully sure how to interpret the results for the
zero-inflated version though. Would I need to interpret the
coefficients for the result of the glmer similar to as
for my idea of 2)? And then on top of that interpret the
coefficients for the logistic regression regarding
something is in the perfect or imperfect state? I am
quite sure what the common approach for the zformula is
OWL elaborations only use zformula=z~1, so no random
would use the same formula as for the glmer.
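For orientation, the alternating scheme described in option 3 can be sketched in base R for a model without random effects. This is not the implementation referred to above: it uses `glm` instead of `glmer`, simulated data, and hypothetical names (`w` for the membership weight, `p_true` for the structural-zero probability); the zero part `w ~ 1` corresponds to an intercept-only `zformula`.

```r
set.seed(2016)
n <- 5000
x <- rnorm(n)
p_true <- 0.7   # hypothetical structural-zero probability
y <- ifelse(runif(n) < p_true, 0L, rpois(n, lambda = exp(1 + 0.5 * x)))
dat <- data.frame(y = y, x = x)

# Starting values: treat each observed zero as structural with probability 0.5
dat$w <- ifelse(dat$y == 0, 0.5, 0)

for (iter in 1:200) {
  # M-step, count part: Poisson GLM weighted by P(not a structural zero)
  fit_count <- glm(y ~ x, family = poisson, data = dat, weights = 1 - w)
  # M-step, zero part: logistic regression on the current membership weights
  # (quasibinomial avoids warnings about non-integer responses)
  fit_zero <- glm(w ~ 1, family = quasibinomial, data = dat)

  # E-step: posterior probability that each observed zero is structural
  mu <- fitted(fit_count)
  p <- fitted(fit_zero)
  w_new <- ifelse(dat$y == 0, p / (p + (1 - p) * exp(-mu)), 0)

  converged <- max(abs(w_new - dat$w)) < 1e-8
  dat$w <- w_new
  if (converged) break
}

coef(fit_count)          # log-scale coefficients of the Poisson part
plogis(coef(fit_zero))   # estimated structural-zero probability
```

In this sketch the count part is still a full Poisson that can produce zeros of its own, so its coefficients are not quite the same thing as those from option 2, where all zeros are dropped; the zero part describes only the excess zeros.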
I am appreciating some help and pointers.
Thanks! Philipp