On 11/06/2016, at 14:07, Ben Bolker <bbolker at gmail.com> wrote:
I think that's very sensible, but note that if the OP thinks they need
zero-inflation (they said that's why they came to glmmADMB), then they
will probably need to go slightly beyond standard linear models ...
perhaps a two-stage (zero vs non-zero logistic model + linear model on
log(x) of positive cases)
cheers
Ben Bolker
On 16-06-10 09:56 PM, John Maindonald wrote:
With count data where the mean is large, there?s a strong case to
be made for finding a suitable power transform of (count+1) and
using a normal theory model. If differences in residual variance
are evident, these can be accommodated by assigning weights.
Functions for getting diagnostics are better developed for the
normal theory models, and easier to interpret.
For the analysis of RNA-Seq data, there is a paper that compares
the two styles of model:
Law, CW, Chen, Y, Shi, W, Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29.http://genomebiology.com/2014/15/2/R29
NB also the recent paper:
Schurch NJ, Schofield P, Gierli?ski M, Cole C, Sherstnev A, Singh V, Wrobel N, Gharbi K, Simpson GG, Owen-Hughes T, Blaxter M, Barton GJ (2016). How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA http://www.rnajournal.org/cgi/doi/10.1261/rna.053959.115
The log(count+1) approach is the best, or close to the best, in the field.
John Maindonald email: john.maindonald at anu.edu.au
On 11/06/2016, at 12:29, Ben Bolker <bbolker at gmail.com> wrote:
* I can appreciate your frustration that glmmADMB doesn't work
out of the box on what seems initially to be a simple problem,
and I appreciate that you are being reasonably diplomatic about
it (not everyone is), but ... it is surprisingly hard to make
a general-purpose solve that works for every data set. glm()
works so well because it uses special-purpose algorithms for
a particular class of models that's narrower than what glmmADMB
can do.
* The particular problem is that for Poisson responses with
very large means, the coefficient of variation of the data
also becomes very small. That means that values that are a little
bit off the mean become very unlikely, so unlikely that they
underflow (the probabilities become numerically zero). I can
get your example to work if I set the starting value close enough:
set.seed(101)
d <- data.frame(x=rpois(100, lambda=8000))
library(glmmADMB)
glmmadmb(x~1, family="poisson", data=d)
glmmadmb(x~1, family="poisson", data=d,
start=list(fixed=log(7500))) ## works
... it does, however, fail with fixed=log(7000).
* the "mysterious" poiss_prob_bound (which *is* documented in
?admbControl, so it may be obscure but it's not completely mysterious ...)
was added to fix a problem that another user was having with
another data set ... if you
file.show(system.file("tpl","glmmadmb.tpl",package="glmmADMB"))
and dig around, you'll find that glmmADMB bumps up the probability of
a Poisson by 1e-6 or 1e-10 (in different optimization phases) to prevent
a value of exactly zero. There *is* probably a better way to do this -
there are approximations of the Poisson that work very well in such
small ranges, and they could be used instead - but there is also
a reasonably unending number of such edge cases, and not that many
person-hours to deal with them.
* You could try the glmmTMB package, which is a more modern
eventually-replacement for glmmADMB, and which seems
to work on this particular problem -- but be warned that it's
in development, so you may find holes!
devtools::install_github("glmmTMB/glmmTMB",sub="glmmTMB")
library(glmmTMB)
glmmTMB(x~1, family="poisson", data=d)
* if you have a reproducible example of 'dramatic stop that
does not reach the top level and so cannot be caught with
tryCatch' that would be useful. Maybe we can help resolve
that (more general) problem.
* Andrew Gelman has stated a principle called the "folk theorem of
statistical computing": When you have computational problems, often
there?s a problem with your model.
http://andrewgelman.com/2008/05/13/the_folk_theore/
In this case, I think the issue is that in real life (unless you're
in a field like particle physics), Poisson models are generally
not great models for processes with large means -- the coefficient
of variation is way too small to be realistic. This works with
a negative binomial ...
set.seed(101)
d2 <- data.frame(x=rnbinom(100, mu=8000, size=2))
glmmadmb(x~1,data=d2,family="nbinom")
(I can appreciate that you may have started with a neg binom
and come to the Poisson in part as a way of trying to simplify
your problem so you could diagnose it ...)
cheers
Ben Bolker
While this is a ?solution?, I would expect this kind of fit to work
?off-the-box? (it certainly does with the standard glm function). Can
someone comment on this. I am not an expert on GLMs and needed
something that would fit ZINB + random effects (admittedly the deep
end of GLMs) and was excited to find glmmADMB, but using it has proven
rather frustrating because it often fails (with a dramatic stop that
does not reach the top level and so cannot be caught with tryCatch,
nonetheless) and after I found the above example, I am not so sure it
is always for a good reason.
Thanks,
Vladimir
Session:
On 16-06-09 07:48 PM, Vladimir Trifonov wrote:
Hi,
Why is the following ?trivial? model fit failing? After all, we are
fitting a Poisson GLM on a Poisson data.
--------
glmmadmb(rpois(100, lambda=8000)~1, family="poisson")
Hessian is 0 in row 1
This means that the derivative if probably identically 0 for this
Error in matrix inverse -- matrix singular in inv(dmatrix)
Estimated covariance matrix may not be positive definite
1e+20
Estimated covariance matrix may not be positive definite
1e+20
GLMM's in R powered by AD Model Builder:
Family: poisson
link = log
I tried varying the number of samples (100 above), the mean (8000
above) and this fails. Here is another attempt (with a different
kind of fail)
glmmadmb(rpois(100, lambda=50)~1, family="poisson")
Parameters were estimated, but standard errors were not: the most
likely problem is that the curvature at MLE was zero or negative
Error in glmmadmb(rpois(100, lambda = 50) ~ 1, family = "poisson") :
The function maximizer failed (couldn't find parameter file)
Troubleshooting steps include (1) run with 'save.dir' set and inspect
output files; (2) change run parameters: see '?admbControl';(3) re-run
with debug=TRUE for more information on failure mode
In addition: Warning message:
running command './glmmadmb -maxfn 500 -maxph 5 -noinit -shess' had
After some trial and error, the only thing I found that ?works? (with
a warning about the covariance matrix) consistently is to set a
?mysterious? poiss_prob_bound = FALSE
glmmadmb(rpois(100, lambda=50)~1, family="poisson",
admb.opts=admbControl(poiss_prob_bound=FALSE))
Estimated covariance matrix may not be positive definite
0.020141
Estimated covariance matrix may not be positive definite
0.020141
GLMM's in R powered by AD Model Builder:
Family: poisson
link = log
Fixed effects:
Log-likelihood: -327.749
AIC: 657.498
Formula: rpois(100, lambda = 50) ~ 1
(Intercept)
3.904998
Number of observations: total=100
--------