glmer optimization questions
Tobias Heed, Ben Bolker
Tobias Heed <tobias.heed at ...> writes:
Hello, I am trying to understand the different fitting options for glmer. I have been unable to find an overview of which options are appropriate in which cases. If there is a document out there that explains these things, I'd be grateful for a link.
No (want to write one?)
My specific questions are:
1. What is the difference between setting maxIter in the function call and setting maxfun in glmerControl()? Which one is better or more important to change when a model doesn't converge (i.e., what kind of iteration does each stand for)? maxIter seems not to be documented in the help for lme4 1.1.0; does this mean it should no longer be used?
maxIter is old/obsolete. maxfun controls the function-evaluation counter in the BOBYQA/Nelder-Mead phase, i.e., the optimization over the 'theta' parameter vector (the Cholesky factors of the random-effects variance-covariance matrices).
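A minimal sketch of setting maxfun (using the cbpp data shipped with lme4; the value 1e5 is an arbitrary illustration, not a recommendation):

library(lme4)
## raise the optimizer's function-evaluation limit via glmerControl()
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial,
             control = glmerControl(optCtrl = list(maxfun = 1e5)))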
2. I have a model that does not converge with Nelder-Mead but does converge with bobyqa. From googling around, it seems that some people like one or the other better, but are there specific things I should look out for when using one or the other? Or are there specific cases in which one of them would be preferable?
We don't know enough about this (yet) to make strong recommendations
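For what it's worth, switching is a one-line change; a sketch with hypothetical model/data names (y, x, g, dat):

## the two optimizers built into lme4 1.1.x
fit_nm <- glmer(y ~ x + (1 | g), data = dat, family = binomial,
                control = glmerControl(optimizer = "Nelder_Mead"))
fit_bo <- update(fit_nm, control = glmerControl(optimizer = "bobyqa"))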
3. What kind of result or warning message would indicate that I should use the restart_edge option?
If you get parameters on the boundary (i.e., variances of 0 or correlations of ±1), it may be worth trying. However, I'm not sure it's actually implemented for glmer!
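For reference, restart_edge is a documented lmerControl() argument; a minimal sketch with the sleepstudy example (whether glmer honors the option is, as said, uncertain):

## ask lmer to restart the optimizer if it stops on the boundary
fm_re <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
              control = lmerControl(restart_edge = TRUE))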
4. I got this warning: "2: In commonArgs(par, fn, control, environment()) : maxfun < 10 * length(par)^2 is not recommended." par appears to be the vector of parameters passed to the optimizer. Is it necessary (or just "better", but not imperative) to set maxfun to the value indicated by this inequality, or higher? Why isn't a higher value of maxfun used automatically when appropriate? Does it have any negative consequences? Can I read out par easily somewhere?
I believe this warning comes from BOBYQA (via the minqa package), but I'm not sure.
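As for reading out par: getME() exposes the pieces the optimizer works on. A sketch, assuming a fitted glmer model gm1 (such as the one sketched above); for a default glmer fit the final optimization is over the covariance parameters and the fixed effects together:

th <- getME(gm1, "theta")  ## covariance parameters (Cholesky factors)
be <- getME(gm1, "beta")   ## fixed-effect coefficients
p <- length(th) + length(be)
10 * p^2                   ## the maxfun lower bound the warning refers to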
5. When a model converges only after tinkering with one of these options (e.g., optimizer, maxfun, maxIter, restart_edge), does this say anything about the quality or reliability of the fit?
I would certainly be more careful to assess convergence in these cases. Do the answers look sensible? (We hope to add some more functionality for checking convergence ...)
6. When reporting a GLMM, should these kinds of options be reported? It doesn't seem that people do, but it would seem appropriate when they are necessary to achieve convergence, wouldn't it?
Absolutely. You should always report *everything* necessary for someone to reproduce your results (in an appendix or online supplement, if necessary).
cheers
Ben Bolker
On 13-09-17 04:25 PM, Tobias Heed wrote:
Ben, thanks for the reply. So for now (until those tools are available), by "assess convergence" do you mean just checking whether the results look meaningful and match what I expect from plots?

Regarding convergence, I have a strange result: with Nelder-Mead, my model converges for some factor orders (I mean the order in which I put them in the function call), but not for others. This seems to be reproducible (with the given dataset). So, say, my model converges for response ~ A * B * C + random, but not for B * A * C + random. The model converges with all orders using bobyqa. I found another report of this order effect in a post somewhere, but it didn't seem to have been resolved. Order really shouldn't matter, should it? Could this be due to starting values for optimization or something like that?
That is strange. Can you send data? A quick test of convergence should be *something* like

library(lme4)
library(numDeriv)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
## evaluate the deviance function at the fitted covariance parameters
dd <- update(fm1, devFunOnly = TRUE)
## numerically differentiate it there: the Hessian should be positive definite
hh <- hessian(dd, getME(fm1, "theta"))
evd <- eigen(hh, symmetric = TRUE, only.values = TRUE)$values
all(evd > 0)  ## should be TRUE (all eigenvalues positive)

See https://github.com/lme4/lme4/issues/120 for more detailed code from Rune Christensen that implements a series of convergence checks.
On 13-09-18 05:44 AM, Tobias Heed wrote:
Ben, I was preparing the dataset to send to you and re-ran those GLMMs. This time, I got no convergence on any of the different "permutations" of the formula. I then compared the estimates of the converged run (from yesterday) and the non-converged runs (from today), and they are very similar, with only very small deviations (this is true for the fixed-effect estimates as well as the random-effect correlations and variances). I then ran the same model that converged yesterday three times today, and it never converged, but the estimates were always similar (I had it converge several times yesterday). The estimates are also similar to the bobyqa solution, which (consistently) does converge.

So it appears not to be a problem of permuting the factors in the formula, but rather a failure to replicate convergence (or non-convergence) across different runs of the same model with Nelder-Mead. This seems like something that could happen depending on the starting values for estimation: are they chosen randomly each time, or are they fixed? Also, it looks as though the problem stems from the end of optimization (given that the parameters are so close to those of converged models). Let me know if you still want to look at the data (given that the behavior is harder to replicate than I thought yesterday, it might be cumbersome to find out what is going on). Best, Tobias
Please do send the data. There's not *supposed* to be any non-deterministic component to the lme4 fitting procedures. We have had problems in the past with internal components of the fitted object not being reset exactly to their starting values, and I think there may still be some small issues there, so any examples we can get are useful. Ben Bolker
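If you want to probe the reproducibility yourself, a sketch (hypothetical model/data names y, x, g, dat; since fitting is intended to be deterministic, two fits of the same call should agree):

fit1 <- glmer(y ~ x + (1 | g), data = dat, family = binomial)
fit2 <- update(fit1)  ## refit with an identical call
all.equal(fixef(fit1), fixef(fit2))                    ## should be TRUE
all.equal(getME(fit1, "theta"), getME(fit2, "theta"))  ## should be TRUE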