glmer optimization questions
Tobias Heed, Ben Bolker
Tobias Heed <tobias.heed at ...> writes:
Hello, I am trying to understand the different fitting options for glmer. I have been unable to find an overview of which options are appropriate in which cases. If there is a document out there that explains these things, I'd be grateful for a link.
No (want to write one?)
My specific questions are:
1. What is the difference between setting maxIter in the function call and setting maxfun in glmerControl()? Which one is better or more important to change when a model doesn't converge (i.e., what kind of iteration does each stand for)? maxIter seems not to be documented in the help for lme4 1.1.0; does this mean it should no longer be used?
maxIter is old/obsolete. maxfun controls the function-evaluation counter in the BOBYQA/Nelder-Mead phase, i.e., the optimization over the 'theta' parameter vector (the Cholesky factors of the random-effects variance-covariance matrices).
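A minimal sketch of setting maxfun (using the cbpp data shipped with lme4; the value 1e5 is an arbitrary illustration, not a recommendation):

library(lme4)
## raise the optimizer's function-evaluation limit via glmerControl()
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial,
             control = glmerControl(optCtrl = list(maxfun = 1e5)))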
2. I have a model that does not converge with Nelder-Mead but does converge with bobyqa. From googling around, it seems that some people like one or the other better, but are there specific things I should look out for when using one or the other? Or are there specific cases in which one of them would be preferable?
We don't know enough about this (yet) to make strong recommendations
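For what it's worth, switching is a one-line change; a sketch with hypothetical model/data names (y, x, g, dat):

## the two optimizers built into lme4 1.1.x
fit_nm <- glmer(y ~ x + (1 | g), data = dat, family = binomial,
                control = glmerControl(optimizer = "Nelder_Mead"))
fit_bo <- update(fit_nm, control = glmerControl(optimizer = "bobyqa"))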
3. What kind of result or warning message would indicate that I should use the restart_edge option?
If you get parameters on the boundary (i.e., variances of 0 or correlations of ±1), it may be worth trying. However, I'm not sure it's actually implemented for glmer!
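For reference, restart_edge is a documented lmerControl() argument; a minimal sketch with the sleepstudy example (whether glmer honors the option is, as said, uncertain):

## ask lmer to restart the optimizer if it stops on the boundary
fm_re <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
              control = lmerControl(restart_edge = TRUE))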
4. I got this warning: "2: In commonArgs(par, fn, control, environment()) : maxfun < 10 * length(par)^2 is not recommended." par appears to be the vector of parameters passed to the optimizer. Is it necessary (or just "better", but not imperative) to set maxfun to the value indicated by this inequality, or higher? Why isn't a higher value of maxfun used automatically when appropriate? Does it have any negative consequences? Can I read out par easily somewhere?
I believe this warning comes from BOBYQA (via the minqa package), but I'm not sure.
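As for reading out par: getME() exposes the pieces the optimizer works on. A sketch, assuming a fitted glmer model gm1 (such as the one sketched above); for a default glmer fit the final optimization is over the covariance parameters and the fixed effects together:

th <- getME(gm1, "theta")  ## covariance parameters (Cholesky factors)
be <- getME(gm1, "beta")   ## fixed-effect coefficients
p <- length(th) + length(be)
10 * p^2                   ## the maxfun lower bound the warning refers to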
5. When a model converges only after tinkering with one of these options (e.g., optimizer, maxfun, maxIter, restart_edge), does this say anything about the quality or reliability of the fit?
I would certainly be more careful to assess convergence in these cases. Do the answers look sensible? (We hope to add some more functionality for checking convergence ...)
6. When reporting a GLMM, should these kinds of options be reported? It doesn't seem that people do, but it would seem appropriate when they are necessary to achieve convergence, wouldn't it?
Absolutely. You should always report *everything* necessary for someone to reproduce your results (in an appendix or online supplement, if necessary).
cheers
Ben Bolker
On 13-09-17 04:25 PM, Tobias Heed wrote:
Ben, thanks for the reply. So for now (until those tools are available), by "assess convergence" do you mean just checking whether the results look meaningful and match what I expect from plots?

Regarding convergence, I have a strange result: with Nelder-Mead, my model converges for some factor orders (I mean the order in which I put them in the function call), but not for others. This seems to be reproducible (with the given dataset). So, say, my model converges for response ~ A * B * C + random, but not for B * A * C + random. The model converges with all orders using bobyqa. I found another report of this order effect in a post somewhere, but it didn't seem to have been resolved. Order really shouldn't matter, should it? Could this be due to starting values for optimization or something like that?
That is strange. Can you send data? A quick test of convergence should be *something* like

library(lme4)
library(numDeriv)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
## evaluate the deviance function at the fitted covariance parameters
dd <- update(fm1, devFunOnly = TRUE)
## numerically differentiate it there: the Hessian should be positive definite
hh <- hessian(dd, getME(fm1, "theta"))
evd <- eigen(hh, symmetric = TRUE, only.values = TRUE)$values
all(evd > 0)  ## should be TRUE (all eigenvalues positive)

See https://github.com/lme4/lme4/issues/120 for more detailed code from Rune Christensen that implements a series of convergence checks.
On 13-09-18 05:44 AM, Tobias Heed wrote:
Ben, I was preparing the dataset to send to you and re-ran those GLMMs. This time, I got no convergence on any of the different "permutations" of the formula. I then compared the estimates of the converged run (from yesterday) and the non-converged runs (from today), and they are very similar, with only very small deviations (this is true for the fixed-effect estimates as well as the random-effect correlations and variances). I then ran the same model that converged yesterday three times today, and it never converged, but the estimates were always similar (I had it converge several times yesterday). The estimates are also similar to the bobyqa solution, which (consistently) does converge.

So it appears not to be a problem of permuting the factors in the formula, but rather a failure to replicate convergence (or non-convergence) across different runs of the same model with Nelder-Mead. This seems like something that could happen depending on the starting values for estimation: are they chosen randomly each time, or are they fixed? Also, it looks as though the problem stems from the end of optimization (given that the parameters are so close to those of converged models). Let me know if you still want to look at the data (given that the behavior is harder to replicate than I thought yesterday, it might be cumbersome to find out what is going on). Best, Tobias
Please do send the data. There's not *supposed* to be any non-deterministic component to the lme4 fitting procedures. We have had problems in the past with internal components of the fitted object not being reset exactly to their starting values, and I think there may still be some small issues there, so any examples we can get are useful. Ben Bolker
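If you want to probe the reproducibility yourself, a sketch (hypothetical model/data names y, x, g, dat; since fitting is intended to be deterministic, two fits of the same call should agree):

fit1 <- glmer(y ~ x + (1 | g), data = dat, family = binomial)
fit2 <- update(fit1)  ## refit with an identical call
all.equal(fixef(fit1), fixef(fit2))                    ## should be TRUE
all.equal(getME(fit1, "theta"), getME(fit2, "theta"))  ## should be TRUE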