
Convergence problem example

Douglas Bates

Let me expand a bit on what Ben says here.
On Wed, Apr 25, 2018 at 1:47 PM Ben Bolker <bbolker at gmail.com> wrote:

The "toast-scraping" is a reference to Ed Deming's comments about trying to do
quality control by inspection after the fact.  He said this was a case of
"burning the toast and then scraping it" when the better course was not to
burn the toast in the first place.

Determining  ML or REML estimates for linear mixed-effects models can be a
difficult optimization problem.  The way we formulate it in lme4 is as a
constrained optimization problem but with rather simple constraints.  Some
of the components of the parameter vector must be non-negative. Sometimes
the problem can be very simple - optimize with respect to a single
parameter - and sometimes it can be relatively difficult.  Some algorithms
may work well on one type of problem and others work well on a different
problem.  My recent experience is that the NLopt (
https://nlopt.readthedocs.io/) implementation of the BOBYQA (Bound
Optimization BY Quadratic Approximation) algorithm is both reliable and
fast.  It is available through the nloptr package for R.  In the Julia code
I use the NLopt.jl package (https://github.com/JuliaOpt/NLopt.jl).  I
haven't run into a case where another algorithm or implementation has
converged to a (substantially) better optimum than does this one.  (That
is, there may be a slightly better optimum from another algorithm but the
differences are negligible.)
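To make the bound-constrained, single-parameter case concrete, here is a small pure-Python sketch (my own toy code, not lme4's or NLopt's) of minimizing a smooth function of one parameter theta subject to theta >= 0, using golden-section search.  The objective functions are invented for illustration; the second one shows a constrained minimum landing on the boundary.

```python
def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal f on [lo, hi] by golden-section search."""
    invphi = (5 ** 0.5 - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

# Interior optimum: the minimizer of (theta - 0.5)^2 on [0, 10] is 0.5.
interior = golden_section_min(lambda t: (t - 0.5) ** 2, 0.0, 10.0)

# Boundary optimum: (theta + 1)^2 is increasing on [0, 10], so the
# constrained minimizer is theta = 0 -- and the gradient there is NOT zero.
boundary = golden_section_min(lambda t: (t + 1.0) ** 2, 0.0, 10.0)

print(interior, boundary)
```

The second case is the situation referred to below: at a boundary optimum the usual zero-gradient condition simply does not hold.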

My recommendation is to change the default algorithm to the nloptr
implementation of BOBYQA and drop the attempts to check convergence.  Some
of the problems with the convergence checks are:

- Conditions like the gradient being zero don't apply when the optimum is
on the boundary

- The calculations are based on finite-difference derivatives and (I think)
finite differences of finite differences.  These are notoriously inaccurate.

- In any floating point calculation you can't expect an exact answer, so
you need to decide when you are close enough to, say, a gradient of zero.
That tolerance can be very difficult to choose.
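The second point is easy to demonstrate.  The toy example below (my own illustration, not lme4's convergence-check code) compares a central-difference gradient with a finite difference of finite differences (a second difference) for exp at x = 1, where both exact derivatives equal e.  The second difference loses several orders of magnitude of accuracy to cancellation.

```python
import math

f = math.exp
x, exact = 1.0, math.e

def central_diff(f, x, h):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_diff(f, x, h):
    """Finite difference of finite differences: approximates f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

h = 1e-5
grad_err = abs(central_diff(f, x, h) - exact)
hess_err = abs(second_diff(f, x, h) - exact)
print(grad_err, hess_err)
```

The gradient error is tiny, but the second-difference error is dominated by rounding in the cancellation of nearly equal function values, which is why deciding "close enough to zero" from such quantities is so delicate.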

The optimization for GLMMs can be even more difficult because, in the
second stage at least, the fixed-effects parameters are part of the general
optimization and the objective (deviance) function doesn't have a closed
form expression.  For some cases we can use adaptive Gauss-Hermite
quadrature; in other cases we use the Laplace approximation.  I was
responsible for the two-stage approach of doing an initial, fast,
optimization where the fixed-effects parameters were optimized in the PIRLS
(penalized iteratively reweighted least squares) evaluation followed by the
second, slower optimization with the fixed-effects parameters in the
general optimization.  The first stage may not even be necessary.  However,
I think that this again is a case where the optimization code - for both
stages - should default to the NLopt implementation of BOBYQA.
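The two integral approximations mentioned above can be sketched in one dimension.  This is a hedged toy illustration, not MixedModels or lme4 internals: a 5-point Gauss-Hermite rule (nodes and weights hardcoded from standard tables) applied to an integrand with a known value, and the Laplace approximation evaluated at a mode where it happens to be exact.

```python
import math

# 5-point Gauss-Hermite rule: integrates h(x) * exp(-x^2) over the real
# line exactly for polynomial h up to degree 9.
GH_NODES = [-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819, 2.020182870456086]
GH_WEIGHTS = [0.019953242059046, 0.393619323152241, 0.945308720482942,
              0.393619323152241, 0.019953242059046]

def gauss_hermite(h):
    return sum(w * h(x) for x, w in zip(GH_NODES, GH_WEIGHTS))

# integral of x^2 * exp(-x^2) dx over the real line is sqrt(pi)/2
gh_value = gauss_hermite(lambda x: x * x)

# Laplace approximation: integral of exp(-g(u)) du is approximately
# sqrt(2*pi / g''(u_hat)) * exp(-g(u_hat)), with u_hat the mode of -g.
def laplace(g_at_mode, g2_at_mode):
    return math.sqrt(2 * math.pi / g2_at_mode) * math.exp(-g_at_mode)

# For g(u) = u^2/2 (mode 0, g'' = 1) the Laplace approximation is exact
# and equals sqrt(2*pi).
lap_value = laplace(0.0, 1.0)
print(gh_value, lap_value)
```

In the GLMM setting the integrand is the conditional density of the response given the random effects, the quadrature is centered and scaled adaptively at the conditional mode, and none of it has the closed forms used in this toy version.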

I would be quite happy to participate in studying different approaches to
the optimization using data sets from the literature and contributed data
and models.  I have started something like this in the benchmark section of
the MixedModels package for Julia.  At present it is just checking the
execution time for about 50 different linear mixed model fits but the code
could reasonably be modified to try different optimization methods and to
test GLMMs as well.  I realize that most readers of this list would want to
work in R but there are definite advantages in a Julia implementation.  For
one thing MixedModels is written completely in Julia whereas lme4 is an
exotic mixture of languages with all the delightful interfaces between
languages and packages.
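The shape of such a benchmark harness is simple to sketch.  Everything below is hypothetical stand-in code of mine, not the MixedModels benchmark section: two toy one-dimensional minimizers play the role of optimizer backends, two toy objectives play the role of model fits, and timeit records execution time for each combination.

```python
import timeit

def grid_minimize(f, lo, hi, n=10_000):
    """Brute-force scan: evaluate f on a grid and keep the best point."""
    step = (hi - lo) / n
    return min((lo + i * step for i in range(n + 1)), key=f)

def ternary_minimize(f, lo, hi, tol=1e-8):
    """Ternary search for a unimodal f on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# stand-ins for optimizer backends and for model-fitting problems
methods = {"grid": grid_minimize, "ternary": ternary_minimize}
problems = {
    "quadratic": lambda t: (t - 0.5) ** 2,
    "quartic": lambda t: (t - 2.0) ** 4 + 1.0,
}

timings = {}
for mname, minimize in methods.items():
    for pname, f in problems.items():
        timings[(mname, pname)] = timeit.timeit(
            lambda: minimize(f, 0.0, 10.0), number=5)

for key in sorted(timings):
    print(key, f"{timings[key]:.4f}s")
```

The real exercise would substitute actual optimizer implementations and the fifty-odd model fits for these placeholders, and would compare the optima reached as well as the times.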


  Besides being hard and tedious, this is time I don't really have.