
False convergence

Nick Isaac <njbisaac at ...> writes:
In general I would believe the verbose trace ...

  The stable version of lme4 uses the nlminb() optimizer internally,
which in turn is based on the PORT libraries.

The docs linked from ?nlminb:

http://netlib.bell-labs.com/cm/cs/cstr/153.pdf

The only useful material I could find in these docs was:

------------
p. 5: false convergence: the gradient ∇f(x) may be computed
incorrectly, the other stopping tolerances may be too tight, or either
f or ∇f may be discontinuous near the current iterate x.

p. 9: V(XFTOL) ≡ V(34) is the false-convergence tolerance. A return
with IV(1) = 8 occurs if a more favorable stopping test is not
satisfied and if a step of scaled length at most V(XFTOL) is tried but
not accepted. "Scaled length" is in the sense of (5.1). Such a
return generally means there is an error in computing ∇f(x), or
the favorable convergence tolerances (V(RFCTOL), V(XCTOL), and
perhaps V(AFCTOL)) are too tight for the accuracy to which f(x) is
computed (see §9), or ∇f (or f itself) is discontinuous near x. An
error in computing ∇f(x) usually leads to false convergence after
only a few iterations -- often in the first.  Default = 100*MACHEP.

p. 13: Sometimes evaluating f(x) involves an extensive computation,
such as performing a simulation or adaptive numerical quadrature or
integrating an ordinary or partial differential equation. In such
cases the value computed for f(x), say f~(x), may involve
substantial error (in the eyes of the optimization algorithm).  To
eliminate some "false convergence" messages and useless function
evaluations, it is necessary to increase the stopping tolerances and,
when finite-difference derivative approximations are used, to increase
the step-sizes used in estimating derivatives.
----------

"evaluating f(x) involves an extensive computation" is a reasonably
good description of what's going on inside lme4 (although I think the
internal computations are _slightly_ less involved/noisy than a
typical ODE solution or generic integration by quadrature).
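In plain R, the PORT tolerances quoted above are exposed through
nlminb()'s control list; a minimal sketch on a toy quadratic (not lme4
itself) showing how to loosen them:

```r
## Toy objective, minimized at c(1, 2).  Loosening rel.tol / x.tol
## lets a favorable stopping test fire before a "false convergence"
## (convergence code 8) report; xf.tol is the false-convergence
## tolerance itself (default 100 * .Machine$double.eps).
f <- function(x) sum((x - c(1, 2))^2)
fit <- nlminb(start = c(0, 0), objective = f,
              control = list(rel.tol = 1e-6,  # default 1e-10
                             x.tol   = 1e-6,  # default 1.5e-8
                             trace   = 1))    # one line per iteration
fit$convergence  # 0 for a "true" convergence
fit$message
```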

Are all of your covariates (year, site) unique, or can you collapse
the data to a binomial variable?  That might help a lot with both
speed and stability, and should be functionally equivalent ...
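A sketch of that collapse in base R, with made-up data and variable
names (dat, resp, year, site):

```r
## Bernoulli (0/1) responses, several per year x site combination
dat <- data.frame(year = rep(2000:2001, each = 4),
                  site = rep(c("A", "B"), times = 4),
                  resp = c(1, 0, 1, 1, 0, 0, 1, 0))
## Collapse to binomial (successes, failures) counts per combination
agg <- aggregate(resp ~ year + site, data = dat,
                 FUN = function(x) c(succ = sum(x), fail = sum(1 - x)))
agg <- do.call(data.frame, agg)  # flatten the matrix column
## The binomial fit would then use a two-column response, e.g.
## glmer(cbind(resp.succ, resp.fail) ~ ... , family = binomial, ...)
agg
```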

Observation-level random effects should have essentially no effect
for a Bernoulli variable.

nlminb() is notoriously cryptic/sensitive: it might be worth
checking the development version of lme4, which uses a more
robust optimizer by default (although you can also use nlminb(),
for backward comparison) and allows more control/investigation
of the optimization.
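As a generic illustration of the cross-checking idea (not the lme4
internals): run nlminb() and a different base-R optimizer on the same
toy objective and compare where they land.

```r
## If two unrelated optimizers agree, a "false convergence" warning
## from one of them is less worrying.
f <- function(x) sum((x - c(1, 2))^2)
fit1 <- nlminb(c(0, 0), f)
fit2 <- optim(c(0, 0), f, method = "BFGS")
max(abs(fit1$par - fit2$par))  # negligible if both converged
```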

  Ben Bolker