False convergence
Nick Isaac <njbisaac at ...> writes:
Dear list,
I'm running a set of models in glmer(), some of which return the 'false
convergence' warning (code 8). I'm trying to understand why.
My models all have the same basic structure: glmer(P ~ Year + (1|Site),
binomial), where P is a vector of 0s and 1s. Year is centered on zero,
which in the past I've found greatly reduces the incidence of false
convergence. There are ~100,000 observations and 9000 sites.
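In code, the setup looks roughly like this (a minimal sketch; the data
frame d and its column names are placeholders for my real data):

    library(lme4)
    ## centre Year on zero, then fit the Bernoulli GLMM
    d$cYear <- d$Year - mean(d$Year)
    m <- glmer(P ~ cYear + (1 | Site), family = binomial, data = d,
               verbose = TRUE)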
I've tried a couple of fixes, including the .Call("mer_optimize", ...) hack
as well as an observation-level random effect (sketched below). The former
has no impact on the parameter estimates, and the latter still returns the
false convergence warning.
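For concreteness, the observation-level random effect version looks like
this (same sketch assumptions as above):

    ## one random-effect level per observation
    d$obs <- factor(seq_len(nrow(d)))
    m_olre <- glmer(P ~ cYear + (1 | Site) + (1 | obs),
                    family = binomial, data = d)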
I'm using the verbose=TRUE argument. I've noticed previously that false
convergence is characterised by just 1 or 2 iterations being completed,
with variances on the random effects that are either close to zero or
astronomically huge. But my models run to dozens of iterations, with
sensible-looking variances that change moderately among iterations and
then stabilise during the last few. And the parameter estimates look
sensible.
In other words, the models show no evidence of failure apart from the
warning message. So which should I believe: the verbose trace or the
warning message?
Perhaps someone can give me further insight into why glmer() thinks the
model has not properly converged.
In general I would believe the verbose trace ...

The stable version of lme4 uses the nlminb() optimizer internally, which
in turn is based on the PORT libraries. The docs linked from ?nlminb:
http://netlib.bell-labs.com/cm/cs/cstr/153.pdf

The only useful material I could find in these docs was:

------------
p. 5: false convergence: the gradient ∇f(x) may be computed incorrectly,
the other stopping tolerances may be too tight, or either f or ∇f may be
discontinuous near the current iterate x.

p. 9: V(XFTOL) = V(34) is the false-convergence tolerance. A return with
IV(1) = 8 occurs if a more favorable stopping test is not satisfied and if
a step of scaled length at most V(XFTOL) is tried but not accepted.
"Scaled length" is in the sense of (5.1). Such a return generally means
there is an error in computing ∇f(x), or the favorable convergence
tolerances (V(RFCTOL), V(XCTOL), and perhaps V(AFCTOL)) are too tight for
the accuracy to which f(x) is computed (see §9), or ∇f (or f itself) is
discontinuous near x. An error in computing ∇f(x) usually leads to false
convergence after only a few iterations, often in the first.
Default = 100*MACHEP.

p. 13: Sometimes evaluating f(x) involves an extensive computation, such
as performing a simulation or adaptive numerical quadrature or integrating
an ordinary or partial differential equation. In such cases the value
computed for f(x), say f~(x), may involve substantial error (in the eyes
of the optimization algorithm). To eliminate some "false convergence"
messages and useless function evaluations, it is necessary to increase the
stopping tolerances and, when finite-difference derivative approximations
are used, to increase the step-sizes used in estimating derivatives.
----------

"Evaluating f(x) involves an extensive computation" is a reasonably good
description of what's going on inside lme4 (although I think the internal
computations are _slightly_ less involved/noisy than a typical ODE
solution or generic integration by quadrature). A toy illustration of the
tolerance point is at the end of this message.

Are all of your covariates (year, site) unique, or can you collapse the
data to a binomial variable? That might help a lot with both speed and
stability, and should be functionally equivalent (a sketch is below).
Observation-level random effects should have essentially no effect for a
Bernoulli variable.

nlminb() is notoriously cryptic/sensitive: it might be worth checking the
development version of lme4, which uses a more robust optimizer by default
(although you can also use nlminb(), for backward comparison) and allows
more control over, and investigation of, the optimization.

Ben Bolker
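A minimal sketch of the collapsing idea, assuming the raw data sit in a
data frame d with columns P (0/1 outcome), Year, and Site:

    ## collapse Bernoulli records to binomial counts per Year x Site cell
    d$one <- 1
    agg <- aggregate(cbind(succ = P, n = one) ~ Year + Site,
                     data = d, FUN = sum)
    ## refit on the aggregated data (centring Year as before still applies)
    m2 <- glmer(cbind(succ, n - succ) ~ Year + (1 | Site),
                family = binomial, data = agg)

And to make the tolerance discussion concrete, a toy illustration of
nlminb()'s stopping tolerances (rel.tol, x.tol, and the false-convergence
tolerance xf.tol are all documented in ?nlminb; whether the stable lme4
exposes a way to pass them through to the internal call is a separate
question):

    ## loosening the 'favorable' tolerances lets a favorable stopping
    ## test succeed before the false-convergence (xf.tol) test fires
    f <- function(x) sum((x - 1)^2)
    nlminb(start = c(0, 0), objective = f,
           control = list(rel.tol = 1e-8, x.tol = 1e-6))

With the development-version interface (now the released lme4), the
optimizer is switchable, so a fit like the following avoids nlminb()
altogether (again a sketch, with d as above):

    m3 <- glmer(P ~ Year + (1 | Site), family = binomial, data = d,
                control = glmerControl(optimizer = "bobyqa"))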