Wrong convergence warnings with glmer in lme4 version 1.1-6??
3 messages · Tom Davis, Ben Bolker
Quick answer: these are probably false positives, driven by some
combination of the following:
* we are using derivative-free methods (bobyqa, Nelder-Mead) rather
than derivative-based (or approximate-derivative-based) methods (nlminb,
L-BFGS-B) to optimize over the parameters. Therefore, while using
approximate derivatives to assess convergence is a good idea, we have
no formal guarantee of how small the derivatives are supposed to
be (the stopping conditions for the derivative-free methods are not
based explicitly on derivatives, although they should of course be
related to the local derivatives), nor do we know how large a gradient
we should be worried about.
* it's possible that the finite-difference approximations to the
gradients are themselves inaccurate
* one flaw of our current approach is that we don't take singular fits
into account: that is, we include even the gradient elements for which
the parameters are up against a boundary (zero variances, or perfect
correlations). (That said, I've seen at least a few cases of relatively
large max-abs-grad values where the fit was *not* singular.)
We (lme4 maintainers) are sorry for any inconvenience or worry, and
are working to resolve these issues. However, we'd rather understand
what's going on here, and convince ourselves that there is *not*
something worrisome happening, than simply increase the tolerances for
the tests and make everything quiet again ...
When I run your example (thanks for the reproducible example!) I get:
m1@optinfo$derivs$gradient
[1] 0.339026660 -0.008029951 0.145529325 0.093438612 0.123937398
[6] 0.138252908 0.113161827 0.050304411 0.172043387 0.061015458
[11] 0.122968431 0.159325086 0.117684399 0.072351372 0.134278462
[16] 0.052197579 -0.001719254 0.109679304 0.199987785 0.129893883
[21] 0.138903688 0.024782721 0.047988965 0.028119425 0.096921221
Double-check the internal gradient calculation with the (slightly more
expensive, and non-boundary-respecting) numDeriv::grad:
library(numDeriv)
dd <- update(m1,devFunOnly=TRUE)
grad(dd,unlist(getME(m1,c("theta","beta"))))
[1] 0.33902876 -0.00802679 0.14553678 0.09346970 0.12393419 0.13824924
[7] 0.11335159 0.05030383 0.17205489 0.06102088 0.12296410 0.15933547
[13] 0.11768797 0.07235296 0.13426145 0.05219938 -0.00172568 0.10990842
[19] 0.19998764 0.12991478 0.13889906 0.02478346 0.04799346 0.02810442
[25] 0.09692275
Looks sensible.
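As a quick sanity check (my own sketch, assuming m1 and dd as defined above), the agreement between the two gradient vectors can be quantified directly:

```r
## Compare the internal finite-difference gradient with numDeriv's
## Richardson-extrapolated gradient; close agreement suggests the
## internal gradient calculation is not the source of the warning.
g_int <- m1@optinfo$derivs$gradient
g_num <- numDeriv::grad(dd, unlist(getME(m1, c("theta", "beta"))))
max(abs(g_int - g_num))                    ## absolute discrepancy
max(abs(g_int - g_num)) / max(abs(g_int))  ## relative to the gradient size
```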
I don't have more time to devote to this right now, but my next steps
would/will be:
* restart the optimization from the fitted optimum (possibly? using
refit(), or update(m1,start=getME(m1,c("theta","beta")))) and see if
that makes the max grad smaller
* try a range of optimizers, especially nlminb, as described here:
http://stackoverflow.com/questions/21344555/convergence-error-for-development-version-of-lme4/21370041#21370041
Ben Bolker
On 14-03-26 02:19 PM, Tom Davis wrote:
Dear lme4 experts,

Yesterday I ran the code for two published papers (De Boeck et al., 2011; De Boeck and Partchev, 2012) on psychometric modeling with glmer in lme4 version 1.1-6, and the vast majority of the models I ran produce convergence warnings (even the simple ones). For instance, the basic Rasch model in De Boeck et al. (2011) yields:
## our example data set is bundled with lme4
data("VerbAgg")
## A first example: A Rasch model with fixed item effects and random
person effects
m1 = glmer(r2 ~ 0 + item + (1|id), data=VerbAgg, family="binomial",
           control=glmerControl(optCtrl=list(maxfun=100000)))
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.308607 (tol = 0.001)
I am a bit puzzled because, to my knowledge, the models for the
VerbAgg data (included in lme4) in particular have been checked in many
other programs (also ltm in R), and I have heard that glmer produces
results that are valid and consistent with SAS, HLM, ltm, and so on.
However, this dataset produces convergence warnings even though the
models are comparatively simple for psychometric research (basic Rasch
and LLTM) and the estimates all seem reasonable.
I also tried some datasets in the ltm package. Again, convergence warnings
(the mobility data). The estimates are close to what ltm gives me.
Even when I simulate data from a basic Rasch model using the rmvlogis
function in ltm with a couple of extreme item parameters (which occur in
psychometric tests), I also get these warnings, even though glmer seems
to recover the true values very well. The "gradient" errors and the
Hessian errors tend to increase with sample size and when I use
binomial("probit") instead of binomial("logit"). Vector-valued random
effects also seem to produce many warnings. optimizer="bobyqa" tends to
reduce the warnings, but not consistently. To me, it seems like the
warnings occur randomly all over the place. Sometimes glmer "converges"
(no warnings) with one optimizer setting and values that are very close
to the true values; with another optimizer setting, one gets practically
the same estimates and virtually the same likelihood value, but a
warning. I really do not understand what is going on.
I had no warnings using versions 1.0-5 and 1.0-6, so this seems to be a
recent problem with lme4?
Is it best to ignore all these convergence warnings for now? Should I
switch back to an older version of lme4 to avoid this problem? Should I
generally avoid using large datasets with lme4?
Many thanks in advance,
Tom
Papers (scripts are online):
De Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx,
F., & Partchev, I. (2011). The estimation of item response models with the
lmer function from the lme4 package in R. Journal of Statistical Software,
39, 1-28. http://www.jstatsoft.org/v39/i12
De Boeck, P., & Partchev, I. (2012). IRTrees: Tree-based item response
models of the GLMM family. Journal of Statistical Software, 48, 1-28.
http://www.jstatsoft.org/v48/c01/
Simulated data:
library("reshape")
library("ltm")
library("lme4")
set.seed(12345)
simrasch<-data.frame(rmvlogis(200, cbind(seq(-5, 5, 0.5), 1)))
rasch(simrasch,IRT.param=F)
Call:
rasch(data = simrasch, IRT.param = F)
Coefficients:
     X1      X2      X3      X4      X5      X6      X7      X8      X9     X10     X11
 17.342   5.157   5.157   3.441   3.141   2.100   2.193   1.969   1.484   0.717   0.054
    X12     X13     X14     X15     X16     X17     X18     X19     X20     X21       z
 -0.347  -0.874  -1.415  -2.103  -2.696  -3.065  -3.152  -4.453  -4.216  -5.175   1.091
Log.Lik: -1329.186
simrasch$person = rownames(simrasch)
simraschlong = melt(simrasch, id = "person")
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
2702.484 2842.026 -1329.242 2658.484 4178
Random effects:
Groups Name Std.Dev.
person (Intercept) 1.091
Number of obs: 4200, groups: person, 200
Fixed Effects:
 variableX1  variableX2  variableX3  variableX4  variableX5  variableX6  variableX7
   19.04000     5.14239     5.13514     3.43274     3.13736     2.10085     2.19335
 variableX8  variableX9 variableX10 variableX11 variableX12 variableX13 variableX14
    1.97086     1.48698     0.71828     0.05206    -0.35255    -0.88099    -1.42318
variableX15 variableX16 variableX17 variableX18 variableX19 variableX20 variableX21
   -2.11070    -2.70069    -3.06758    -3.15545    -4.44743    -4.21083    -5.16615
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.135068 (tol = 0.001)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial,
      control=glmerControl(optimizer="bobyqa"))
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
2702.482 2842.025 -1329.241 2658.482 4178
Random effects:
Groups Name Std.Dev.
person (Intercept) 1.09
Number of obs: 4200, groups: person, 200
Fixed Effects:
 variableX1  variableX2  variableX3  variableX4  variableX5  variableX6  variableX7
   17.75458     5.14416     5.14417     3.43797     3.13991     2.10444     2.19676
 variableX8  variableX9 variableX10 variableX11 variableX12 variableX13 variableX14
    1.97412     1.48907     0.72049     0.05389    -0.34882    -0.87841    -1.42030
variableX15 variableX16 variableX17 variableX18 variableX19 variableX20 variableX21
   -2.10751    -2.69767    -3.06471    -3.15126    -4.44429    -4.20819    -5.16356
simrasch<-data.frame(rmvlogis(1000, cbind(seq(-5, 5, 0.5), 1)))
rasch(simrasch,IRT.param=F)
Call:
rasch(data = simrasch, IRT.param = F)
Coefficients:
     X1      X2      X3      X4      X5      X6      X7      X8      X9     X10     X11
  5.195   4.422   3.868   3.599   3.192   2.558   2.211   1.605   0.984   0.559   0.003
    X12     X13     X14     X15     X16     X17     X18     X19     X20     X21       z
 -0.398  -0.985  -1.590  -1.994  -2.401  -2.876  -3.411  -3.815  -4.683  -4.832   1.014
Log.Lik: -6874.631
simrasch$person = rownames(simrasch)
simraschlong = melt(simrasch, id = "person")
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
13793.707 13968.657 -6874.853 13749.707 20978
Random effects:
Groups Name Std.Dev.
person (Intercept) 1.014
Number of obs: 21000, groups: person, 1000
Fixed Effects:
 variableX1  variableX2  variableX3  variableX4  variableX5  variableX6  variableX7
   5.182206    4.415991    3.871713    3.599747    3.191841    2.561370    2.219830
 variableX8  variableX9 variableX10 variableX11 variableX12 variableX13 variableX14
   1.611983    0.992610    0.564159    0.003716   -0.398537   -0.987771   -1.591732
variableX15 variableX16 variableX17 variableX18 variableX19 variableX20 variableX21
  -1.998437   -2.400455   -2.870716   -3.408691   -3.806151   -4.672642   -4.813643
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.907091 (tol = 0.001)
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial,
      control=glmerControl(optimizer="bobyqa"))
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
13793.696 13968.646 -6874.848 13749.696 20978
Random effects:
Groups Name Std.Dev.
person (Intercept) 1.014
Number of obs: 21000, groups: person, 1000
Fixed Effects:
 variableX1  variableX2  variableX3  variableX4  variableX5  variableX6  variableX7
   5.184774    4.414100    3.862784    3.594478    3.190096    2.559954    2.214893
 variableX8  variableX9 variableX10 variableX11 variableX12 variableX13 variableX14
   1.610161    0.988363    0.562450    0.002823   -0.400700   -0.990108   -1.595035
variableX15 variableX16 variableX17 variableX18 variableX19 variableX20 variableX21
  -1.997929   -2.403493   -2.875794   -3.408200   -3.809483   -4.674454   -4.822606
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00124419 (tol = 0.001)
set.seed(12345)
simrasch<-data.frame(rmvlogis(200, cbind(c(seq(-3, 3, 1),10), 1)))
rasch(simrasch,IRT.param=F)
Call:
rasch(data = simrasch, IRT.param = F)
Coefficients:
     X1      X2      X3      X4      X5      X6      X7      X8       z
  3.226   1.820   1.461   0.044  -0.706  -1.494  -2.771 -10.454   0.801
Log.Lik: -647.099
simrasch$person = rownames(simrasch)
simraschlong = melt(simrasch, id = "person")
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
1312.7956 1361.1955 -647.3978 1294.7956 1591
Random effects:
Groups Name Std.Dev.
person (Intercept) 0.7881
Number of obs: 1600, groups: person, 200
Fixed Effects:
variableX1 variableX2 variableX3 variableX4 variableX5 variableX6 variableX7 variableX8
   3.21214    1.82080    1.46473    0.04478   -0.71031   -1.49811   -2.76154  -27.10659
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial,
      control=glmerControl(optimizer="bobyqa"))
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
1312.7956 1361.1955 -647.3978 1294.7956 1591
Random effects:
Groups Name Std.Dev.
person (Intercept) 0.7881
Number of obs: 1600, groups: person, 200
Fixed Effects:
variableX1 variableX2 variableX3 variableX4 variableX5 variableX6 variableX7 variableX8
   3.21213    1.82080    1.46473    0.04478   -0.71032   -1.49811   -2.76154  -19.56371
set.seed(12345)
simrasch<-data.frame(rmvlogis(10000, cbind(c(seq(-3, 3, 1),10), 1)))
rasch(simrasch,IRT.param=F)
Call:
rasch(data = simrasch, IRT.param = F)
Coefficients:
X1 X2 X3 X4 X5 X6 X7 X8 z
3.018 2.026 1.011 0.012 -1.013 -1.941 -2.977 -9.696 0.986
Log.Lik: -31914.71
simrasch$person = rownames(simrasch)
simraschlong = melt(simrasch, id = "person")
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
63888.47 63972.08 -31935.24 63870.47 79991
Random effects:
Groups Name Std.Dev.
person (Intercept) 0.9739
Number of obs: 80000, groups: person, 10000
Fixed Effects:
variableX1 variableX2 variableX3 variableX4 variableX5 variableX6 variableX7 variableX8
   3.01884    2.03562    1.02945    0.02299   -1.01066   -1.93548   -2.95234   -9.44503
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 25.6865 (tol = 0.001)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial,
      control=glmerControl(optimizer="bobyqa"))
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
63887.88 63971.49 -31934.94 63869.88 79991
Random effects:
Groups Name Std.Dev.
person (Intercept) 0.9737
Number of obs: 80000, groups: person, 10000
Fixed Effects:
variableX1 variableX2 variableX3 variableX4 variableX5 variableX6 variableX7 variableX8
   3.00598    2.02857    1.01952    0.01191   -1.02146   -1.94484   -2.96534   -9.65211
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0128742 (tol = 0.001)
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Ben Bolker <bbolker at ...> writes:

PS ...
> Looks sensible.
> I don't have more time to devote to this right now, but my next steps
> would/will be:
> * restart the optimization from the fitted optimum (possibly? using
> refit(), or update(m1,start=getME(m1,c("theta","beta")))) and see if
> that makes the max grad smaller
Steve Walker tried this, and it does 'work' -- successive refits don't change the answer much, and the final gradient gets progressively smaller (although it takes two refits to get below the default test tolerance). (The code above doesn't work as written because the fixed effect component of the starting parameter list has to be named 'fixef' -- we should change this to allow 'beta' as well ...)
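A sketch of that refitting strategy (my own code, not Steve's; note the 'fixef' naming caveat just mentioned, and that 0.001 is the default convergence-check tolerance):

```r
## Repeatedly restart the optimizer from the previous optimum until the
## maximum absolute gradient falls below the check tolerance (capped at
## 5 refits in case it never does).
fit <- m1
for (i in 1:5) {
    start <- list(theta = getME(fit, "theta"), fixef = getME(fit, "fixef"))
    fit <- update(fit, start = start)
    maxgrad <- max(abs(fit@optinfo$derivs$gradient))
    cat(sprintf("refit %d: max|grad| = %g\n", i, maxgrad))
    if (maxgrad < 0.001) break   ## default check tolerance
}
```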
> * try a range of optimizers, especially nlminb, as described here:
> http://stackoverflow.com/questions/21344555/convergence-error-for-development-version-of-lme4/21370041#21370041
I tried this with the full range of optimizers listed at that link. *Only* the built-in Nelder-Mead has the problem; all other optimizers (optimx + nlminb or L-BFGS-B; nloptr + Nelder-Mead or BOBYQA; built-in bobyqa) get max(abs(gradient)) considerably less than the tolerance, but the actual fitted model changes very little (the log-likelihood increases by <0.01). So this does seem to be a false positive.

This still doesn't explain why it is happening with Nelder-Mead, or under what circumstances it is likely to happen (although big models do look prone to it). We should probably switch away from Nelder-Mead as the default throughout (this was already done for lmer models in the last release, but not for glmer), although I would love to do some more testing before jumping out of the frying pan ...
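A minimal sketch of such a comparison (assuming m1 as above; only the two built-in optimizers are shown, since the optimx and nloptr wrappers need the extra setup described at the link):

```r
## Refit the same model with each built-in optimizer, then compare
## max|gradient| and log-likelihood; the fits should agree closely
## even when only one optimizer triggers the convergence warning.
fits <- list(
  NelderMead = update(m1, control = glmerControl(optimizer = "Nelder_Mead")),
  bobyqa     = update(m1, control = glmerControl(optimizer = "bobyqa"))
)
sapply(fits, function(f)
  c(maxgrad = max(abs(f@optinfo$derivs$gradient)),
    logLik  = as.numeric(logLik(f))))
```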
Ben Bolker