Wrong convergence warnings with glmer in lme4 version 1.1-6??
3 messages · Tom Davis, Ben Bolker
Quick answer: these are probably false positives, driven by some
combination of the following:
* we are using derivative-free methods (bobyqa, Nelder-Mead) rather
than derivative-based (or approximate-derivative-based) methods (nlminb,
L-BFGS-B) to optimize over the parameters. Therefore, while using
approximate derivatives to assess convergence is a good idea, we have
no formal guarantee of how small the derivatives are supposed to
be (the stopping conditions for the derivative-free methods are not
based explicitly on derivatives, although they should of course be
related to the local derivatives), nor do we know how large a gradient
we should be worried about.
* it's possible that the finite-difference approximations to the
gradients are themselves inaccurate
* one flaw of our current approach is that we don't take singular fits
into account: that is, we include even the gradient elements for which
the parameters are up against a boundary (zero variances, or perfect
correlations). (That said, I've seen at least a few cases of relatively
large max-abs-grad values where the fit was *not* singular.)
We (lme4 maintainers) are sorry for any inconvenience or worry, and
are working to resolve these issues. However, we'd rather understand
what's going on here, and convince ourselves that there is *not*
something worrisome happening, than simply increase the tolerances for
the tests and make everything quiet again ...
When I run your example (thanks for the reproducible example!) I get:
m1@optinfo$derivs$gradient
[1] 0.339026660 -0.008029951 0.145529325 0.093438612 0.123937398
[6] 0.138252908 0.113161827 0.050304411 0.172043387 0.061015458
[11] 0.122968431 0.159325086 0.117684399 0.072351372 0.134278462
[16] 0.052197579 -0.001719254 0.109679304 0.199987785 0.129893883
[21] 0.138903688 0.024782721 0.047988965 0.028119425 0.096921221
Double-check the internal gradient calculation with the (slightly more
expensive, and non-boundary-respecting) numDeriv::grad:
library(numDeriv)
dd <- update(m1,devFunOnly=TRUE)
grad(dd,unlist(getME(m1,c("theta","beta"))))
[1] 0.33902876 -0.00802679 0.14553678 0.09346970 0.12393419 0.13824924
[7] 0.11335159 0.05030383 0.17205489 0.06102088 0.12296410 0.15933547
[13] 0.11768797 0.07235296 0.13426145 0.05219938 -0.00172568 0.10990842
[19] 0.19998764 0.12991478 0.13889906 0.02478346 0.04799346 0.02810442
[25] 0.09692275
Looks sensible.
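As a quick sanity check (my own sketch, assuming m1 and dd as defined above), the agreement between the two gradient vectors can be quantified directly:

```r
## Compare the internal finite-difference gradient with numDeriv's
## Richardson-extrapolated gradient; close agreement suggests the
## internal gradient calculation is not the source of the warning.
g_int <- m1@optinfo$derivs$gradient
g_num <- numDeriv::grad(dd, unlist(getME(m1, c("theta", "beta"))))
max(abs(g_int - g_num))                    ## absolute discrepancy
max(abs(g_int - g_num)) / max(abs(g_int))  ## relative to the gradient size
```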
I don't have more time to devote to this right now, but my next steps
would/will be:
* restart the optimization from the fitted optimum (possibly? using
refit(), or update(m1,start=getME(m1,c("theta","beta")))) and see if
that makes the max grad smaller
* try a range of optimizers, especially nlminb, as described here:
http://stackoverflow.com/questions/21344555/convergence-error-for-development-version-of-lme4/21370041#21370041
Ben Bolker
On 14-03-26 02:19 PM, Tom Davis wrote:
Dear lme4 experts,

Yesterday I ran the code for two published papers (De Boeck et al., 2011; De Boeck and Partchev, 2012) on psychometric modeling with glmer in lme4 version 1.1-6, and the vast majority of the models I ran produce convergence warnings (even the simple ones). For instance, the basic Rasch model in De Boeck et al. (2011) yields:
## our example data set is bundled with lme4
data("VerbAgg")
## A first example: A Rasch model with fixed item effects and random
person effects
m1 = glmer(r2 ~ 0 + item + (1|id), data=VerbAgg, family="binomial",
           control=glmerControl(optCtrl=list(maxfun=100000)))
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.308607 (tol = 0.001)
I am a bit puzzled because, to my knowledge, the models for the
VerbAgg data (included in lme4) in particular have been checked in many
other programs (also ltm in R), and I have heard that glmer produces
results that are valid and consistent with SAS, HLM, ltm, and so on.
However, this dataset produces convergence warnings even though the
models are comparatively simple for psychometric research (basic Rasch
and LLTM) and the estimates all seem reasonable.
I also tried some datasets in the ltm package. Again, convergence warnings
(the mobility data). The estimates are close to what ltm gives me.
Even when I simulate data from a basic Rasch model using the rmvlogis
function in ltm with a couple of extreme item parameters (which occur in
psychometric tests), I also get these warnings, even though glmer seems
to recover the true values very well. The "gradient" errors and the
Hessian errors tend to increase with sample size and when I use
binomial("probit") instead of binomial("logit"). Vector-valued random
effects also seem to produce many warnings. optimizer="bobyqa" tends to
reduce the warnings, but not consistently. To me, it seems like the
warnings occur randomly all over the place. Sometimes glmer "converges"
(no warnings) with one optimizer setting and values that are very close
to the true values; with another optimizer setting, one gets practically
the same estimates and virtually the same likelihood value, but a
warning. I really do not understand what is going on.
I had no warnings using versions 1.0-5 and 1.0-6, so this seems to be a
recent problem with lme4?
Is it best to ignore all these convergence warnings for now? Should I
switch back to an older version of lme4 to avoid this problem? Should I
generally avoid using large datasets with lme4?
Many thanks in advance,
Tom
Papers (scripts are online):
De Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx,
F., & Partchev, I. (2011). The estimation of item response models with the
lmer function from the lme4 package in R. Journal of Statistical Software,
39, 1-28. http://www.jstatsoft.org/v39/i12
De Boeck, P., & Partchev, I. (2012). IRTrees: Tree-based item response
models of the GLMM family. Journal of Statistical Software, 48, 1-28.
http://www.jstatsoft.org/v48/c01/
Simulated data:
library("reshape")
library("ltm")
library("lme4")
set.seed(12345)
simrasch<-data.frame(rmvlogis(200, cbind(seq(-5, 5, 0.5), 1)))
rasch(simrasch,IRT.param=F)
Call:
rasch(data = simrasch, IRT.param = F)
Coefficients:
     X1      X2      X3      X4      X5      X6      X7      X8      X9     X10     X11
 17.342   5.157   5.157   3.441   3.141   2.100   2.193   1.969   1.484   0.717   0.054
    X12     X13     X14     X15     X16     X17     X18     X19     X20     X21       z
 -0.347  -0.874  -1.415  -2.103  -2.696  -3.065  -3.152  -4.453  -4.216  -5.175   1.091
Log.Lik: -1329.186
simrasch$person = rownames(simrasch)
simraschlong = melt(simrasch, id = "person")
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
2702.484 2842.026 -1329.242 2658.484 4178
Random effects:
Groups Name Std.Dev.
person (Intercept) 1.091
Number of obs: 4200, groups: person, 200
Fixed Effects:
 variableX1  variableX2  variableX3  variableX4  variableX5  variableX6  variableX7
   19.04000     5.14239     5.13514     3.43274     3.13736     2.10085     2.19335
 variableX8  variableX9 variableX10 variableX11 variableX12 variableX13 variableX14
    1.97086     1.48698     0.71828     0.05206    -0.35255    -0.88099    -1.42318
variableX15 variableX16 variableX17 variableX18 variableX19 variableX20 variableX21
   -2.11070    -2.70069    -3.06758    -3.15545    -4.44743    -4.21083    -5.16615
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.135068 (tol = 0.001)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial,
      control=glmerControl(optimizer="bobyqa"))
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
2702.482 2842.025 -1329.241 2658.482 4178
Random effects:
Groups Name Std.Dev.
person (Intercept) 1.09
Number of obs: 4200, groups: person, 200
Fixed Effects:
 variableX1  variableX2  variableX3  variableX4  variableX5  variableX6  variableX7
   17.75458     5.14416     5.14417     3.43797     3.13991     2.10444     2.19676
 variableX8  variableX9 variableX10 variableX11 variableX12 variableX13 variableX14
    1.97412     1.48907     0.72049     0.05389    -0.34882    -0.87841    -1.42030
variableX15 variableX16 variableX17 variableX18 variableX19 variableX20 variableX21
   -2.10751    -2.69767    -3.06471    -3.15126    -4.44429    -4.20819    -5.16356
simrasch<-data.frame(rmvlogis(1000, cbind(seq(-5, 5, 0.5), 1)))
rasch(simrasch,IRT.param=F)
Call:
rasch(data = simrasch, IRT.param = F)
Coefficients:
     X1      X2      X3      X4      X5      X6      X7      X8      X9     X10     X11
  5.195   4.422   3.868   3.599   3.192   2.558   2.211   1.605   0.984   0.559   0.003
    X12     X13     X14     X15     X16     X17     X18     X19     X20     X21       z
 -0.398  -0.985  -1.590  -1.994  -2.401  -2.876  -3.411  -3.815  -4.683  -4.832   1.014
Log.Lik: -6874.631
simrasch$person = rownames(simrasch)
simraschlong = melt(simrasch, id = "person")
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
13793.707 13968.657 -6874.853 13749.707 20978
Random effects:
Groups Name Std.Dev.
person (Intercept) 1.014
Number of obs: 21000, groups: person, 1000
Fixed Effects:
 variableX1  variableX2  variableX3  variableX4  variableX5  variableX6  variableX7
   5.182206    4.415991    3.871713    3.599747    3.191841    2.561370    2.219830
 variableX8  variableX9 variableX10 variableX11 variableX12 variableX13 variableX14
   1.611983    0.992610    0.564159    0.003716   -0.398537   -0.987771   -1.591732
variableX15 variableX16 variableX17 variableX18 variableX19 variableX20 variableX21
  -1.998437   -2.400455   -2.870716   -3.408691   -3.806151   -4.672642   -4.813643
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.907091 (tol = 0.001)
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial,
      control=glmerControl(optimizer="bobyqa"))
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
13793.696 13968.646 -6874.848 13749.696 20978
Random effects:
Groups Name Std.Dev.
person (Intercept) 1.014
Number of obs: 21000, groups: person, 1000
Fixed Effects:
 variableX1  variableX2  variableX3  variableX4  variableX5  variableX6  variableX7
   5.184774    4.414100    3.862784    3.594478    3.190096    2.559954    2.214893
 variableX8  variableX9 variableX10 variableX11 variableX12 variableX13 variableX14
   1.610161    0.988363    0.562450    0.002823   -0.400700   -0.990108   -1.595035
variableX15 variableX16 variableX17 variableX18 variableX19 variableX20 variableX21
  -1.997929   -2.403493   -2.875794   -3.408200   -3.809483   -4.674454   -4.822606
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00124419 (tol = 0.001)
set.seed(12345)
simrasch<-data.frame(rmvlogis(200, cbind(c(seq(-3, 3, 1),10), 1)))
rasch(simrasch,IRT.param=F)
Call:
rasch(data = simrasch, IRT.param = F)
Coefficients:
     X1      X2      X3      X4      X5      X6      X7      X8       z
  3.226   1.820   1.461   0.044  -0.706  -1.494  -2.771 -10.454   0.801
Log.Lik: -647.099
simrasch$person = rownames(simrasch)
simraschlong = melt(simrasch, id = "person")
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
1312.7956 1361.1955 -647.3978 1294.7956 1591
Random effects:
Groups Name Std.Dev.
person (Intercept) 0.7881
Number of obs: 1600, groups: person, 200
Fixed Effects:
variableX1 variableX2 variableX3 variableX4 variableX5 variableX6 variableX7 variableX8
   3.21214    1.82080    1.46473    0.04478   -0.71031   -1.49811   -2.76154  -27.10659
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial,
      control=glmerControl(optimizer="bobyqa"))
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
1312.7956 1361.1955 -647.3978 1294.7956 1591
Random effects:
Groups Name Std.Dev.
person (Intercept) 0.7881
Number of obs: 1600, groups: person, 200
Fixed Effects:
variableX1 variableX2 variableX3 variableX4 variableX5 variableX6 variableX7 variableX8
   3.21213    1.82080    1.46473    0.04478   -0.71032   -1.49811   -2.76154  -19.56371
set.seed(12345)
simrasch<-data.frame(rmvlogis(10000, cbind(c(seq(-3, 3, 1),10), 1)))
rasch(simrasch,IRT.param=F)
Call:
rasch(data = simrasch, IRT.param = F)
Coefficients:
X1 X2 X3 X4 X5 X6 X7 X8 z
3.018 2.026 1.011 0.012 -1.013 -1.941 -2.977 -9.696 0.986
Log.Lik: -31914.71
simrasch$person = rownames(simrasch)
simraschlong = melt(simrasch, id = "person")
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
63888.47 63972.08 -31935.24 63870.47 79991
Random effects:
Groups Name Std.Dev.
person (Intercept) 0.9739
Number of obs: 80000, groups: person, 10000
Fixed Effects:
variableX1 variableX2 variableX3 variableX4 variableX5 variableX6 variableX7 variableX8
   3.01884    2.03562    1.02945    0.02299   -1.01066   -1.93548   -2.95234   -9.44503
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 25.6865 (tol = 0.001)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
glmer(value ~ 0 + variable + (1|person), data=simraschlong, family=binomial,
      control=glmerControl(optimizer="bobyqa"))
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: value ~ 0 + variable + (1 | person)
Data: simraschlong
AIC BIC logLik deviance df.resid
63887.88 63971.49 -31934.94 63869.88 79991
Random effects:
Groups Name Std.Dev.
person (Intercept) 0.9737
Number of obs: 80000, groups: person, 10000
Fixed Effects:
variableX1 variableX2 variableX3 variableX4 variableX5 variableX6 variableX7 variableX8
   3.00598    2.02857    1.01952    0.01191   -1.02146   -1.94484   -2.96534   -9.65211
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0128742 (tol = 0.001)
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Ben Bolker <bbolker at ...> writes:

PS ...
> Looks sensible.
> I don't have more time to devote to this right now, but my next steps
> would/will be:
> * restart the optimization from the fitted optimum (possibly? using
> refit(), or update(m1,start=getME(m1,c("theta","beta")))) and see if
> that makes the max grad smaller
Steve Walker tried this, and it does 'work' -- successive refits don't change the answer much, and the final gradient gets progressively smaller (although it takes two refits to get below the default test tolerance). (The code above doesn't work as written because the fixed effect component of the starting parameter list has to be named 'fixef' -- we should change this to allow 'beta' as well ...)
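A sketch of that refitting strategy (my own code, not Steve's; note the 'fixef' naming caveat just mentioned, and that 0.001 is the default convergence-check tolerance):

```r
## Repeatedly restart the optimizer from the previous optimum until the
## maximum absolute gradient falls below the check tolerance (capped at
## 5 refits in case it never does).
fit <- m1
for (i in 1:5) {
    start <- list(theta = getME(fit, "theta"), fixef = getME(fit, "fixef"))
    fit <- update(fit, start = start)
    maxgrad <- max(abs(fit@optinfo$derivs$gradient))
    cat(sprintf("refit %d: max|grad| = %g\n", i, maxgrad))
    if (maxgrad < 0.001) break   ## default check tolerance
}
```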
> * try a range of optimizers, especially nlminb, as described here:
> http://stackoverflow.com/questions/21344555/convergence-error-for-development-version-of-lme4/21370041#21370041
I tried this with the full range of optimizers listed at that link. *Only* the built-in Nelder-Mead has the problem; all other optimizers (optimx + nlminb or L-BFGS-B; nloptr + Nelder-Mead or BOBYQA; built-in bobyqa) get max(abs(gradient)) considerably less than the tolerance, but the actual fitted model changes very little (the log-likelihood increases by <0.01). So this does seem to be a false positive.

This still doesn't explain why it is happening with Nelder-Mead, or under what circumstances it is likely to happen (although big models do look prone to it). We should probably switch away from Nelder-Mead as the default throughout (this was already done for lmer models in the last release, but not for glmer), although I would love to do some more testing before jumping out of the frying pan ...
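A minimal sketch of such a comparison (assuming m1 as above; only the two built-in optimizers are shown, since the optimx and nloptr wrappers need the extra setup described at the link):

```r
## Refit the same model with each built-in optimizer, then compare
## max|gradient| and log-likelihood; the fits should agree closely
## even when only one optimizer triggers the convergence warning.
fits <- list(
  NelderMead = update(m1, control = glmerControl(optimizer = "Nelder_Mead")),
  bobyqa     = update(m1, control = glmerControl(optimizer = "bobyqa"))
)
sapply(fits, function(f)
  c(maxgrad = max(abs(f@optinfo$derivs$gradient)),
    logLik  = as.numeric(logLik(f))))
```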
Ben Bolker