Hello!
I have a fairly complex multilevel, multivariate logistic model that I am trying to fit. In both models below, the variables injury, AMI, stroke, and resp are binary, as are ALS and most other variables. There are about 400,000 observations in total. When I try to fit the model (Original Model), I get several warnings, pasted below; I am mostly concerned about number 4. I think this problem is due to having too many parameters in the model, so I removed several interactions that were unnecessary anyway (Modified Model). I ran the Modified Model with a fixed number of iterations, and it finished the iterations quickly enough (maybe 20 minutes?). But it then took another 19 hours to actually stop running, during which time I suspect R was doing the various checks that led to the warnings, though I'm not sure. When the Modified Model finished, it produced the warnings below.
My biggest problem right now is the amount of time it takes for R to stop running, even after restricting the number of iterations to 100. Because of this problem, it is impractical to try to figure out how to address the warnings.
Can somebody please help me figure out why R is taking so long, even after it has finished the 100 iterations? And what can I do about it?
Thank you!!
Prachi Sanghavi
Harvard University
Original Model and Warnings:
AMI_county_final_2 <- glmer(
    ALS ~ -1 + AMI +
        (injury + stroke + resp) * (FEMALE + AGE + MTUS_CNT + Asian + Black +
            Hispanic + Other + Custodial + Nursing + Scene + WhiteHigh +
            BlackHigh + BlackLow + IntegratedHigh + IntegratedLow +
            combinedscore + Year06 + Year07 + Year08 + Year09 + Year10 +
            Metro + Per_College_Plus + Per_Gen_Prac + Any_MedSchlAff +
            Any_Trauma) +
        (-1 + injury + AMI + stroke + resp | fullcounty),
    family = binomial, data = rbind(IARS, IARS2), verbose = 2,
    control = glmerControl(optCtrl = list(maxfun = 100)))
Warning messages:
1: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf, :
failure to converge in 10000 evaluations
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 480.605 (tol = 0.001)
3: In if (resHess$code != 0) { :
the condition has length > 1 and only the first element will be used
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
Modified Model and Warnings:
AMI_county_final_2 <- glmer(
    ALS ~ -1 + Year06 + Year07 + Year08 + Year09 + Year10 + Metro + AMI +
        (injury + stroke + resp) * (FEMALE + AGE + MTUS_CNT + Asian + Black +
            Hispanic + Other + Custodial + Nursing + Scene + WhiteHigh +
            BlackHigh + BlackLow + IntegratedHigh + IntegratedLow +
            combinedscore) +
        (-1 + injury + AMI + stroke + resp | fullcounty),
    family = binomial, data = rbind(IARS, IARS2), verbose = 2,
    control = glmerControl(optCtrl = list(maxfun = 100)))
Warning messages:
1: In commonArgs(par, fn, control, environment()) :
maxfun < 10 * length(par)^2 is not recommended.
2: In optwrap(optimizer, devfun, start, rho$lower, control = control, :
convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
3: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf, :
failure to converge in 100 evaluations
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 15923.5 (tol = 0.001)
5: In if (resHess$code != 0) { :
the condition has length > 1 and only the first element will be used
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
glmer takes long time even after restricting iterations
3 messages · Prachi Sanghavi, Douglas Bates, Ben Bolker
I will take it as a compliment that you have sufficient confidence in our software to try to fit such a model. :-) Sadly, even with 400,000 observations it is highly unlikely that you would be able to converge to parameter estimates for these models, and even more unlikely that the estimates would be meaningful.

The optimization in glmer is different from the optimization in lmer. For a linear mixed model the optimization is over the parameters in the relative covariance matrix only; in this case it looks like there would be 10 such parameters. The optimization problem involving even these parameters would be difficult, as it is likely that the solution will be on the boundary of the feasible region, representing a singular covariance matrix. For glmer the optimization is much more difficult because it is over the concatenation of the fixed-effects parameters and the covariance parameters. I lost track of the number of fixed-effects parameters, but it is large. As you have seen, the first model failed to converge in 10,000 evaluations. That is not encouraging.

Regarding the warning messages, I will let Ben or Steve respond, as they know more about the convergence checks than I do. I believe those diagnostics involve creating finite-difference approximations to the gradient vector and the Hessian matrix. The approximation of the Hessian matrix at the optimum is probably where the time is being spent.

The best advice is to simplify the model. You say that ALS is a binary variable, which means that even with 400,000 observations you have only 400,000 bits of information to which to fit the model. That's not a lot; a continuous response provides much more information per observation than a binary response. Try fitting the fixed effects only, using glm. I'm confident that most of the coefficients will not be significant.

On Fri, Sep 5, 2014 at 1:19 PM, Prachi Sanghavi <prachi.sanghavi at gmail.com> wrote:
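Bates's closing suggestion, fitting the fixed effects alone with glm() before attempting the mixed model, might be sketched as below. This is illustrative only: the data frames (IARS, IARS2) and variable names come from the original post and are not available here, and the interaction covariate list is abbreviated for readability.

```r
## Sketch only: IARS, IARS2, and the variable names are the poster's objects.
dat <- rbind(IARS, IARS2)

fit_fixed <- glm(
    ALS ~ -1 + Year06 + Year07 + Year08 + Year09 + Year10 + Metro + AMI +
        (injury + stroke + resp) * (FEMALE + AGE + MTUS_CNT + combinedscore),
    ## ^ extend the interaction bracket with the remaining covariates
    ##   from the Modified Model before drawing conclusions
    family = binomial, data = dat)

summary(fit_fixed)  # check how many coefficients are actually significant
```

If most coefficients are not significant here, as Bates predicts, that argues for pruning the fixed-effects structure before returning to glmer.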
On 14-09-05 05:05 PM, Douglas Bates wrote:
For speeding things up I would try setting nAGQ=0, and setting
control=glmerControl(check.conv.grad="ignore",check.conv.singular="ignore",
check.conv.hess="ignore")
-- this should deactivate the Hessian and gradient computations
(although at some point you will probably want to go back to testing these!)
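Putting those two suggestions together, the call might look like the sketch below. The fixed-effects formula is truncated here for brevity, and the data objects (IARS, IARS2, fullcounty) are the poster's; nAGQ = 0 folds the fixed-effects estimation into the penalized iteratively reweighted least squares step, which is much faster but gives cruder estimates.

```r
library(lme4)

## Sketch, assuming the objects from the original post; the fixed-effects
## part of the formula is abbreviated.
fit_fast <- glmer(
    ALS ~ -1 + AMI + injury + stroke + resp +
        (-1 + injury + AMI + stroke + resp | fullcounty),
    family = binomial, data = rbind(IARS, IARS2),
    nAGQ = 0,   # faster, less accurate approximation
    control = glmerControl(
        check.conv.grad     = "ignore",
        check.conv.singular = "ignore",
        check.conv.hess     = "ignore",
        optCtrl = list(maxfun = 100)))
```

With the gradient and Hessian checks disabled, the post-optimization finite-difference computations that appear to be consuming the 19 hours should be skipped.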
It looks like you have 79 fixed-effect parameters, plus what looks
like 10 random-effect parameters (this is a quick count, and assumes
that all your variables are numeric) -- this means that the Hessian
computation will have to do approximately 4000 (n*(n+1)/2) function
evaluations ...
You can also try using the bobyqa implementation from nloptr, with
appropriate convergence settings, as described here:
https://github.com/lme4/lme4/issues/150#issuecomment-45813306
I believe these are the same settings that are implemented in ?nloptwrap.
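Assuming a recent lme4 (>= 1.1-7, where nloptwrap is available), the nloptr-based BOBYQA implementation can be selected through glmerControl. The sketch below again abbreviates the formula and uses the poster's data objects; the tolerance names are nloptr's, and the values shown are only a plausible starting point, not a recommendation.

```r
library(lme4)

## Sketch: same model objects as in the original post; formula truncated.
fit_nlopt <- glmer(
    ALS ~ -1 + AMI + injury + stroke + resp +
        (-1 + injury + AMI + stroke + resp | fullcounty),
    family = binomial, data = rbind(IARS, IARS2),
    control = glmerControl(
        optimizer = "nloptwrap",
        optCtrl = list(xtol_abs = 1e-6, ftol_abs = 1e-6)))
```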
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models