
lme4, failure to converge with a range of optimisers, trust the fitted model anyway?

16 messages · Ben Bolker, Ken Beath, Ben Pelzer +1 more

#
Dear list,

I know, the failure to converge problem is boring, but still I would
like your input on my situation.

I have tried four optimizers/methods, and they all fail; glmer used.

1. Nelder_Mead: Model failed to converge with max|grad| = 0.00116526
   (tol = 0.001, component 6)

2. bobyqa: Model failed to converge with max|grad| = 0.00117064 (tol =
   0.001, component 7)

3. optimx, nlminb: Model failed to converge: degenerate Hessian with 4
   negative eigenvalues

4. optimx, L-BFGS-B: Model failed to converge with max|grad| =
   0.012963 (tol = 0.001, component 7); Model failed to converge:
   degenerate Hessian with 3 negative eigenvalues
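
Recent versions of lme4 can automate this kind of comparison with allFit(), which refits a single model under every optimizer it can find. A minimal sketch, not run here: `model` stands for the fitted glmerMod object, and the exact components of the summary may vary by lme4 version.

```r
## Sketch: refit one glmer model under all available optimizers and
## compare the resulting fixed effects side by side.
library(lme4)
fits <- allFit(model)   # `model` is the glmerMod fitted earlier
ss   <- summary(fits)
ss$fixef                # fixed-effect estimates, one row per optimizer
ss$msgs                 # convergence warnings/messages, if any
```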

The sample size is large: 1,833,793.

The estimates resulting from fitting the model to data with the
different optimizers are similar:

                                     NM     bobyqa     nlminb       BFGS
(Intercept)                   4.9857379  3.7283744  4.9477121  3.2138480
QoG                           0.7866227  0.5962816  0.7534208  0.5991817
GDPLog                       -1.5161825 -1.3643422 -1.5111097 -1.2940261
Ruralyes                      4.3436228  4.3422641  4.3419199  4.3415551
KilledPerMillion5Log          0.6632158  0.6005677  0.6276264  0.5216984
Ruralyes:KilledPerMillion5Log 0.7313136  0.7316543  0.7314746  0.7329137

My theoretical focus is on the last two rows.

1. Is this likely to be a false positive? I'm willing to share data if
   that can help the development of lme4.

2. If the fits are bad, then what are my alternatives? Continue with
   glmer and increase nAGQ? Other ideas? Or do I need to use other
   packages? I really love lme4, so I hope this will not be necessary.


Kind regards,

Hans Ekbrand


Postscript.

Here is the output of summary for the Nelder_Mead fit:

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: SanitationDeprivation ~ (1 | Country) + (1 | ClusterID) + QoG +  
    GDPLog + Rural * KilledPerMillion5Log
   Data: my.df.aid

      AIC       BIC    logLik  deviance  df.resid 
1013298.5 1013397.9 -506641.2 1013282.5   1833785 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-9.6390 -0.2883 -0.0508  0.0666 14.6730 

Random effects:
 Groups    Name        Variance Std.Dev.
 ClusterID (Intercept)  9.675   3.110   
 Country   (Intercept) 11.115   3.334   
Number of obs: 1833793, groups:  ClusterID, 38177; Country, 65

Fixed effects:
                              Estimate Std. Error z value Pr(>|z|)    
(Intercept)                    4.98574    0.08764   56.89  < 2e-16 ***
QoG                            0.78662    0.13190    5.96 2.46e-09 ***
GDPLog                        -1.51618    0.04841  -31.32  < 2e-16 ***
Ruralyes                       4.34362    0.04655   93.31  < 2e-16 ***
KilledPerMillion5Log           0.66322    0.15070    4.40 1.08e-05 ***
Ruralyes:KilledPerMillion5Log  0.73131    0.06059   12.07  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) QoG    GDPLog Rurlys KlPM5L
QoG         -0.042                            
GDPLog      -0.153  0.129                     
Ruralyes    -0.020  0.067 -0.037              
KlldPrMll5L -0.081 -0.113 -0.091 -0.131       
Rrlys:KPM5L -0.007 -0.088 -0.044 -0.426  0.205
#
One of the problems is that you have a relatively high random-effects
variance. An intercept standard deviation of 3 is a huge amount: it
means there is massive variation in the random-effect value needed to
model each cluster, to the point that some clusters will be all zeros and
some will be all ones. In this situation the assumption of approximate
normality of the likelihood around the modes, which is required for
Laplace's method, is very far from met.

I would find a spare computer and increase nAGQ to, say, 5. It might take a
while to run, but hopefully it will be enough to make it converge. Then
increase nAGQ until the log-likelihood doesn't change. I have a preference
for nlminb.

Programs that do random-effects logistic regression with more than one
random effect are scarce. I can try Latent Gold with the Syntax Module, but
I'm not certain what limit it has on the number of observations.
On 4 April 2015 at 20:29, Hans Ekbrand <hans.ekbrand at gmail.com> wrote:

#
On Sat, Apr 04, 2015 at 09:10:35PM +1100, Ken Beath wrote:
Thanks for your advice, I really appreciate it!

I tried nAGQ=5, but met with:

## Error: nAGQ > 1 is only available for models with a single, scalar
random-effects term

As you point out, some clusters will be all zeros (and some will be
all ones). While my data is on the individual level,

a) the variable I'm mainly interested in, KilledPerMillion5Log, varies
only at the country level, and

b) I currently have no variables in the model that vary at the
individual level

So, perhaps I could aggregate the individual level data to the cluster
level, and do without the random term for cluster? I mean, calculate
the proportions of yes in each cluster and use that as the dependent
variable.
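
That aggregation step can be sketched in base R. A toy example, with made-up data and variable names that only loosely mirror the real data set:

```r
# Collapse individual 0/1 outcomes to one row per cluster, keeping the
# proportion deprived and the cluster size (for use later as a weight).
set.seed(1)
my.df <- data.frame(ClusterID = rep(c("c1", "c2", "c3"), each = 4),
                    Deprived  = rbinom(12, 1, 0.5))
counts <- aggregate(Deprived ~ ClusterID, data = my.df,
                    FUN = function(x) c(yes = sum(x), n = length(x)))
counts <- do.call(data.frame, counts)    # split the matrix column
counts$Prop.deprived.in.cluster <- counts$Deprived.yes / counts$Deprived.n
counts
```

The do.call(data.frame, ...) step is just the base-R idiom for expanding the matrix column that aggregate() returns into ordinary columns (here Deprived.yes and Deprived.n).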

This would, I assume, require that each cluster be given a weight
corresponding to the number of individuals in it - or I would not
be able to say anything about probabilities at the individual level,
right?
#
On 15-04-04 02:15 PM, Hans Ekbrand wrote:
I would say pretty much any time you can aggregate without losing
information, you should, for ease of computation (this accords with
the Murtaugh 2007 "Simplicity and complexity" paper that I cite here
on a regular basis). If all of your covariates are at the cluster
level, then this reduces to a binomial GLM. You can account for
overdispersion either by using a quasi-binomial model in glm() or by
staying with glmer() and keeping the random effect of cluster (now an
observation-level random effect).
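
The quasi-binomial option might look like the following sketch, fitted to simulated cluster-level counts (all names and data here are invented for illustration):

```r
# Cluster-level counts fitted as a binomial GLM; the quasi-binomial
# family estimates a dispersion parameter to absorb overdispersion.
set.seed(42)
n   <- rep(50, 30)                      # individuals per cluster
x   <- rnorm(30)                        # one cluster-level covariate
y   <- rbinom(30, n, plogis(-1 + 0.8 * x))
fit <- glm(cbind(y, n - y) ~ x, family = quasibinomial)
summary(fit)$dispersion                 # well above 1 suggests overdispersion
```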

  Ben Bolker


#
On Sat, Apr 04, 2015 at 07:36:04PM -0400, Ben Bolker wrote:
[ Thanks for your advice, it's much appreciated. ]

The covariates are not all at the cluster level. Actually, only one
covariate is at the cluster level; the others are at the country
level. Since I still have the country level to account for, I need to
use glmer().

I have created an aggregated data set with a variable for the
proportion deprived=TRUE in each cluster, and a variable for the
number of cases in that cluster (to use as weight).

Perhaps this is not a question specific to mixed models, but what
family and link function would be appropriate now? This was the best I
could come up with:

library(car)   # for logit()
library(lme4)
lm1 <- lmer(logit(Prop.deprived.in.cluster) ~ (1 | Country) + QoG + GDPLog +
              Rural * KilledPerMillion5Log,
            data = my.small.df, weights = my.small.df$weight)

Linear mixed model fit by REML ['lmerMod']
Formula: logit(Prop.deprived.in.cluster) ~ (1 | Country) + QoG + GDPLog +  
    Rural * KilledPerMillion5Log
   Data: my.small.df
Weights: my.small.df$weight

REML criterion at convergence: 157709.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.9983 -0.5626 -0.0622  0.4371  6.1171 

Random effects:
 Groups   Name        Variance Std.Dev.
 Country  (Intercept)   1.768   1.33   
 Residual             139.292  11.80   
Number of obs: 38028, groups:  Country, 65

Fixed effects:
                              Estimate Std. Error t value
(Intercept)                    1.25542    1.90567    0.66
QoG                            0.39120    0.40850    0.96
GDPLog                        -0.51799    0.23031   -2.25
Ruralyes                       1.91545    0.02232   85.80
KilledPerMillion5Log           0.06190    0.28557    0.22
Ruralyes:KilledPerMillion5Log  0.35303    0.03728    9.47

Correlation of Fixed Effects:
            (Intr) QoG    GDPLog Rurlys KlPM5L
QoG          0.571                            
GDPLog      -0.989 -0.486                     
Ruralyes    -0.013  0.001  0.006              
KlldPrMll5L -0.013  0.239 -0.009  0.045       
Rrlys:KPM5L  0.004 -0.001  0.000 -0.519 -0.084

my.small.df is available here if someone wants to have a go at it

http://hansekbrand.se/code/my.small.df.RData (1.1 MB)
#
No, you need to treat this still as binomial data, using cbind(y, n - y) as
the response, where y is the number of positives in each group and n is the
total in each group. I suggest reading one of the books that discuss
fitting logistic models in R; most advanced texts have a
section. Introductory Statistics with R by Peter Dalgaard has the relevant
section available as a preview on Amazon.

You also still need a random effect for the cluster.

While I'm thinking of it, should the cluster and country random effects
have been crossed? Generally the sampling is set up so that clusters are
nested within countries, which requires a different syntax.
On 5 April 2015 at 10:41, Hans Ekbrand <hans.ekbrand at gmail.com> wrote:
#
Dear Hans,

You could try the nAGQ = 5 suggestion in Stata. I believe that the
routines "xtlogit" and "xtmelogit" (in Stata 13 they are named
differently, I don't remember the names right now) are able to work
with nested random effects and more than one quadrature point. These
routines are, however, pretty time-consuming, maybe even more so than
glmer. Best regards,

Ben.
On 4-4-2015 20:15, Hans Ekbrand wrote:
#
On Sun, Apr 05, 2015 at 07:31:25PM +1000, Ken Beath wrote:
OK, I'll try that. What is the interpretation of the outcome in this
case: is it still the logit of the probability of the outcome?

I actually already have that book, and it is quite good.

I'm sorry that I haven't been clear on this, but the clusters are
nested within countries, so there are no crossed random effects to be
found.
#
On Sun, Apr 05, 2015 at 11:33:26AM +0200, Ben Pelzer wrote:
OK. Even if I will handle this case with aggregation, it is good to know
what Stata can do for future challenges.
#
On 5 April 2015 at 20:12, Hans Ekbrand <hans.ekbrand at gmail.com> wrote:

Yes.
The random effect then needs to be included as (1|country/clusterID).
#
Well, no, cluster is implicitly nested in country, just as sample is
implicitly nested in batch in the Pastes data (see pp. 39-40 in
http://lme4.r-forge.r-project.org/book/Ch2.pdf).
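
The implicit-nesting point is easy to check directly: if cluster labels are globally unique, crossing them with country creates no new groups, so (1 | Country) + (1 | ClusterID) describes the same grouping as the explicit nested syntax. A toy illustration with invented labels:

```r
country <- factor(c("A", "A", "B", "B"))
cluster <- factor(c("a1", "a2", "b1", "b2"))     # labels unique across countries
nested  <- interaction(country, cluster, drop = TRUE)
nlevels(nested) == nlevels(cluster)              # TRUE: nesting is implicit

# With labels reused across countries, crossing does create new groups,
# and the explicit (1 | country/cluster) syntax would be required:
reused <- factor(c("1", "2", "1", "2"))
nlevels(interaction(country, reused, drop = TRUE))   # 4, not 2
```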
#
On Sun, Apr 05, 2015 at 01:56:03PM +0200, Hans Ekbrand wrote:
Sorry if this was unclear, but the ClusterID variable was created in a
way similar to how Bates creates the sample variable on page 40 of that
text. [ English is not my native language ]
#
OK then, the way you were doing it is OK.
On 5 April 2015 at 21:58, Hans Ekbrand <hans.ekbrand at gmail.com> wrote:
#
On Sun, Apr 05, 2015 at 07:31:25PM +1000, Ken Beath wrote:
I think I've just stumbled into something that may deepen my
understanding of mixed models, thanks to you.

I took your advice on how to create the dependent variable

cbind(y, n-y)

and included the random term for cluster,

Formula: cbind(Deprived, Not.deprived) ~ (1 | Country) + (1 | ClusterID) + QoG + GDPLog + Rural * KilledPerMillion5Log 
Data: my.small.df

and fitted the model to an aggregated version of the original data
set. While doing so, however, I thought: "this will be exactly like the
model that glmer could not fit without warnings, so I'll get exactly the
same warnings again".

In a sense it is the same model (the beta coefficients are exactly the
same), but in another sense it apparently is not: the warnings are gone
:-)

I guess the difference is that glmer does not have to care about
residuals at the individual level anymore.

I now understand this model as a kind of repeated-measures model,
where each cluster is measured repeatedly, once for each individual in
the cluster. While that technically does not describe how the data was
generated, it is a clever shortcut to get what I need. Thanks again!
#
On Sun, Apr 05, 2015 at 03:53:29PM +0200, Hans Ekbrand wrote:
Not exactly the same, though.

KilledPerMillion5Log          -0.62841    0.69872   -0.90   0.3685    
Ruralyes:KilledPerMillion5Log -0.73146    0.07823   -9.35  < 2e-16 ***

Compared to 

                                     NM     bobyqa     nlminb       BFGS
(Intercept)                   4.9857379  3.7283744  4.9477121  3.2138480
QoG                           0.7866227  0.5962816  0.7534208  0.5991817
GDPLog                       -1.5161825 -1.3643422 -1.5111097 -1.2940261
Ruralyes                      4.3436228  4.3422641  4.3419199  4.3415551
KilledPerMillion5Log          0.6632158  0.6005677  0.6276264  0.5216984
Ruralyes:KilledPerMillion5Log 0.7313136  0.7316543  0.7314746  0.7329137    

But pretty damn close to nlminb, which you specifically preferred :-)
#
Yes, the model, and so the likelihood, is exactly the same; there is
just a lot less effort in calculating it for the grouped data. Hopefully
this results in fewer numerical problems.
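
That equivalence can be checked in miniature with simulated data. A fixed-effects glm() stands in for glmer() here, but the grouped-versus-ungrouped comparison works the same way:

```r
# The same binomial likelihood fitted on individual 0/1 rows and on
# grouped counts gives identical coefficient estimates.
set.seed(7)
g <- rep(1:20, each = 30)                 # 20 "clusters" of 30 individuals
x <- rnorm(20)[g]                         # a cluster-level covariate
y <- rbinom(length(g), 1, plogis(0.5 * x))
fit.ind <- glm(y ~ x, family = binomial)  # one row per individual
agg <- data.frame(yes = tapply(y, g, sum),
                  n   = tapply(y, g, length),
                  x   = tapply(x, g, unique))
fit.agg <- glm(cbind(yes, n - yes) ~ x, data = agg, family = binomial)
all.equal(coef(fit.ind), coef(fit.agg), tolerance = 1e-6)  # TRUE
```

The two log-likelihoods differ only by a binomial-coefficient constant that does not involve the parameters, which is why the estimates coincide while the grouped fit is far cheaper to compute.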
On 5 April 2015 at 23:53, Hans Ekbrand <hans.ekbrand at gmail.com> wrote: