Skip to content

c++ exception with logistic glmer

7 messages · David Duffy, Cole, Tim, Emmanuel Curis +1 more

#
I'm returning to a thread I started a year ago. My logistic glmer model (described below) generated an obscure error message, which Emmanuel Curis pointed out arose from duplicate rows in my data. Ben Bolker and David Duffy thought there might be separation in the cleaned dataset, and Jonathan French and Ben suggested an alternative time-to-event analysis instead of the logistic.

The dataset consists of longitudinal measures of a binary bone maturity score called mat, which for each subject (n = 607) consists of zero or more 0s followed by zero or more 1s, the age at 0/1 transition constituting the subject's age at maturity. The aim is to estimate median age at maturity, given by ?intercept/age coefficient and confidence interval based on Fieller's theorem.

The full data frame is bhs (4565 rows), while dfs is a reduced data frame (1086 rows) consisting of the last mat 0 entry and first mat 1 entry per subject (where available).

Using glm ignores the longitudinal element, whille glmer allows a random subject intercept reflecting inter-subject variability in age at maturity.

lm0 <- glm(mat ~ age, family=binomial, data=bhs)
lm0r <- glm(mat ~ age, family=binomial, data=dfs)
for (n in c(0, 1, 5, 9)) {
assign(paste('lm0', n, sep='.'), glmer(mat ~ age + (1 | BHID), family=binomial, data=bhs, nAGQ=n))
assign(paste('lm0r', n, sep='.'), glmer(mat ~ age + (1 | BHID), family=binomial, data=dfs, nAGQ=n))
}
list <- ls(pattern='^lm0')
unlist(list, function(z) {
res <- BIC(get(z))
names(res) <- z
res
}))
             lm0    lm0.0    lm0.1    lm0.5    lm0.9     lm0r   lm0r.0   lm0r.1   lm0r.5   lm0r.9
2474.576 1812.824 1386.288 1788.585 1778.761 1298.951 1305.941 1305.941 1305.941 1305.941

With the full data frame (models lm0.n) BIC varies with nAGQ, and for the Laplace fit (nAGQ=1) the model looks dodgy. The coefficients are as follows:

$lm0
              Estimate Std. Error   z value      Pr(>|z|)
(Intercept) -18.700922  0.5783898 -32.33273 2.426656e-229
age           1.169465  0.0363381  32.18288 3.063925e-227

$lm0.0
              Estimate Std. Error   z value      Pr(>|z|)
(Intercept) -75.261703  3.3652807 -22.36417 8.790724e-111
age           4.710325  0.2086032  22.58031 6.766768e-113

$lm0.1
              Estimate Std. Error   z value Pr(>|z|)
(Intercept) -276.41591 1.21959198 -226.6462        0
age           17.35977 0.08850143  196.1525        0

$lm0.5
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -71.992747  6.7038887 -10.73895 6.679749e-27
age           4.497433  0.4186714  10.74215 6.452210e-27

$lm0.9
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -80.623844  8.9423308 -9.015976 1.951243e-19
age           5.035102  0.5569928  9.039797 1.569631e-19

As expected the regression line is steeper with the random effect included, though again lm0.1 look odd.

With the reduced data frame the lmr.n models do not vary with nAGQ. But the odder thing is that the coefficients are identical with and without the random effect, as the SD of the random effect is estimated to be zero.

$lm0r
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -8.6820194 0.69870600 -12.42586 1.891928e-35
age          0.5393926 0.04395768  12.27072 1.300837e-34

$lm0r.0
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -8.6820193 0.69864489 -12.42694 1.866391e-35
age          0.5393926 0.04395395  12.27177 1.284216e-34

So I have two questions. Why do the results with the full data frame depend on nAGQ, and why with the reduced data frame is the SD of random effect estimated as zero?

Thanks for your thoughts.

Tim
 ---
Tim.Cole at ucl.ac.uk<mailto:Tim.Cole at ich.ucl.ac.uk> Phone +44(0)20 7905 2666 Fax +44(0)20 7905 2381
Population, Policy and Practice Programme
UCL Institute of Child Health, London WC1N 1EH, UK


Date: Mon, 06 Jan 2014 08:43:21 -0500
From: Ben Bolker <bbolker at gmail.com<mailto:bbolker at gmail.com>>
To: "Cole, Tim" <tim.cole at ucl.ac.uk<mailto:tim.cole at ucl.ac.uk>>,
"r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>" <r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>>
Subject: Re: [R-sig-ME] c++ exception with logistic glmer
Message-ID: <52CAB2F9.50306 at gmail.com<mailto:52CAB2F9.50306 at gmail.com>>
Content-Type: text/plain; charset=ISO-8859-1
On 14-01-06 04:04 AM, Cole, Tim wrote:
The dataset consists of longitudinal measures of bone score during
puberty, where 1000 indicates maturity. The aim is to estimate median
age at maturity in four groups by sex and ethnicity.
This glm code works fine, but ignores the longitudinal element. lm2
<- glm(I(RUSBoneScore == 1000) ~ log(DecAge) + Sex * Ethnicity,
family=binomial, data=na.omit(bh[, 3:6]))
This code, which adds a random subject effect, fails with the
unhelpful error message below. lm3 <- glmer(I(RUSBoneScore == 1000) ~
log(DecAge) + Sex * Ethnicity + (1 | BHID), family=binomial,
data=na.omit(bh[, 2:6]))
Error in pwrssUpdate(pp, resp, tolPwrss, GHrule(0L), compDev,
verbose) : c++ exception (unknown reason)
I've tried various alternatives but they all fail in the same way.
Thoughts please.

    Hard to say without a reproducible example. My main comment is that
this is *not* a problem I have seen anyone report before.  sessionInfo()
please?  Can you try with verbose=100? Can you make a reproducible
example that is a reasonable size and doesn't have confidentiality problems?

  Ben Bolker

Thanks, Tim Cole -- Tim.Cole at ucl.ac.uk<mailto:Tim.Cole at ucl.ac.uk><mailto:Tim.Cole at ich.ucl.ac.uk>
Phone +44(0)20 7905 2666 Fax +44(0)20 7905 2381 Centre for Paediatric
Epidemiology and Biostatistics UCL Institute of Child Health, London
WC1N 1EH, UK




------------------------------

Message: 2
Date: Tue, 7 Jan 2014 02:08:06 +0000 (UTC)
From: Ben Bolker <bbolker at gmail.com<mailto:bbolker at gmail.com>>
To: r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] Fit negative binomial glmm credible intervals
from the predict function?
Message-ID: <loom.20140107T030759-892 at post.gmane.org<mailto:loom.20140107T030759-892 at post.gmane.org>>
Content-Type: text/plain; charset=us-ascii

Sheryn Olson <sheryn.olson at ...> writes:

Hello Ben and all,
Thank you Ben, very much for your previous reply.  Yes, stand/plot makes
sense.  After talking with a few statisticians, I got convergence!   The
repeated measure issue is solvable with a dummy variable for each of the 3
years (like a fiscal or school year) - so yr1 = 2010smr+2011wtr, and so
on.  Pellet plots were sampled 6 times, 3 each season, vegetation only
once.  Among year variability can be high.
I'm now attempting to make predictive plots from models.  I used UCLA's
method posted at their IRDA site that has a negative binomial example:
             http://www.ats.ucla.edu/stat/r/dae/nbreg.htm
            newdata2 <- cbind(newdata2, predict(m1, newdata2,type =
"link", se.fit=TRUE))
            newdata2 <- within(newdata2, {
            DaysAbsent <- exp(fit)
                  LL <- exp(fit - 1.96 * se.fit)
                  UL <- exp(fit + 1.96 * se.fit)
              })
but I have an uneasy feeling about fitting standard errors from a
gaussian distribution.

  In case it helps, what you're assuming is Normal is the *sampling
distribution of the parameters*, not the data themselves ...

  The bottom line is that what you're doing seems reasonable, with
the caveats already expressed in ?predict.glmmadmb ...

In the past I've used MCMCglmm to estimate credible intervals around the
beta coefficients of a categorical variable, but now,
I have a continuous variable, conifer sapling count.

  Well, technically it's a discrete (count) variable, not continuous ...

  Section 2 of the glmmADMB vignette [vignette("glmmADMB")] has
a bit of information about running post-fit MCMC.   There should
be an example of how to generate confidence intervals (basically,
you would generate predictions for each sampled set of coefficients
and then compute the quantiles or credible intervals of the
predictions), but there isn't (yet ...)

So, lots of questions!
How does one make an accurate predictive graph from a nbinom model?

  Good question.  Not easy, you may need to take some shortcuts
and hope for the best.

Does it make sense to set up a dataframe with the means of the
covariates when the covariates'
distributions were skewed count data?

  It depends what values you want to predict for.  The data frame
typically includes "typical" values of the parameters; you're welcome
to compute predictions for the medians instead if you prefer, or
for a range of values.

How do I use MCMCglmm correctly?

  ??  Do you mean the 'mcmc' argument of glmmadmb?

Is "se.fit" valid for the nbinom log link, or....?

  [Update: I just wrote a bunch of stuff about how se.fit doesn't
work for glmmadmb, when it indeed does! However, I had already
written all the stuff below, which describes more generally how
you would do it by hand, so I'm just going to send it anyway]

  The basic recipe for constructing confidence intervals, as laid out
in http://glmm.wikidot.com/faq in the "Predictions and/or confidence
(or prediction) intervals on predictions" section, is

(1) construct the model matrix (X)
  * If you want predictions on the original model (not this case),
the model.matrix() accessor may work to extract it from the fit for you
  * If not, or if you want to construct predictions for new data,
you need to call model.matrix() with the fixed-effect formula (you
may or may not be able to extract the fixed-effect part of the
formula only from the model, but it's usually fairly easy just
to respecify it

  (2) extract the variance-covariance matrix of the fixed effect
predictors, V (usually just vcov(model))

  (3) extract the fixed-effect coefficients (usually just fixef(model)
or coef(model))

then the predictions (on the linear predictor scale, at the population
level [i.e. ignoring all random effects]) are

X %*% beta

(plus an offset if there is one)

and the standard errors of prediction, *CONDITIONAL ON THE
RANDOM EFFECTS* [i.e. ignoring all uncertainty due to uncertainty
of the random effects] are

  sqrt(diag(X %*% V %*% t(X)))

Then to get confidence intervals you should compute

  inverse_link(fit +/- 1.96*se.fit) exactly as you have below

This assumes further that the sampling distribution of the
coefficients is Normal (in which case the linear computation
we did above will also lead to a Normal distribution, on
the linear predictor scale).  This might not be true for
small or badly behaved data sets (of course it is never
exactly true except asymptotically ...), but there's not
much you can do about it without working much harder.

my Code:
########## CONIFER SAPLINGS by pellets/ha/month (phm)  #####################
#### mod1.all <- glmmadmb(pellets ~ season * (t.con.splgs +
t.dec.trees + pctMidCov + t.BAtrees + cc) +
####                     offset(ln.days)+(1|stand/plot)+ (1|hareyr),
data=hv, family="nbinom")
predCS <- data.frame(
                      t.con.splgs = rep(seq(from = min(hv$t.con.splgs),
to = max(hv$t.con.splgs),length.out = 100), 2),
                          t.dec.trees=rep(mean(hv$t.dec.trees),200),
                          pctMidCov=rep(mean(hv$pctMidCov),200),
t.BAtrees=rep(mean(hv$t.BAtrees),200),cc=rep(mean(hv$cc),200),
                          ln.days=rep(log(30.25),200),
                          season = factor(rep(1:2, each = 100), levels
= 1:2, labels =levels(hv$season)))
predCS <- cbind(predCS, predict(mod1.all, predCS, type = "link", se.fit=TRUE))
predCS <- within(predCS, {
   pellets <- exp(fit)
   LL <- exp(fit - 1.96 * se.fit)   # these seem wrong !!
   UL <- exp(fit + 1.96 * se.fit)   # ??
})

  Do you mean you're suspicious of the code, or that the numbers
it produces seem wrong?

head(predCS)           #  check to see fits
   t.con.splgs t.dec.trees pctMidCov t.BAtrees       cc  ln.days season
       fit    se.fit       UL       LL   pellets
1   0.0000000    3.493585  40.80654  4.117168 82.84937 3.409496    smr
-1.051665 0.5187013 6437.304 842.6536 0.3493557
2   0.7142493    3.493585  40.80654  4.117168 82.84937 3.409496    smr
-1.046329 0.5187194 6471.975 847.1320 0.3512248
3   1.4284985    3.493585  40.80654  4.117168 82.84937 3.409496    smr
-1.040993 0.5187633 6507.163 851.5911 0.3531040
4   2.1427478    3.493585  40.80654  4.117168 82.84937 3.409496    smr
-1.035657 0.5188331 6542.872 856.0304 0.3549932
5   2.8569971    3.493585  40.80654  4.117168 82.84937 3.409496    smr
-1.030321 0.5189286 6579.111 860.4492 0.3568926
6   3.5712464    3.493585  40.80654  4.117168 82.84937 3.409496    smr
-1.024984 0.5190500 6615.885 864.8472 0.3588020
   con.splgs.ha      phm
1      0.00000 2329.038
2      5.10152 2341.499
3     20.40608 2354.027
4     45.91368 2366.622
5     81.62432 2379.284
6    127.53801 2392.014
############# pellets/ha/month (phm) scale up
#############  and scale up saplings to per ha from 0.1ha plot level
predCS <- within(predCS, {
   pellets <- exp(fit)
   phm <- pellets/1.5*10000
    LL <- (exp(fit - 1.96 * se.fit))/1.5*10000
    UL <- (exp(fit + 1.96 * se.fit))/1.5*10000
    con.splgs.ha <- 10*(t.con.splgs^2)          # backtransform square
root and scale up
})
CS <- ggplot(predCS, aes(con.splgs.ha, phm)) +
   theme_bw() +   #eliminates background, gridlines, but not border
   theme(
     plot.background = element_blank()
    ,panel.grid.major = element_blank()
    ,panel.grid.minor = element_blank()
#   ,panel.border = element_blank()
    ,panel.background = element_blank()
   ) +
   geom_ribbon(aes(ymin = LL, ymax = UL, fill = season),alpha = .25) +
   #geom_line(aes(colour = season),size = 1) +
   geom_smooth(aes(colour=season, linetype=season),size = 1, se = F) +
   scale_colour_manual(values=c("wtr"= 4, "smr" = 3)) +
   scale_x_continuous(limits = c(0, 25000)) +
   scale_y_continuous(limits = c(0, 12000)) +
   scale_fill_brewer(palette="Accent") +
   theme(legend.position = 'none') +         #get rid of the legend
   labs(x = "Conifer Sapling Count per ha", y = "Predicted Pellets/ha/month")
print(CS)
Thanks for any ideas.
Sheryn




------------------------------

_______________________________________________
R-sig-mixed-models mailing list
R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org>
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


End of R-sig-mixed-models Digest, Vol 85, Issue 6
*************************************************
1 day later
#
On Tue, 6 Jan 2015, Cole, Tim wrote:

            
[...]B
As now described, your problem is clearly interval censored time-to-event, 
and your logistic model is just not the right approach - bone maturity is 
an irreversible state (consider what happens to your age regression 
coefficient if you add in more ages before or after maturity).  Either do 
the survival analysis, which gives you the median age at maturity, or fit 
a (nonlinear) growth model to maturity score.

| David Duffy (MBBS PhD)
| email: David.Duffy at qimrberghofer.edu.au  ph: INT+61+7+3362-0217 fax: -0101
| Genetic Epidemiology, QIMR Berghofer Institute of Medical Research
| 300 Herston Rd, Brisbane, Queensland 4006, Australia  GPG 4D0B994A
#
David,

I agree that the model can be fitted as an interval-censored time-to-event (actually it's a mix of left, right and interval-censored), but that does not make my approach wrong. In fact it is better in two respects.

First, bone maturity score is subject to measurement error, so it's possible (though rare) to transition from 0 to 1 and then back to 0. Survival analysis treats such points as missing, when they are a valid representation of measurement error.

Second, I am interested in the between-subject SD of the time to maturity, obtained by dividing the random intercept SD by the age coefficient. To my knowledge this is not available from the survival analysis.

Whether or not you are persuaded by these arguments, the fact remains that my two questions about glmer remain valid - its behavour is odd.

Best wishes,
Tim
---
Tim.Cole at ucl.ac.uk<mailto:Tim.Cole at ich.ucl.ac.uk> Phone +44(0)20 7905 2666 Fax +44(0)20 7905 2381
Population Policy and Practice Programme
UCL Institute of Child Health, London WC1N 1EH, UK

From: David Duffy <David.Duffy at qimr.edu.au<mailto:David.Duffy at qimr.edu.au>>
Date: Thursday, 8 January 2015 05:32
To: Tim Cole <tim.cole at ucl.ac.uk<mailto:tim.cole at ucl.ac.uk>>
Cc: "r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>" <r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>>
Subject: Re: [R-sig-ME] c++ exception with logistic glmer
On Tue, 6 Jan 2015, Cole, Tim wrote:
I'm returning to a thread I started a year ago. My logistic glmer model
[...]B
and Jonathan French and Ben suggested an alternative time-to-event
analysis instead of the logistic.

As now described, your problem is clearly interval censored time-to-event,
and your logistic model is just not the right approach - bone maturity is
an irreversible state (consider what happens to your age regression
coefficient if you add in more ages before or after maturity).  Either do
the survival analysis, which gives you the median age at maturity, or fit
a (nonlinear) growth model to maturity score.

| David Duffy (MBBS PhD)
| email: David.Duffy at qimrberghofer.edu.au<mailto:David.Duffy at qimrberghofer.edu.au>  ph: INT+61+7+3362-0217 fax: -0101
| Genetic Epidemiology, QIMR Berghofer Institute of Medical Research
| 300 Herston Rd, Brisbane, Queensland 4006, Australia  GPG 4D0B994A
#
Tim,

Did you looked at the msm package which, IIRC, allows to implement a
hidden markov model with hidden states the real state (with only 0 ->
1 transition allowed) and allowed states the observed states (which
can then have 1 -> 0 sequences), with transition matrix probabilities
and measurement errors matrixes? This should handle the measurement
errors problems you mention, also handling various censoring.

However, I am not sure it will easily allow to estimate a
between-subject SD of the time to maturity, you should check the
package doc and the underlying model to see if one can access each
individual transition time to have your answer... I am unfortunately
not familiar enough with it to confirm or not that.
On Thu, Jan 08, 2015 at 11:33:42AM +0000, Cole, Tim wrote:
? David,
? 
? I agree that the model can be fitted as an interval-censored time-to-event (actually it's a mix of left, right and interval-censored), but that does not make my approach wrong. In fact it is better in two respects.
? 
? First, bone maturity score is subject to measurement error, so it's possible (though rare) to transition from 0 to 1 and then back to 0. Survival analysis treats such points as missing, when they are a valid representation of measurement error.
? 
? Second, I am interested in the between-subject SD of the time to maturity, obtained by dividing the random intercept SD by the age coefficient. To my knowledge this is not available from the survival analysis.
? 
? Whether or not you are persuaded by these arguments, the fact remains that my two questions about glmer remain valid - its behavour is odd.
? 
? Best wishes,
? Tim
? ---
? Tim.Cole at ucl.ac.uk<mailto:Tim.Cole at ich.ucl.ac.uk> Phone +44(0)20 7905 2666 Fax +44(0)20 7905 2381
? Population Policy and Practice Programme
? UCL Institute of Child Health, London WC1N 1EH, UK
? 
? From: David Duffy <David.Duffy at qimr.edu.au<mailto:David.Duffy at qimr.edu.au>>
? Date: Thursday, 8 January 2015 05:32
? To: Tim Cole <tim.cole at ucl.ac.uk<mailto:tim.cole at ucl.ac.uk>>
? Cc: "r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>" <r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>>
? Subject: Re: [R-sig-ME] c++ exception with logistic glmer
?
? On Tue, 6 Jan 2015, Cole, Tim wrote:
? 
? I'm returning to a thread I started a year ago. My logistic glmer model
? [...]B
? and Jonathan French and Ben suggested an alternative time-to-event
? analysis instead of the logistic.
? 
? As now described, your problem is clearly interval censored time-to-event,
? and your logistic model is just not the right approach - bone maturity is
? an irreversible state (consider what happens to your age regression
? coefficient if you add in more ages before or after maturity).  Either do
? the survival analysis, which gives you the median age at maturity, or fit
? a (nonlinear) growth model to maturity score.
? 
? | David Duffy (MBBS PhD)
? | email: David.Duffy at qimrberghofer.edu.au<mailto:David.Duffy at qimrberghofer.edu.au>  ph: INT+61+7+3362-0217 fax: -0101
? | Genetic Epidemiology, QIMR Berghofer Institute of Medical Research
? | 300 Herston Rd, Brisbane, Queensland 4006, Australia  GPG 4D0B994A
? 
? 
? 	[[alternative HTML version deleted]]
? 
? _______________________________________________
? R-sig-mixed-models at r-project.org mailing list
? https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
#
Thanks Emmanuel. I haven't come across msm.

In my previous post I forgot to respond to David's comment about growth curve modelling. As it happens I have already analysed and published the data that way, using my SITAR model which includes a subject-specific random effect on the age scale. This is exactly analogous to the random effect I am trying to fit with glmer, and hence is another reason why I prefer that approach.

Best wishes,
Tim
---
Tim.Cole at ucl.ac.uk<mailto:Tim.Cole at ich.ucl.ac.uk> Phone +44(0)20 7905 2666 Fax +44(0)20 7905 2381
Population Policy and Practice Programme
UCL Institute of Child Health, London WC1N 1EH, UK

From: Emmanuel Curis <emmanuel.curis at parisdescartes.fr<mailto:emmanuel.curis at parisdescartes.fr>>
Date: Thursday, 8 January 2015 11:46
To: Tim Cole <tim.cole at ucl.ac.uk<mailto:tim.cole at ucl.ac.uk>>
Cc: David Duffy <David.Duffy at qimr.edu.au<mailto:David.Duffy at qimr.edu.au>>, "r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>" <r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>>
Subject: Re: [R-sig-ME] c++ exception with logistic glmer

Tim,

Did you looked at the msm package which, IIRC, allows to implement a
hidden markov model with hidden states the real state (with only 0 ->
1 transition allowed) and allowed states the observed states (which
can then have 1 -> 0 sequences), with transition matrix probabilities
and measurement errors matrixes? This should handle the measurement
errors problems you mention, also handling various censoring.

However, I am not sure it will easily allow to estimate a
between-subject SD of the time to maturity, you should check the
package doc and the underlying model to see if one can access each
individual transition time to have your answer... I am unfortunately
not familiar enough with it to confirm or not that.
On Thu, Jan 08, 2015 at 11:33:42AM +0000, Cole, Tim wrote:
? David,
?
? I agree that the model can be fitted as an interval-censored time-to-event (actually it's a mix of left, right and interval-censored), but that does not make my approach wrong. In fact it is better in two respects.
?
? First, bone maturity score is subject to measurement error, so it's possible (though rare) to transition from 0 to 1 and then back to 0. Survival analysis treats such points as missing, when they are a valid representation of measurement error.
?
? Second, I am interested in the between-subject SD of the time to maturity, obtained by dividing the random intercept SD by the age coefficient. To my knowledge this is not available from the survival analysis.
?
? Whether or not you are persuaded by these arguments, the fact remains that my two questions about glmer remain valid - its behavour is odd.
?
? Best wishes,
? Tim
? ---
? Tim.Cole at ucl.ac.uk<mailto:Tim.Cole at ucl.ac.uk><mailto:Tim.Cole at ich.ucl.ac.uk> Phone +44(0)20 7905 2666 Fax +44(0)20 7905 2381
? Population Policy and Practice Programme
? UCL Institute of Child Health, London WC1N 1EH, UK
?
? From: David Duffy <David.Duffy at qimr.edu.au<mailto:David.Duffy at qimr.edu.au><mailto:David.Duffy at qimr.edu.au>>
? Date: Thursday, 8 January 2015 05:32
? To: Tim Cole <tim.cole at ucl.ac.uk<mailto:tim.cole at ucl.ac.uk><mailto:tim.cole at ucl.ac.uk>>
? Cc: "r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org><mailto:r-sig-mixed-models at r-project.org>" <r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org><mailto:r-sig-mixed-models at r-project.org>>
? Subject: Re: [R-sig-ME] c++ exception with logistic glmer
?
? On Tue, 6 Jan 2015, Cole, Tim wrote:
?
? I'm returning to a thread I started a year ago. My logistic glmer model
? [...]B
? and Jonathan French and Ben suggested an alternative time-to-event
? analysis instead of the logistic.
?
? As now described, your problem is clearly interval censored time-to-event,
? and your logistic model is just not the right approach - bone maturity is
? an irreversible state (consider what happens to your age regression
? coefficient if you add in more ages before or after maturity).  Either do
? the survival analysis, which gives you the median age at maturity, or fit
? a (nonlinear) growth model to maturity score.
?
? | David Duffy (MBBS PhD)
? | email: David.Duffy at qimrberghofer.edu.au<mailto:David.Duffy at qimrberghofer.edu.au><mailto:David.Duffy at qimrberghofer.edu.au>  ph: INT+61+7+3362-0217 fax: -0101
? | Genetic Epidemiology, QIMR Berghofer Institute of Medical Research
? | 300 Herston Rd, Brisbane, Queensland 4006, Australia  GPG 4D0B994A
?
?
? [[alternative HTML version deleted]]
?
? _______________________________________________
? R-sig-mixed-models at r-project.org<mailto:R-sig-mixed-models at r-project.org> mailing list
? https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

--
                                Emmanuel CURIS
                                emmanuel.curis at parisdescartes.fr<mailto:emmanuel.curis at parisdescartes.fr>

Page WWW: http://emmanuel.curis.online.fr/index.html
#
The different log likelihoods for different values of nAGQ, and hence BIC,
is a known problem with glmer. nAGQ=1 is correct, and I think other values
are just out by a constant.

As to your model, what is happening is that a step function is being
modeled by a logistic. As a result the coefficient of age tends towards a
large number (a parameter estimate of 5 corresponds to an odds ratio of
about 150 per year, 17 is a huge odds ratio) , in an attempt  to reproduce
the step. The random effect will be modeling where the switch happens so
will also be large, as it needs to move a very steep line to varying values
of age. Effectively the model is close to being non-identifiable, as the
loglikelihood will remain almost unchanged for increasing combinations of
age parameter and random effect variance. Centering age might improve the
numerics and result in even larger coefficients.

I'm not certain what is happening with the 2 observations per subject. If
they were all 0,1 then the coefficient of age would be infinite, as a one
year change in age would always produce a change from zero to one.
However, it is similar to the discrete logistic survival which has a series
of zeroes followed by a 1 and doesn't require a random effect to model.

Ken
On 6 January 2015 at 23:34, Cole, Tim <tim.cole at ucl.ac.uk> wrote:

            

  
    
5 days later
#
Thanks Ken for your insights, and also for David Duffy?s thoughts (sent offline):

The reduced model has an undefined intraclass correlation for the binary
maturity variable (every cluster is 0-1), so this is why the variance is
zero.  It seems to me, as I mentioned earlier, that if you add additional
time points on either side of the transition to maturity the slope
coefficient for each individual will (arbitrarily) flatten out, which
implies the random intercept model isn't doing what you want it to.
Certainly, the empirical within-individual correlation between
observations is not uniform, which is the simplest RE model.

The message is clear, that this model is not going to work, which is a shame. I would point out though that all the models give reasonable estimates of the median age at maturity, -intercept/slope, which is one of the summary statistics I?m interested in.

Tim
---
Tim.Cole at ucl.ac.uk<mailto:Tim.Cole at ich.ucl.ac.uk> Phone +44(0)20 7905 2666 Fax +44(0)20 7905 2381
Population Policy and Practice Programme
UCL Institute of Child Health, London WC1N 1EH, UK

From: Ken Beath <ken.beath at mq.edu.au<mailto:ken.beath at mq.edu.au>>
Date: Thursday, 8 January 2015 22:08
To: Tim Cole <tim.cole at ucl.ac.uk<mailto:tim.cole at ucl.ac.uk>>
Cc: Ben Bolker <bbolker at gmail.com<mailto:bbolker at gmail.com>>, "r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>" <r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>>
Subject: Re: [R-sig-ME] c++ exception with logistic glmer

The different log likelihoods for different values of nAGQ, and hence BIC, is a known problem with glmer. nAGQ=1 is correct, and I think other values are just out by a constant.

As to your model, what is happening is that a step function is being modeled by a logistic. As a result the coefficient of age tends towards a large number (a parameter estimate of 5 corresponds to an odds ratio of about 150 per year, 17 is a huge odds ratio) , in an attempt  to reproduce the step. The random effect will be modeling where the switch happens so will also be large, as it needs to move a very steep line to varying values of age. Effectively the model is close to being non-identifiable, as the loglikelihood will remain almost unchanged for increasing combinations of age parameter and random effect variance. Centering age might improve the numerics and result in even larger coefficients.

I'm not certain what is happening with the 2 observations per subject. If they were all 0,1 then the coefficient of age would be infinite, as a one year change in age would always produce a change from zero to one.   However, it is similar to the discrete logistic survival which has a series of zeroes followed by a 1 and doesn't require a random effect to model.

Ken
On 6 January 2015 at 23:34, Cole, Tim <tim.cole at ucl.ac.uk<mailto:tim.cole at ucl.ac.uk>> wrote:
I'm returning to a thread I started a year ago. My logistic glmer model (described below) generated an obscure error message, which Emmanuel Curis pointed out arose from duplicate rows in my data. Ben Bolker and David Duffy thought there might be separation in the cleaned dataset, and Jonathan French and Ben suggested an alternative time-to-event analysis instead of the logistic.

The dataset consists of longitudinal measures of a binary bone maturity score called mat, which for each subject (n = 607) consists of zero or more 0s followed by zero or more 1s, the age at 0/1 transition constituting the subject's age at maturity. The aim is to estimate median age at maturity, given by ?intercept/age coefficient and confidence interval based on Fieller's theorem.

The full data frame is bhs (4565 rows), while dfs is a reduced data frame (1086 rows) consisting of the last mat 0 entry and first mat 1 entry per subject (where available).

Using glm ignores the longitudinal element, whille glmer allows a random subject intercept reflecting inter-subject variability in age at maturity.

lm0 <- glm(mat ~ age, family=binomial, data=bhs)
lm0r <- glm(mat ~ age, family=binomial, data=dfs)
for (n in c(0, 1, 5, 9)) {
assign(paste('lm0', n, sep='.'), glmer(mat ~ age + (1 | BHID), family=binomial, data=bhs, nAGQ=n))
assign(paste('lm0r', n, sep='.'), glmer(mat ~ age + (1 | BHID), family=binomial, data=dfs, nAGQ=n))
}
list <- ls(pattern='^lm0')
unlist(list, function(z) {
res <- BIC(get(z))
names(res) <- z
res
}))
             lm0    lm0.0    lm0.1    lm0.5    lm0.9     lm0r   lm0r.0   lm0r.1   lm0r.5   lm0r.9
2474.576 1812.824 1386.288 1788.585 1778.761 1298.951 1305.941 1305.941 1305.941 1305.941

With the full data frame (models lm0.n) BIC varies with nAGQ, and for the Laplace fit (nAGQ=1) the model looks dodgy. The coefficients are as follows:

$lm0
              Estimate Std. Error   z value      Pr(>|z|)
(Intercept) -18.700922  0.5783898 -32.33273 2.426656e-229
age           1.169465  0.0363381  32.18288 3.063925e-227

$lm0.0
              Estimate Std. Error   z value      Pr(>|z|)
(Intercept) -75.261703  3.3652807 -22.36417 8.790724e-111
age           4.710325  0.2086032  22.58031 6.766768e-113

$lm0.1
              Estimate Std. Error   z value Pr(>|z|)
(Intercept) -276.41591 1.21959198 -226.6462        0
age           17.35977 0.08850143  196.1525        0

$lm0.5
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -71.992747  6.7038887 -10.73895 6.679749e-27
age           4.497433  0.4186714  10.74215 6.452210e-27

$lm0.9
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -80.623844  8.9423308 -9.015976 1.951243e-19
age           5.035102  0.5569928  9.039797 1.569631e-19

As expected the regression line is steeper with the random effect included, though again lm0.1 look odd.

With the reduced data frame the lmr.n models do not vary with nAGQ. But the odder thing is that the coefficients are identical with and without the random effect, as the SD of the random effect is estimated to be zero.

$lm0r
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -8.6820194 0.69870600 -12.42586 1.891928e-35
age          0.5393926 0.04395768  12.27072 1.300837e-34

$lm0r.0
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -8.6820193 0.69864489 -12.42694 1.866391e-35
age          0.5393926 0.04395395  12.27177 1.284216e-34

So I have two questions. Why do the results with the full data frame depend on nAGQ, and why with the reduced data frame is the SD of random effect estimated as zero?

Thanks for your thoughts.

Tim
 ---
Tim.Cole at ucl.ac.uk<mailto:Tim.Cole at ucl.ac.uk><mailto:Tim.Cole at ich.ucl.ac.uk<mailto:Tim.Cole at ich.ucl.ac.uk>> Phone +44(0)20 7905 2666<tel:%2B44%280%2920%207905%202666> Fax +44(0)20 7905 2381<tel:%2B44%280%2920%207905%202381>
Population, Policy and Practice Programme
UCL Institute of Child Health, London WC1N 1EH, UK






--

Ken Beath
Lecturer
Statistics Department
MACQUARIE UNIVERSITY NSW 2109, Australia

Phone: +61 (0)2 9850 8516

Building E4A, room 526
http://stat.mq.edu.au/our_staff/staff_-_alphabetical/staff/beath,_ken/

CRICOS Provider No 00002J
This message is intended for the addressee named and may...{{dropped:10}}