Little variability in outcome; "pwrssUpdate did not converge" - R-SIG-mixed-models

Mon, Mar 23, 2015 5:53 AM

Dear list,

I have a dichotomous outcome (child mortality) with a very high mean
(0.9946) in a large dataset (3.5m observations).
The "Error: pwrssUpdate did not converge in (maxit) iterations" occurs in
most cases. I've tried using blme to combat complete separation, setting
fixef.prior with SDs from 1 to 10, without success. The variance explained
by the random family effect is numerically very small (0.000698), though I
suppose that still amounts to ca. 7%. There are few members per family (~2 on
average). Fitting a glm without the family intercepts gives fairly
different results (which I expect), judging by the few models that ran.
Using less data sometimes leads to convergence, depending on the sample I
draw, I suppose. I'm using bobyqa.
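
For reference, one common way to express the share of variance due to a random intercept in a logistic model is the latent-variable ICC, which fixes the residual variance on the logit scale at pi^2 / 3. The sketch below applies that convention to the family variance reported here; it is an assumption about which scale is meant, and under it the share comes out well below 1%, so the 7% figure presumably rests on a different scale.

```r
# Latent-variable (threshold) ICC for a random-intercept logistic model:
# the level-1 residual variance on the logit scale is taken as pi^2 / 3.
latent_icc <- function(var_intercept) {
  var_intercept / (var_intercept + pi^2 / 3)
}

# Family-intercept variance as reported in the fitted model.
icc <- latent_icc(0.000698)

# Express as a percentage of the latent-scale variance.
icc_percent <- 100 * icc
print(icc_percent)
```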

I thought maybe the problem still is complete separation and I'm just being
too timid with the blme prior.

Oddly (maybe not), the only model where I do get convergence is one where I
accidentally mis-specified my sample, so my outcome was censored (hence the
mean but not the intercept was lower). I'm attaching the model.

Best regards,

Ruben Arslan

## Cov prior  : idParents ~ wishart(df = 3.5, scale = Inf, posterior.scale = cov, common.scale = TRUE)
## Fixef prior: normal(sd = c(9, 9, ...), corr = c(0 ...), common.scale = FALSE)
## Prior dev  : 143
##
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [bglmerMod]
##  Family: binomial ( logit )
## Formula: surviveR ~ maternalage.factor + paternalloss + maternalloss +
##     center(nr.siblings) + birth.cohort + male + paternalage.mean +
##     paternalage.factor + (1 | idParents)
##    Data: swed.2
## Control: control_defaults
##  Subset: survive1y == TRUE & byear < 2000
##
##      AIC      BIC   logLik deviance df.resid
##   938507   938795  -469231   938463  3691460
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -134.50    0.04    0.05    0.06    3.07
##
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  idParents (Intercept) 0.000698 0.0264
## Number of obs: 3691482, groups: idParents, 1907489
##
## Fixed effects:
##                           Estimate Std. Error z value Pr(>|z|)
## (Intercept)                6.84279    0.03471   197.2  < 2e-16 ***
## maternalage.factor(14,20]  0.16356    0.01600    10.2  < 2e-16 ***
## maternalage.factor(35,61] -0.18822    0.00871   -21.6  < 2e-16 ***
## paternallossTRUE          -0.41957    0.04694    -8.9  < 2e-16 ***
## paternallossNA            -0.30693    0.01819   -16.9  < 2e-16 ***
## maternallossTRUE          -0.67635    0.08228    -8.2  < 2e-16 ***
## maternallossNA            -0.11658    0.02607    -4.5  7.8e-06 ***
## center(nr.siblings)        0.27749    0.00288    96.2  < 2e-16 ***
## birth.cohort(1970,1977]    0.35761    0.02833    12.6  < 2e-16 ***
## birth.cohort(1977,1984]    0.72394    0.03203    22.6  < 2e-16 ***
## birth.cohort(1984,1991]    0.86295    0.03173    27.2  < 2e-16 ***
## birth.cohort(1991,1999]   -5.95342    0.01933  -308.0  < 2e-16 ***
## male                      -0.01946    0.00512    -3.8  0.00015 ***
## paternalage.mean           0.88269    0.01168    75.5  < 2e-16 ***
## paternalage.factor(25,30] -0.53984    0.01068   -50.5  < 2e-16 ***
## paternalage.factor(30,35] -1.18842    0.01360   -87.4  < 2e-16 ***
## paternalage.factor(35,40] -1.59243    0.01815   -87.7  < 2e-16 ***
## paternalage.factor(40,45] -2.02418    0.02429   -83.3  < 2e-16 ***
## paternalage.factor(45,50] -2.46269    0.03266   -75.4  < 2e-16 ***
## paternalage.factor(50,55] -3.11201    0.04679   -66.5  < 2e-16 ***
## paternalage.factor(55,90] -3.67437    0.06747   -54.5  < 2e-16 ***

## R version 3.1.0 (2014-04-10)
## Platform: x86_64-redhat-linux-gnu (64-bit)
##
## other attached packages:
##  [1] mgcv_1.8-4       nlme_3.1-119     stringr_0.6.2    pander_0.5.1
##  [5] blme_1.0-2       formr_0.1.11     lme4_1.1-7       Rcpp_0.11.4
##  [9] Matrix_1.1-5     ggplot2_1.0.0    data.table_1.9.5 knitr_1.9

David Duffy

Tue, Mar 24, 2015 4:01 PM

On Mon, 23 Mar 2015, Ruben Arslan wrote:

The misspecified model? Maybe you should be doing something else, such as 
bivariate logistic (dropping extra offspring) or marginal models? If you 
are interested just in familial aggregation, you can do the conditional 
analysis using just the ~18000 odd families with one or more events, 
using the other families just to estimate offsets.
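
As a rough illustration of the subsetting step, here is a minimal base R sketch on toy data. The data frame `children` and the column `died` are hypothetical; only `idParents` comes from the model above. It picks out families with at least one event, and also the narrower set of families that are discordant on the outcome, since in a conditional (within-family) likelihood only those carry information.

```r
# Toy data (hypothetical): one row per child, died = 1 marks an event.
children <- data.frame(
  idParents = c(1, 1, 2, 2, 2, 3, 4, 4),
  died      = c(0, 1, 0, 0, 0, 1, 1, 1)
)

# Count deaths and siblings per family.
deaths_per_family <- tapply(children$died, children$idParents, sum)
size_per_family   <- tapply(children$died, children$idParents, length)

# Families with at least one event.
with_event <- names(deaths_per_family)[deaths_per_family >= 1]

# Families with both outcomes present (discordant sibships).
discordant <- names(deaths_per_family)[
  deaths_per_family >= 1 & deaths_per_family < size_per_family
]

# Rows belonging to the discordant families.
informative <- children[children$idParents %in% discordant, ]
```

How the remaining families would then be used to estimate offsets is not spelled out in this thread, so that part is left out of the sketch.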

A few random thoughts ;)

| David Duffy (MBBS PhD)
| email: David.Duffy at qimrberghofer.edu.au  ph: INT+61+7+3362-0217 fax: -0101
| Genetic Epidemiology, QIMR Berghofer Institute of Medical Research
| 300 Herston Rd, Brisbane, Queensland 4006, Australia  GPG 4D0B994A

Ruben Arslan

Tue, Mar 24, 2015 5:06 PM

Thanks for your response! I'd prefer to model this the same way I did in three other populations (with lower means and sample sizes) for the sake of presentation and comparability. The basic idea (sorry that wasn't clear) is a sibling control design, examining the effect of paternal age within families (i.e. no marginal models for me).

I'm not sure I understand how I could estimate offsets separately from the conditional analysis. I've tried including only families with at least two sibs (nope), but wouldn't selecting based on the outcome introduce bias? How would I remedy that?

My previous mail contained a mis-specified model, since that happened to give any output and I thought it might be informative. 
It also had an odd prior specification. The default specification is c(10, 2.5). Unthinkingly, I set a very high SD on the slopes, i.e. c(9, 9). That's not a good idea, since such high SDs on the normal prior put a lot of weight on probabilities near 0 and 1 once transformed from the logit scale (there's a section on this in 2.6 of the MCMCglmm course notes).
Unfortunately, even though I do get improved results with small subsamples (30k) using the default prior spec (as opposed to vanilla glmer), the models still do not converge with the 3.5m dataset. 
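
The point about wide normal priors on the logit scale can be checked numerically. The sketch below is a simplification: it treats a single normal(0, sd) draw as the whole linear predictor and asks how much prior mass lands outside [0.01, 0.99] after the inverse-logit transform. The function name `extreme_mass` and the 0.01 cutoff are illustrative choices, not anything from blme.

```r
# Share of prior mass that a normal(0, sd) prior on the logit scale places
# outside [eps, 1 - eps] after the inverse-logit transform.
extreme_mass <- function(sd, eps = 0.01) {
  cutoff <- qlogis(1 - eps)                     # logit value matching 1 - eps
  2 * pnorm(cutoff / sd, lower.tail = FALSE)    # both tails, by symmetry
}

wide_prior    <- extreme_mass(9)    # the SD used in the earlier model
default_prior <- extreme_mass(2.5)  # the default slope SD mentioned above

print(c(wide = wide_prior, default = default_prior))
```

With an SD of 9, more than half of the prior mass falls in those extreme regions, against less than a tenth with an SD of 2.5.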

I was thinking that I might get closer by simply splitting my sample? I'm of course still hoping there's some control I've missed.
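
If splitting is the route taken, siblings would need to stay together for the within-family comparison to survive. Here is a minimal base R sketch, assuming the split is done on the family ID; `split_by_family` is a hypothetical helper and the toy IDs are made up.

```r
# Assign whole families to k chunks so that siblings are never separated.
split_by_family <- function(family_id, k, seed = 1) {
  set.seed(seed)
  families <- unique(family_id)
  # Balanced chunk labels, shuffled across families.
  chunk_of_family <- sample(rep_len(seq_len(k), length(families)))
  # Map each child to the chunk of its family.
  chunk_of_family[match(family_id, families)]
}

# Toy example (hypothetical IDs): 6 families, 2 children each, 3 chunks.
idParents <- rep(1:6, each = 2)
chunk <- split_by_family(idParents, k = 3)

# Each family should appear in exactly one chunk.
chunks_per_family <- tapply(chunk, idParents, function(x) length(unique(x)))
print(all(chunks_per_family == 1))
```

Each chunk could then be fitted with the same model formula and the estimates compared across chunks. Whether and how to pool them is a separate question that this sketch does not address.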