Skip to content

rare binary outcome, MCMCglmm, and priors (related to separation)

6 messages · David Atkins, David Duffy, Dennis Murphy +1 more

#
Some colleagues have collected data from 184 females in dating 
relationships.  Data were collected daily using PDAs; the outcome is a 
binary indicator of whether any physical aggression occurred (intimate 
partner violence, or IPV).

They are interested in 3 covariates:

-- alcohol use: yes/no
-- anger: rated on 1-5 scale
-- verbal aggression: sum of handful of items, with 0-15 scale

Their hypothesis is that the interaction of all 3 covariates will lead 
to the highest likelihood of IPV.  As you might expect, the outcome is 
very rare with 51 instances of IPV out of 8,269 days of data, and 158 
women (out of 184) reported no instances of IPV.

Question 1: Given that a GLMM will assume a normal distribution for the 
person-specific baserate in IPV, is this data even appropriate for GLMM 
or should they be looking elsewhere (perhaps GEE)?

That said, for some (unknown) proportion of individuals, there probably 
would be instances of IPV if the data collection period were longer. 
Thus, perhaps there is some basis for assuming a distribution across 
people, even if the observed data for some individuals are all zeroes.

To present some of the data (and I can check to see if it would be okay 
to make the data available), I dichotomized both anger and verbal 
aggression ("prov.cut" below):

   ang.cut prov.cut alc.cut ipv.yes ipv.no
1       0        0       0       0     3918
2       0        0       1       0        1
3       1        0       0       5     2381
4       1        0       1       1      292
5       1        1       0      36     1471
6       1        1       1       9      257

Thus, the instances of IPV are more likely when there is anger and 
verbal aggression; alcohol is a little less clear.  (And, if the 
association of anger and verbal aggression with IPV seems tautological, 
there has been debate about different forms of IPV, where some research 
has pointed to "cold" aggression.)

Not surprisingly, analyses using either glmer() or MCMCglmm() show signs 
of partial separation, with some whopping odds-ratios and 95% CI 
spanning a couple orders of magnitude.

I have read a bit about the problems of separation in logistic 
regression and know that Gelman et al suggest Bayesian priors as one 
"solution".  Moreover, I see in Jarrod Hadfield's course notes that his 
multinomial example has a "structural" zero that he addresses via priors 
on pp. 96-97, though I confess I don't quite follow exactly what he has 
done (and why).

If I just let MCMCglmm cook on a regression with all 2-way interactions 
for a long while:

prior = list(R = list(V = 1, fix = 1),
			B = list(mu = c(rep(0,7)), V = diag(7)),
			G = list(G1 = list(V = 1, nu = 0.002)))
lr.mcmc <- MCMCglmm(ipv ~ (alc.cut + angc + log(provc + 0.03))^2, data = 
ipv.df,
					family = "categorical", verbose = TRUE,
					prior = prior,
					nitt = 2000000, burnin = 1000000, thin = 1000,
					random =  ~ person)

The answers are less extreme than what I get with glmer, perhaps 
suggesting this is wandering toward the "correct" solution, though there 
are also plenty of indicators that we aren't there yet:

 > summary(lr.mcmc)

  Iterations = 1999001
  Thinning interval  = 1000001
  Sample size  = 1000

  DIC: 379.972

  G-structure:  ~person

        post.mean l-95% CI u-95% CI eff.samp
person     2.287   0.6775    4.206    194.0

  R-structure:  ~units

       post.mean l-95% CI u-95% CI eff.samp
units         1        1        1        0

  Location effects: ipv ~ (alc.cut + angc + log(provc + 0.03))^2

                           post.mean l-95% CI u-95% CI eff.samp  pMCMC
(Intercept)                -2.58724 -3.49492 -1.65250   245.20 <0.001 ***
alc.cut                     0.49512 -0.75268  1.99397  1000.00  0.464
angc                        0.02664 -0.34227  0.41365   283.90  0.880
log(provc + 0.03)           1.36626  0.98743  1.70863    28.69 <0.001 ***
alc.cut:angc               -0.16519 -0.79299  0.41949   683.56  0.590
alc.cut:log(provc + 0.03)  -0.02631 -0.53118  0.55065   157.34  0.898
angc:log(provc + 0.03)     -0.26132 -0.40141 -0.10407    74.98  0.004 **
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

I haven't run multiple chains yet, but the effective sample sizes and 
trace plots already suggest we ain't there yet.  My specific question is 
whether there would be an alternative prior specification for the 
fixed-effects that would be more appropriate?

I would appreciate any and all thoughts here, including if this just 
doesn't seem like an appropriate data/question for GLMMs.

sessionInfo below.

cheers, Dave

 > sessionInfo()
R version 2.11.1 (2010-05-31)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    splines   stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
  [1] MCMCglmm_2.06      corpcor_1.5.7      ape_2.5-3
  [4] coda_0.13-5        Matrix_0.999375-44 lattice_0.19-10
  [7] tensorA_0.35       Hmisc_3.8-2        modeltools_0.2-16
[10] mvtnorm_0.9-92     survival_2.36-1

loaded via a namespace (and not attached):
  [1] cluster_1.13.1   coin_1.0-16      colorspace_1.0-1 gee_4.13-15
  [5] grid_2.11.1      lme4_0.999375-35 nlme_3.1-96      party_0.9-9998
  [9] rpart_3.1-46     tools_2.11.1
#
On Mon, 30 Aug 2010, David Atkins wrote:

            
Hi. why are you using a mixed model here: dispersion, or are there 
multiple reports per individual?  Another approach for separated/sparse 
data implemented in R is the penalized likelihood approach in the brlr, 
logistf, brglm (and Design) packages:

brglm(formula = cbind(ipv.yes, ipv.no) ~ (ang.cut + prov.cut +
     alc.cut)^2, family = binomial(), data = ipv)

Coefficients: (1 not defined because of singularities)
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       -8.9666     1.4145  -6.339 2.31e-10 ***
ang.cut            2.8959     1.4775   1.960  0.05000 .
prov.cut           2.3740     0.4587   5.175 2.27e-07 ***
alc.cut            7.8680     2.7082   2.905  0.00367 **
ang.cut:prov.cut       NA         NA      NA       NA
ang.cut:alc.cut   -7.0703     2.8616  -2.471  0.01348 *
prov.cut:alc.cut  -0.4007     0.9962  -0.402  0.68747

Model 1: cbind(ipv.yes, ipv.no) ~ (ang.cut + prov.cut + alc.cut)
Model 2: cbind(ipv.yes, ipv.no) ~ (ang.cut + prov.cut + alc.cut)^2
   Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1         2     1.0875
2         0     1.8387  2 -0.75117

Cheers, David Duffy.
#
On 8/30/10 2:58 PM, David Duffy wrote:
[snip]
Ack; seems like there is always something that I miss in a post - apologies!

Yes, this was a daily diary study with data collected over 60 days per 
individual (with some variability in compliance).  Thus, this is why I 
was thinking of glmer/MCMCglmm.

cheers, Dave

Another approach for separated/sparse
#
Hi Dave,

With respect to the prior specification for the fixed effects, you may  
want to make the variance larger.  Perhaps something like:

prior$B$V = diag(7)*(3+pi^2/3)

The motivation behind this is to choose a prior for b, for which  
plogis(Xb+Zu+e) would be close to a uniform after marginalising the  
random effects u and e.  pi^2/3 is the variance of the logistic  
distribution (the cdf of which is the inverse logit function) and  3  
is the variance of Zu+e assuming Z is an identity matrix (1 for the  
residual variance + ~2 for the person variance). You can see it is  
pretty close for an intercept.

priorB<-rnorm(1000, 0, sqrt(3+pi^2/3))
priorMB<-1:1000
for(i in 1:1000){
priorMB[i]<-mean(plogis(priorB[i]+rnorm(1000,0,sqrt(3))))
}
hist(priorMB)

This example works for a model with a single intercept, and when  
fitting a categorical predictor I usually remove the intercept (-1) so  
that the distribution is approximately uniform for all levels of the  
predictor. For continuous covariates and interactions it will be a bit  
more involved and you should probably read  Gelman et. al. 2008 Annals  
of Applied Statistics 1360-1383.

Using a prior with a variance of one will shrink the estimates to less  
extreme values and may explain some of the differences between models.  
However, if anything this new prior is likely to make the  mixing  
worse rather than better. Two options that may speed up mixing are  
using slice=TRUE in the call to MCMCglmm. This will use slice sampling  
to update the latent variables rather then MH updates. You could also  
use parameter expanded priors for G, but from your output it does not  
look like the variance is hitting zero so it is unlikely to improve  
things.

Cheers,

Jarrod
On 30 Aug 2010, at 21:37, David Atkins wrote:

            

  
    
#
Jarrod--

Per usual, thanks for the input; I've got the Gelman et al. (2008) 
article and have some models running.  I'll update with what I find.

cheers, Dave

Dave Atkins, PhD
Research Associate Professor
Department of Psychiatry and Behavioral Science
University of Washington
datkins at u.washington.edu

Center for the Study of Health and Risk Behaviors (CSHRB)		
1100 NE 45th Street, Suite 300 	
Seattle, WA  98105 	
206-616-3879 	
http://depts.washington.edu/cshrb/
(Mon-Wed)	

Center for Healthcare Improvement, for Addictions, Mental Illness,
   Medically Vulnerable Populations (CHAMMP)
325 9th Avenue, 2HH-15
Box 359911
Seattle, WA 98104
http://www.chammp.org
(Thurs)
On 8/31/10 2:43 AM, Jarrod Hadfield wrote: