
2 correlated random effects with quadrature?

5 messages · Ben Bolker, Joshua Wiley, David Duffy +1 more

#
Is there a way to fit generalized linear mixed model with 2 correlated
random effects in R, using quadrature?  At the moment, I'm only
concerned with binary outcomes.

When I try glmer from lme4 with the quadrature argument I get
Error: AGQ only defined for a single scalar random-effects term

Yes, I know 2 dimensional quadrature is slow.

Ross Boylan
1 day later
#
Ross Boylan <ross at ...> writes:
I don't know offhand of an R package that will do this.  I'm pretty
sure AS-REML uses PQL (not even Laplace approximation): AD Model Builder can 
only do GHQ for nested/grouped models (i.e. not crossed) with a single
random effect per block.  As far as I know you're simply out of luck:
both GHQ and the ability to handle crossed random effects are fairly
rare among GLMM platforms, and the combination seems even rarer.
I presume you've (1) compared Laplace approximation to GHQ with simpler
examples and (2) compared Laplace approximation to 'truth' in simulations
and found it wanting in one or both cases?  One alternative possibility 
for improving the quality of the approximation would be to use importance
sampling in AD Model Builder ...
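For check (1), the comparison is easy to set up on a model glmer *can* do
with quadrature, e.g. with the cbpp data shipped with lme4 (a sketch; the
choice of 25 quadrature nodes is arbitrary):

```r
# Compare the Laplace approximation (nAGQ = 1, the default) with adaptive
# Gauss-Hermite quadrature on a binomial GLMM with one scalar random effect.
library(lme4)
m1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)   # Laplace approximation
m2 <- update(m1, nAGQ = 25)                   # adaptive GHQ, 25 nodes
cbind(Laplace = fixef(m1), AGQ25 = fixef(m2)) # compare the two fits
```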
#
Hi Ross,

I do not know of any GHQ functionality for this, but since you have
already said you are okay with slow, you can probably do better with
MCMC.  If you don't feel like rolling your own, the MCMCglmm package
makes it quite easy.

Here is a (not particularly sensible) example:

require(MCMCglmm)
set.seed(1234)
dat <- mtcars[sample(1:32, 1000, replace = TRUE), ]
dat <- within(dat, {
  qsec <- scale(qsec)
  hp <- scale(hp)
  mpg <- scale(mpg)
  disp <- scale(disp)
})
dat$ID <- factor(rep(letters, length.out = 1000))
dat$cyl <- factor(dat$cyl)
# set seed and estimate model
set.seed(10)
m <- MCMCglmm(vs ~ qsec + mpg + drat, random = ~ us(1 + mpg):ID,
  family = "categorical", data = dat,
  prior = list(
    B = list(mu = c(0, 0, 0, 0), V = diag(4) * 1e2),
    R = list(V = 1, fix = 1),
    G = list(G1 = list(V = diag(2), nu = 2))),
  pr = TRUE, nitt = 55000, thin = 20, burnin = 5000, verbose = FALSE)

# print a summary
summary(m)


 Iterations = 5001:54981
 Thinning interval  = 20
 Sample size  = 2500
 DIC: 22.87558
 G-structure:  ~us(1 + mpg):ID
                           post.mean l-95% CI u-95% CI eff.samp
(Intercept):(Intercept).ID   0.75374   0.1228    1.754   217.21
mpg:(Intercept).ID          -0.03048  -1.1675    0.922   108.96
(Intercept):mpg.ID          -0.03048  -1.1675    0.922   108.96
mpg:mpg.ID                   0.97473   0.1383    2.644    41.16
 R-structure:  ~units
      post.mean l-95% CI u-95% CI eff.samp
units         1        1        1        0
 Location effects: vs ~ qsec + mpg + drat
            post.mean l-95% CI u-95% CI eff.samp  pMCMC
(Intercept)    2.4890  -8.4259  11.9981   40.374  0.652
qsec          18.9466  13.3597  23.8444    2.672 <4e-04 ***
mpg            8.2778   5.6246  11.2228    9.550 <4e-04 ***
drat          -0.3849  -2.9258   2.3827   40.419  0.800
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Now of course I did not run enough iterations, and the thinning interval
is probably too small; you may also want better priors.  And if you are
comparing to standard logistic models, don't forget to rescale the
estimates (note the residual variance is fixed at a non-zero value to
improve mixing).
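For the rescaling, the usual correction when the latent logit scale
carries an extra fixed residual variance is to divide by sqrt(1 + c2 * v);
a base-R sketch (c2 is the standard logit correction constant):

```r
# Rescale "categorical" (logit-link) estimates back to the ordinary
# logistic scale when the units variance was fixed at v instead of 0.
c2 <- (16 * sqrt(3) / (15 * pi))^2  # approx 0.346
v  <- 1                             # whatever the units variance was fixed at
k  <- sqrt(1 + c2 * v)              # divide location effects by k
round(k, 4)                         # approx 1.1601
```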

Still, it is relatively straightforward, highly stable given enough
time, flexible to have many effects, and gives you posteriors which
are great for inference.

Cheers,

Josh
On Mon, Mar 11, 2013 at 1:25 PM, Ross Boylan <ross at biostat.ucsf.edu> wrote:
#
On Wed, 13 Mar 2013, Ben Bolker wrote:
If you're happy with a probit link, it is pretty straightforward to hook 
mvtnorm up to a maximizer, especially if you have just the one problem to 
fit.  Your groups can't be too large unfortunately.

If you look in 
http://genepi.qimr.edu.au/staff/davidD/Sib-pair/Src/sib-pair.R

under varcomp.mft(), there is an example of 2*N crossed, correlated 
random effects in a standard genetics variance-components model:

# requires the mvtnorm package for pmvnorm()
lik.mft.fam <- function(vc.data, m, va, vq, nthresh) {
   ln <- function(x) ifelse(x > 0, log(x), 0)
   ndata <- length(vc.data$y)
   thresh <- matrix(NA, nrow=ndata, ncol=(nthresh+2))
   thresh[,1] <- -Inf
   thresh[,ncol(thresh)] <- Inf
   thresh[, -c(1, ncol(thresh))] <- vc.data$covariates %*% matrix(m, nrow=1)
   S <- va*vc.data$rel + vq*vc.data$ibd + diag(1-va-vq, nrow(vc.data$rel))
   res <- ln(pmvnorm(lower=thresh[cbind(1:ndata, vc.data$y+1)],
                     upper=thresh[cbind(1:ndata, vc.data$y+2)],
                     corr=S))
   res
}
where va and vq are the variance components for two classes of random 
effects (additive genetic and QTL, one each for each individual 
observation), and rel and ibd are the correlation matrices for the random 
effects (which we can specify using the pedigree structure and 
observed sharing of genetic markers).  This likelihood is maximized 
using optim().  For high-dimensional integration, you need to tweak abseps 
and releps of the Quasi-Monte-Carlo algorithm.  It can handle up to 1000 
dimensions, but because the integral is itself a Monte Carlo estimate, the 
likelihood can bounce around a fair bit, which upsets the maximizer; still, 
you'll get the right answer eventually.
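The tolerances go in through pmvnorm()'s algorithm argument, e.g.
(illustrative values, not recommendations):

```r
# Tighten pmvnorm()'s integration tolerances via a GenzBretz algorithm
# object; with independent standard normals the answer is pnorm(0)^3 = 0.125.
library(mvtnorm)
res <- pmvnorm(lower = rep(-Inf, 3), upper = rep(0, 3), corr = diag(3),
               algorithm = GenzBretz(maxpts = 1e5, abseps = 1e-6, releps = 0))
res
```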

Cheers, David Duffy.




| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v
#
On 3/12/2013 2:18 PM, Ben Bolker wrote:
I'm not sure if it matters, but the 2 random effects are both within the 
same cluster; they are for intercepts and slopes.  The clusters 
themselves are not crossed or nested.
Back to SAS nlmixed...  For some reason I'm having trouble piping 
results from R to SAS on Linux.
The main reason is that we want to compare the results with a more 
complicated model fit using 2-dimensional quadrature.  The more 
complicated model, in R, takes into account the sampling scheme that 
generated the data, which we are simulating.
Ross