
Question on estimating intraclass correlation (and significance and confidence intervals) with binomial data using mixed model and/or GEE

3 messages · Tom Wenseleers, Ben Bolker, Ariel Muldoon

Dear Ben,
Many thanks for this - that's great! 
With the code you mention, though, I get an NA in the section that extracts the variance:
bcolvar <- (fixef(.)["colony"])^2

If I try
gmm2<- glmer(trait~colony+(1|patriline), 
    family = binomial(link="logit"), data = dataset )
vcor=VarCorr(gmm2)  # variance components
bpatrvar=vcor[["patriline"]]
bcolvar <- (fixef(gmm2)["colony"])^2    ## difference between colonies, squared, / (n-1) [n=2]

I get an NA for the bcolvar, since fixef(gmm2) returns me
(Intercept)     colony2 
0.013433529 0.004875263

Does that mean that I would have to use bcolvar <- (fixef(gmm2)["colony2"])^2 ? Or did you use other types of contrasts or something like that?
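For what it's worth, a quick base-R check (my own toy example, not your data) suggests the naming comes from the default treatment contrasts:

```r
## Toy two-level factor standing in for 'colony': with the default
## treatment contrasts the coefficient is named after the second level
## ("f2"), which is why fixef(gmm2)["colony"] returns NA while
## fixef(gmm2)["colony2"] does not; sum contrasts would name it "f1".
f <- factor(c("1", "1", "2", "2"))
colnames(model.matrix(~ f))                                       # "(Intercept)" "f2"
colnames(model.matrix(~ f, contrasts.arg = list(f = contr.sum)))  # "(Intercept)" "f1"
```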

Also a couple of remaining questions:
1) Is there any way / any model / other approach (eg using a GEE?) that would allow for negative intraclass correlations? (right now ICCs are always positive, but in rare cases one might expect negative intraclass correlations)

2) If I wanted to allow patriline effects to be different for different colonies, would the appropriate model then be
gmm <- glmer(trait~colony+(colony|patriline), family = binomial(link="logit"), data = dataset )
or would it be
gmm <- glmer(trait~colony+(0+colony|patriline), family = binomial(link="logit"), data = dataset )?

3) If I want to treat all my factors as fixed, ie using a nested model as in
gm <- glm(trait~colony+patriline, family = binomial(link="logit"), data = dataset )                      # patriline is in fact colony:patriline since all patrilines have unique codes
how would I extract the variance components then?
coef(gm) gives me
(Intercept)     colony2  patriline2  patriline3  patriline4  patriline5  patriline6  patriline7  patriline8  patriline9 patriline10 patriline11 
  0.4054651 -16.9715335  -0.3001046  -0.5007753  -1.7917595  -0.4054651   0.6931472  -0.6931472 -16.9715335   1.2039728  -1.5040774  -0.8109302 
patriline12 patriline13 patriline14 patriline15 patriline16 patriline17 patriline18 patriline19 patriline20 patriline21 patriline22 
  1.3862944  16.2295962  15.7776111  17.4823591  17.7700412  16.3429249  17.0768940  17.6646807  16.4860257  16.5660684          NA
But how could I get variance components from these?
And is there any function analogous to bootMer for glms? I guess it is not possible to analyse fixed-effect models with lme4 by any chance? (if so I could still use bootMer perhaps)
Or would I have to do parametric bootstrapping using simulate() myself then?
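Regarding question 1 above, one route I've been looking at: geepack's geeglm with an exchangeable working correlation estimates a single within-cluster correlation that, unlike a variance component, is allowed to be negative. A toy sketch (simulated data standing in for the real dataset, so the estimate itself is not meaningful here):

```r
library(geepack)

## Simulated stand-in for 'dataset': 20 patriline clusters of 10 bees,
## nested in 2 colonies, with a binary trait. Data must be sorted by id.
set.seed(1)
dataset <- data.frame(
  patriline = rep(1:20, each = 10),          # cluster id
  colony    = factor(rep(1:2, each = 100)),
  trait     = rbinom(200, 1, 0.5)
)

gee1 <- geeglm(trait ~ colony, family = binomial(link = "logit"),
               data = dataset, id = patriline, corstr = "exchangeable")
gee1$geese$alpha   # working correlation 'alpha'; can come out negative
```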

I'll also try GenABEL::polygenic_hglm, as suggested, and MCMCglmm. I'm also trying nonparametric bootstrapping to get confidence limits (by resampling both colonies and patrilines -- resampling individuals within patrilines doesn't seem to work, though, as that messes up the intraclass correlation), and I have been simulating datasets with an ICC of 1 to see how the different methods perform -- I'll report back here as soon as I have the results...
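For reference, the kind of simulation I mean might look like this (a sketch with an arbitrary target ICC on the latent logit scale, not my actual code):

```r
## Choose the patriline variance v so that the latent-scale ICC
## v/(v + pi^2/3) hits a target value, then simulate Bernoulli traits
## from a logistic-normal model.
set.seed(1)
target_icc <- 0.3
v <- target_icc/(1 - target_icc) * (pi^2/3)   # solves v/(v + pi^2/3) = 0.3

npatr <- 50; n_per <- 20
u   <- rnorm(npatr, 0, sqrt(v))               # patriline random effects
eta <- rep(u, each = n_per)                   # logit-scale linear predictor
sim <- data.frame(patriline = factor(rep(seq_len(npatr), each = n_per)),
                  trait     = rbinom(npatr * n_per, 1, plogis(eta)))
```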

Cheers,
Tom

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
Sent: 27 August 2013 17:06
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Question on estimating intraclass correlation (and significance and confidence intervals) with binomial data using mixed model and/or GEE

Tom Wenseleers <Tom.Wenseleers at ...> writes:
This failure occurs because when colony is a fixed factor, there is no "colony" variance component and no "patriline:colony" component.  See below, where I get approximately the same estimate of the ICC by computing the mean-square difference among colonies from the fixed-effect parameter.

   I don't know exactly what's going on with the bootstrap distributions of the ICC, but my guess is that it's *not* a bug in the implementation of bootMer (although I could always be wrong).
I think it's some combination of bias in the estimate and possibly Jensen's inequality in the computation of the ICC (i.e. even if the parameter estimates are unbiased, the estimate of the ICC won't necessarily be).

  I would be curious how the results from MCMCglmm, which should have a different set of biases, or the results from a simulation study with known ICC, would line up with this example ...
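A sketch of what that MCMCglmm comparison might look like (toy data standing in for 'dataset'; fixing the residual variance at 1 and adding pi^2/3 for the logit link are the usual conventions, but treat this as a sketch rather than a recipe):

```r
library(MCMCglmm)

## Toy stand-in for 'dataset'
set.seed(1)
dataset <- data.frame(patriline = factor(rep(1:30, each = 10)),
                      colony    = factor(rep(1:2, each = 150)),
                      trait     = rbinom(300, 1, 0.5))

## Residual variance is not identifiable for binary data, so fix it at 1
prior <- list(R = list(V = 1, fix = 1),
              G = list(G1 = list(V = 1, nu = 0.002)))
m <- MCMCglmm(trait ~ colony, random = ~ patriline,
              family = "categorical", data = dataset,
              prior = prior, verbose = FALSE)

## Posterior of the latent-scale ICC: patriline variance over
## (patriline + fixed residual + pi^2/3 link variance)
vp  <- m$VCV[, "patriline"]
icc <- vp/(vp + m$VCV[, "units"] + pi^2/3)
c(mean = mean(icc), quantile(icc, c(0.025, 0.975)))
```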

gmm <- glmer(trait~(1|colony/patriline), 
    family = binomial(link="logit"), data = dataset )
gmm2<- glmer(trait~colony+(1|patriline), 
    family = binomial(link="logit"), data = dataset )
 
## And estimated the intraclass correlation using:
ICCfun <- function(.,type=1) {
    vcor=VarCorr(.)  # variance components
    if (type==1) {
        bpatrvar=vcor[["patriline:colony"]]  #  among patriline(colony) variance
        bcolvar=vcor[["colony"]]  # among colony variance
    } else {
        bpatrvar=vcor[["patriline"]]
        ## difference between colonies, squared, / (n-1) [n=2]
        bcolvar <- (fixef(.)["colony"])^2  
    }
    if (family(.)$link=="probit") errvar=1 else errvar=(pi^2)/3
    ## # error variance is (pi^2)/3 for
    ## binomial logit model, and 1 for binomial probit model
    ##  (Snijders & Bosker 1999)
    c(bpatrvar/(bpatrvar+bcolvar+errvar))  # intraclass correlation
}

## wrapper
ICCfun2 <- function(.) ICCfun(.,type=2)

(ICC1 <- ICCfun(gmm))

set.seed(101)
boo2 <- bootMer(gmm, ICCfun, nsim = 500,use.u=TRUE,
                .progress="txt",PBargs=list(style=3))
mean(boo2$t) # mean from bootstrap runs = 0.03165166

head(boo2d <- as.data.frame(boo2))
hist(boo2d[[1]],breaks=50,col="gray")

(ICC1 <- ICCfun(gmm) )
(ICC2 <- ICCfun2(gmm2))
boo3A <- bootMer(gmm2, ICCfun2, nsim = 500,use.u=TRUE)
hist(as.data.frame(boo3A)[[1]],breaks=50,col="gray")

boo3B <- bootMer(gmm2, ICCfun2, nsim = 500,use.u=FALSE)
hist(as.data.frame(boo3B)[[1]],breaks=50,col="gray")

dotplot(ranef(gmm2))
qqnorm(ranef(gmm2)[[1]][,1])

library(ggplot2)
d <- rbind(data.frame(w="use.u_TRUE",x=as.data.frame(boo3A)[[1]]),
           data.frame(w="use.u_FALSE",x=as.data.frame(boo3B)[[1]]))
ggplot(d,aes(x,fill=w))+geom_histogram(position="identity",alpha=0.4)+
    theme_bw()

_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Tom Wenseleers <Tom.Wenseleers at ...> writes:
> Does that mean that I would have to use bcolvar <- (fixef(gmm2)["colony2"])^2 ?

Yes, that seems to be a typo/thinko on my part: I'm not sure why
my code worked in the first place -- probably because I failed to
convert 'colony' to a factor (which wouldn't matter to the estimate,
since colony only has two levels).
> 1) Is there any way / any model / other approach (eg using a GEE?)
> that would allow for negative intraclass correlations?

Don't know.  In principle, one way to do it is to allow compound-
symmetric structures rather than using grouping variables at the
level of individual latent effects (i.e. if this were an LMM fitted
via nlme::lme you would use pdCompSymm to structure the random effects).
This seems as though it should be possible via lme4 eventually -- we
have a *very* bleeding-edge development 'flexLambda' branch on
github -- but holding your breath would be a bad idea.
> 2) If I wanted to allow patriline effects to be different for
> different colonies ...

Do you mean different among-patriline variances for each colony?
   I'd have to think about it more carefully, but I don't think either
of these will work; I think you may need the finer level of control
that nlme offers (which you could use via MASS::glmmPQL, although there are
definitely concerns about the actual definitions of the models being
fitted in that way).  There was a thread about this:
http://thread.gmane.org/gmane.comp.lang.r.lme4.devel/205/focus=207
I don't remember the details.
> 3) If I want to treat all my factors as fixed ... how would I
> extract the variance components then?

In principle this should work:

## patrilines all have unique numbers
## I then fitted the hierarchically nested binomial mixed models:
gmm <- glmer(trait~(1|colony/patriline), 
   family = binomial(link="logit"), data = dataset )
gmm2<- glmer(trait~colony+(1|patriline), 
   family = binomial(link="logit"), data = dataset )
gmm3<- glm(trait~colony+patriline, 
    family = binomial(link="logit"), data = dataset,
           contrasts=list(patriline=contr.sum,colony=contr.sum))
 
## And estimated the intraclass correlation using:
ICCfun <- function(.,type=1,verbose=FALSE) {
    if (type<3) vcor=VarCorr(.) 
    if (type==1) {
        bpatrvar=vcor[["patriline:colony"]]  #  among patriline(colony) variance
        bcolvar=vcor[["colony"]]  # among colony variance
    } else if (type==2) {
        bpatrvar=vcor[["patriline"]]
        ## difference between colonies, squared, / (n-1) [n=2]
        bcolvar <- (fixef(.)["colony2"])^2 
    } else if (type==3) {
        f <- coef(.)
        pp <- na.omit(f[grep("^patriline",names(f))])
        cp <- f[grep("^colony",names(f))]
        ## variance based on contrasts (n rather than n-1)
        vfun <- function(x) sum(x^2)/length(x)
        bpatrvar <- vfun(pp)
        bcolvar <- vfun(cp)
    }
    ## error variance is (pi^2)/3 for
    ## binomial logit model, and 1 for binomial probit model 
    ## (Snijders & Bosker 1999)
    if (family(.)$link=="probit") errvar=1 else errvar=(pi^2)/3 
    r <- c(bpatrvar/(bpatrvar+bcolvar+errvar)) # intraclass correlation
    if (verbose) cat(bpatrvar,bcolvar,errvar,r,"\n")
    r
}

ICCfun2 <- function(.,...) ICCfun(.,type=2,...)
ICCfun3 <- function(.,...) ICCfun(.,type=3,...)

ICC1 <- ICCfun(gmm,verbose=TRUE)
ICC2 <- ICCfun2(gmm2,verbose=TRUE)
ICC3 <- ICCfun3(gmm3,verbose=TRUE)

  ... but at the moment I'm not getting sensible answers,
so I don't know if I'm just doing something wrong, or whether
the shrinkage due to treating patriline as a random effect is
really so strong ...
> I guess it is not possible to analyse fixed-effect models with lme4
> by any chance?

No.

> Or would I have to do parametric bootstrapping using simulate() myself then?

Yes: that should be easy.

  sumFun(update(model, data = transform(dataset, trait = simulate(model)[[1]])))

 should give you a PB replicate.
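Spelled out as a loop, that looks something like this (a sketch: a toy glm and a scalar summary function stand in for gmm3 and ICCfun3 so it runs on its own):

```r
## Toy stand-ins for gmm3 and ICCfun3
set.seed(101)
d <- data.frame(x = factor(rep(1:4, each = 25)),
                y = rbinom(100, 1, 0.5))
model  <- glm(y ~ x, family = binomial, data = d)
sumFun <- function(m) coef(m)[["x2"]]

## One parametric-bootstrap replicate = simulate a response from the
## fitted model, refit on the simulated data, re-apply the summary.
pbrep <- replicate(200, {
    dsim <- transform(d, y = simulate(model)[[1]])
    sumFun(update(model, data = dsim))
})
quantile(pbrep, c(0.025, 0.975))   # percentile bootstrap interval
```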
Tom Wenseleers <Tom.Wenseleers at ...> writes:
I've been working through Stroup's new GLMM book, and he shows an example of using a marginal model with compound symmetry structure to allow for a negative covariance that gives equivalent results to allowing for a negative variance in the conditional model.  That's for a Gaussian example, though.  Switching to a marginal model for a non-Gaussian case is a lot more complex and would take a lot of thought (for me) to make sure I wanted the marginal estimates.  I definitely don't know if it makes sense to use the covariance matrix for a binomial marginal model for ICC or not.