Dear Ben,
Many thanks for this - that's great!
With the code you mention I get an NA, though, in the section that extracts the colony variance:
bcolvar <- (fixef(.)["colony"])^2
If I try
gmm2<- glmer(trait~colony+(1|patriline),
family = binomial(link="logit"), data = dataset )
vcor=VarCorr(gmm2) # variance components
bpatrvar=vcor[["patriline"]]
bcolvar <- (fixef(gmm2)["colony"])^2 ## difference between colonies, squared, / (n-1) [n=2]
I get an NA for the bcolvar, since fixef(gmm2) returns
fixef(gmm2)
(Intercept) colony2
0.013433529 0.004875263
Does that mean that I would have to use bcolvar <- (fixef(gmm2)["colony2"])^2 instead? Or did you use a different contrast coding or something like that?
Also a couple of remaining questions:
1) Is there any way / any model / other approach (e.g. using a GEE) that would allow for negative intraclass correlations? (Right now ICCs are always positive, but in rare cases one might expect negative intraclass correlations.)
2) If I wanted to allow patriline effects to be different for different colonies, would the appropriate model then be
gmm<- glmer(trait~colony+(colony|patriline), family = binomial(link="logit"), data = dataset ) or would it be
gmm<- glmer(trait~colony+(0+colony|patriline), family = binomial(link="logit"), data = dataset )?
3) If I want to treat all my factors as fixed, i.e. using a nested model as in
gm <- glm(trait~colony+patriline, family = binomial(link="logit"), data = dataset ) # patriline is in fact colony:patriline since all patrilines have unique codes
how would I extract the variance components then?
coef(gm) gives me
(Intercept) colony2 patriline2 patriline3 patriline4 patriline5 patriline6 patriline7 patriline8 patriline9 patriline10 patriline11
0.4054651 -16.9715335 -0.3001046 -0.5007753 -1.7917595 -0.4054651 0.6931472 -0.6931472 -16.9715335 1.2039728 -1.5040774 -0.8109302
patriline12 patriline13 patriline14 patriline15 patriline16 patriline17 patriline18 patriline19 patriline20 patriline21 patriline22
1.3862944 16.2295962 15.7776111 17.4823591 17.7700412 16.3429249 17.0768940 17.6646807 16.4860257 16.5660684 NA
But how could I get variance components from these?
And is there any function analogous to bootMer for glms? I guess it is not possible to fit pure fixed-effect models with lme4 by any chance? (If it were, I could perhaps still use bootMer.)
Or would I have to do the parametric bootstrapping using simulate() myself then?
I'll also try GenABEL::polygenic_hglm, as suggested, and MCMCglmm. I'm also trying nonparametric bootstrapping to get confidence limits, resampling both colonies and patrilines (sketched below; resampling individuals within patrilines doesn't seem to work, though, as that messes up the intraclass correlation), and I have been simulating datasets with an ICC of 1 to see how the different methods perform. I'll report back here as soon as I have the results...
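For the record, the patriline-level resampling I have in mind looks roughly like this (a rough sketch only, using the ICCfun2 defined further down this thread; resampling colonies would be layered on top in the same way):
## sketch of a nonparametric cluster bootstrap at the patriline level
npICC <- replicate(500, {
  ids <- sample(unique(dataset$patriline), replace = TRUE)
  bootdat <- do.call(rbind, lapply(seq_along(ids), function(i) {
    d <- dataset[dataset$patriline == ids[i], ]
    d$patriline <- paste0("boot", i)  # relabel so resampled duplicates stay distinct clusters
    d
  }))
  ICCfun2(glmer(trait ~ colony + (1|patriline),
                family = binomial(link = "logit"), data = bootdat))
})
quantile(npICC, c(0.025, 0.975))  # percentile confidence limits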
Cheers,
Tom
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
Sent: 27 August 2013 17:06
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Question on estimating intraclass correlation (and significance and confidence intervals) with binomial data using mixed model and/or GEE
Tom Wenseleers <Tom.Wenseleers at ...> writes:
Dear all,
Further to my previous message it appears that there might be a
problem with the case where you set use.u=T (to also simulate random
effects) in bootMer.
This appears to be the case judging from the fact that when I try
boo2 <- bootMer(gmm, ICCfun, nsim = 500,use.u=T,type="parametric")
# parametric bootstrapping
mean(boo2$t) # mean from bootstrap runs = 0.03165166
the mean from the parametric bootstrap runs (0.03165166) is much lower
than that obtained from the original model (0.095):
ICC1 <- ICCfun(gmm)
ICC1 # intraclass correlation=0.095
I then tried coding colony as a fixed factor, but that fails
altogether, irrespective of whether one sets use.u to TRUE or FALSE:
This failure occurs because, when colony is a fixed factor, there is no "colony" variance component and no "patriline:colony" component. See below, where I get approximately the same estimate of the ICC by computing the mean squared difference among colonies from the fixed-effect parameter.
I don't know exactly what's going on with the bootstrap distributions of ICC, but my guess is that it's *not* a bug in the implementation of bootMer (although I could always be wrong).
I think it's some combination of bias in the estimate and possibly Jensen's inequality in the computation of the ICC (i.e.
even if the estimates of the parameters are unbiased, the estimate of the ICC won't necessarily be ...)
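To see the Jensen effect in isolation, here is a toy calculation with made-up variances (nothing to do with your data): plug unbiased but noisy variance estimates into the ICC formula and the mean of the resulting ICC estimates generally differs from the true value.
## toy illustration of Jensen's inequality for a variance ratio
set.seed(1)
v.patr <- 0.5; v.col <- 1; errvar <- pi^2/3   # arbitrary true values
ICC.true <- v.patr/(v.patr + v.col + errvar)
est <- replicate(1e4, {
  v1 <- v.patr * rchisq(1, df = 5)/5  # unbiased: E[v1] = v.patr
  v2 <- v.col  * rchisq(1, df = 5)/5  # unbiased: E[v2] = v.col
  v1/(v1 + v2 + errvar)
})
c(true = ICC.true, mean.estimate = mean(est))  # not equal in general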
I would be curious how the results from MCMCglmm, which should have a different set of biases, or the results from a simulation study with known ICC, would line up with this example ...
gmm <- glmer(trait~(1|colony/patriline),
family = binomial(link="logit"), data = dataset )
gmm2<- glmer(trait~colony+(1|patriline),
family = binomial(link="logit"), data = dataset )
## And estimated the intraclass correlation using:
ICCfun <- function(., type=1) {
  vcor = VarCorr(.)  # variance components
  if (type==1) {
    bpatrvar = vcor[["patriline:colony"]]  # among patriline(colony) variance
    bcolvar  = vcor[["colony"]]            # among colony variance
  } else {
    bpatrvar = vcor[["patriline"]]
    ## difference between colonies, squared, / (n-1) [n=2]
    bcolvar <- (fixef(.)["colony"])^2
  }
  ## error variance is (pi^2)/3 for the binomial logit model,
  ## and 1 for the binomial probit model (Snijders & Bosker 1999)
  if (family(.)$link=="probit") errvar <- 1 else errvar <- (pi^2)/3
  c(bpatrvar/(bpatrvar+bcolvar+errvar))  # intraclass correlation
}
## wrapper
ICCfun2 <- function(.) ICCfun(.,type=2)
(ICC1 <- ICCfun(gmm))
set.seed(101)
boo2 <- bootMer(gmm, ICCfun, nsim = 500,use.u=TRUE,
.progress="txt",PBargs=list(style=3))
mean(boo2$t) # mean from bootstrap runs = 0.03165166
head(boo2d <- as.data.frame(boo2))
hist(boo2d[[1]],breaks=50,col="gray")
(ICC1 <- ICCfun(gmm) )
(ICC2 <- ICCfun2(gmm2))
boo3A <- bootMer(gmm2, ICCfun2, nsim = 500,use.u=TRUE)
hist(as.data.frame(boo3A)[[1]],breaks=50,col="gray")
boo3B <- bootMer(gmm2, ICCfun2, nsim = 500,use.u=FALSE)
hist(as.data.frame(boo3B)[[1]],breaks=50,col="gray")
dotplot(ranef(gmm2))
qqnorm(ranef(gmm2)[[1]][,1])
library(ggplot2)
d <- rbind(data.frame(w="use.u_TRUE", x=as.data.frame(boo3A)[[1]]),
           data.frame(w="use.u_FALSE", x=as.data.frame(boo3B)[[1]]))
ggplot(d, aes(x, fill=w)) +
  geom_histogram(position="identity", alpha=0.4) +
  theme_bw()
Dear Ben,
Many thanks for this - that's great!
With the code you mention I get an NA, though, in the section that extracts the colony variance:
bcolvar <- (fixef(.)["colony"])^2
If I try
gmm2<- glmer(trait~colony+(1|patriline),
family = binomial(link="logit"), data = dataset )
vcor=VarCorr(gmm2) # variance components
bpatrvar=vcor[["patriline"]]
bcolvar <- (fixef(gmm2)["colony"])^2
I get an NA for the bcolvar, since fixef(gmm2) returns
fixef(gmm2)
(Intercept) colony2
0.013433529 0.004875263
Does that mean that I would have to use
bcolvar <- (fixef(gmm2)["colony2"])^2 instead? Or did you use a different
contrast coding or something like that?
Yes, that seems to be a typo/thinko on my part: I'm not sure why
my code worked in the first place -- probably because I failed to
convert 'colony' to a factor (which wouldn't matter to the estimate,
since colony has only two levels).
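To make the naming issue concrete (using the gmm2 fit from above):
## with the default treatment contrasts the coefficient is named
## "colony2", so indexing by "colony" silently returns NA
fixef(gmm2)["colony"]    # <NA>
fixef(gmm2)["colony2"]   # between-colony difference on the logit scale
## if colony had been left numeric, the name would have been "colony"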
Also a couple of remaining questions:
1) Is there any way / any model / other approach
(e.g. using a GEE) that would allow for negative intraclass
correlations? (right now ICCs are always positive,
but in rare cases one might expect negative
intraclass correlations)
Don't know. In principle, one way to do it is to allow compound-symmetric
structures rather than using grouping variables at the level of
individual latent effects (i.e. if this were an LMM fitted
via nlme::lme, you would use pdCompSymm to structure the random effects).
This seems as though it should be possible via lme4 eventually -- we
have a *very* bleeding-edge development 'flexLambda' branch on
github -- but holding your breath would be a bad idea.
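For the Gaussian/LMM analogue, a rough, untested sketch of what I mean (lme() itself won't fit a binomial response, so this is only the shape of the idea):
## compound-symmetric patriline effects within colony; the CS correlation
## may be negative (bounded below by -1/(k-1) for k patrilines), unlike
## an ordinary variance component
library(nlme)
lmmCS <- lme(trait ~ 1, data = dataset,  # patriline must be a factor here
             random = list(colony = pdCompSymm(~ patriline - 1)))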
2) If I wanted to allow patriline effects to be different for
different colonies, would the appropriate
model then be
gmm<- glmer(trait~colony+(colony|patriline),
family = binomial(link="logit"), data = dataset ) or
would it be
gmm<- glmer(trait~colony+(0+colony|patriline),
family = binomial(link="logit"), data = dataset )?
Do you mean different among-patriline variances for each colony?
I'd have to think about it more carefully, but I don't think either
of these will work; I think you may need the finer level of control
that nlme offers (which you could use via MASS::glmmPQL, although there are
definitely concerns about the actual definitions of the models being
fitted in that way). There was a thread about this:
http://thread.gmane.org/gmane.comp.lang.r.lme4.devel/205/focus=207
I don't remember the details.
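If it is colony-specific among-patriline variances you are after, one possible (untested) sketch via glmmPQL, with all the caveats just mentioned:
## pdDiag gives a separate random-effect variance for each colony level,
## with no correlation between them (avoiding the unidentifiable
## cross-colony correlations of (colony|patriline))
library(MASS)   # glmmPQL
library(nlme)   # pdDiag
gpql <- glmmPQL(trait ~ colony,
                random = list(patriline = pdDiag(~ 0 + colony)),
                family = binomial(link = "logit"), data = dataset)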
3) If I want to treat all my factors as fixed, i.e. using a nested model as in
gm <- glm(trait~colony+patriline, family = binomial(link="logit"),
data = dataset ) # patriline is in
fact colony:patriline since all patrilines have unique codes
how would I extract the variance components then?
In principle this should work:
## patrilines all have unique numbers
## I then fitted the hierarchically nested binomial mixed models:
gmm <- glmer(trait~(1|colony/patriline),
family = binomial(link="logit"), data = dataset )
gmm2<- glmer(trait~colony+(1|patriline),
family = binomial(link="logit"), data = dataset )
gmm3<- glm(trait~colony+patriline,
family = binomial(link="logit"), data = dataset,
contrasts=list(patriline=contr.sum,colony=contr.sum))
## And estimated the intraclass correlation using:
ICCfun <- function(., type=1, verbose=FALSE) {
  if (type < 3) vcor = VarCorr(.)
  if (type == 1) {
    bpatrvar = vcor[["patriline:colony"]]  # among patriline(colony) variance
    bcolvar  = vcor[["colony"]]            # among colony variance
  } else if (type == 2) {
    bpatrvar = vcor[["patriline"]]
    ## difference between colonies, squared, / (n-1) [n=2]
    bcolvar <- (fixef(.)["colony2"])^2
  } else if (type == 3) {
    f  <- coef(.)
    pp <- na.omit(f[grep("^patriline", names(f))])  # drop aliased (NA) coefficients
    cp <- f[grep("^colony", names(f))]
    ## variance based on sum-to-zero contrasts (n rather than n-1)
    vfun <- function(x) sum(x^2)/length(x)
    bpatrvar <- vfun(pp)
    bcolvar  <- vfun(cp)
  }
  ## error variance is (pi^2)/3 for the binomial logit model,
  ## and 1 for the binomial probit model (Snijders & Bosker 1999)
  if (family(.)$link == "probit") errvar <- 1 else errvar <- (pi^2)/3
  r <- c(bpatrvar/(bpatrvar + bcolvar + errvar))  # intraclass correlation
  if (verbose) cat(bpatrvar, bcolvar, errvar, r, "\n")
  r
}
ICCfun2 <- function(.,...) ICCfun(.,type=2,...)
ICCfun3 <- function(.,...) ICCfun(.,type=3,...)
ICC1 <- ICCfun(gmm,verbose=TRUE)
ICC2 <- ICCfun2(gmm2,verbose=TRUE)
ICC3 <- ICCfun3(gmm3,verbose=TRUE)
... but at the moment I'm not getting sensible answers,
so I don't know if I'm just doing something wrong, or whether
the shrinkage due to treating patriline as a random effect is
really so strong ...
And is there any function analogous to bootMer for glms?
I guess it is not possible to fit pure fixed-effect
models with lme4 by any chance?
No.
Or would I have to do parametric bootstrapping using simulate() myself then?
Yes: that should be easy.
sumFun(update(model,data=simulate(model)))
should give you a PB replicate.
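Slightly more spelled out (a sketch: simulate() on a glm returns a data frame of simulated responses, which has to be spliced back into the data before refitting):
## parametric-bootstrap sketch for the all-fixed-effects fit gmm3
pbICC <- replicate(500, {
  simdat <- dataset
  simdat$trait <- simulate(gmm3)[[1]]   # one simulated response vector
  ICCfun3(update(gmm3, data = simdat))
})
quantile(pbICC, c(0.025, 0.975))        # percentile bootstrap interval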
Also a couple of remaining questions:
1) Is there any way / any model / other approach (e.g. using a GEE)
that would allow for negative intraclass correlations? (right now ICCs
are always positive, but in rare cases one might expect negative
intraclass correlations)
Don't know. In principle, one way to do it is to allow compound-symmetric structures rather than using grouping variables at the level of individual latent effects (i.e. if this were an LMM fitted via nlme::lme, you would use pdCompSymm to structure the random effects).
I've been working through Stroup's new GLMM book, and he shows an example of using a marginal model with a compound-symmetry structure to allow for a negative covariance; it gives results equivalent to allowing a negative variance in the conditional model. That's for a Gaussian example, though. Switching to a marginal model for a non-Gaussian case is a lot more complex and would take a lot of thought (for me) to make sure I wanted the marginal estimates. I definitely don't know whether it makes sense to use the covariance matrix from a binomial marginal model for the ICC or not.
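For the marginal route, a quick untested sketch with geepack (question 1 above asked about GEEs): the exchangeable working correlation there is not constrained to be positive, though whether it is the quantity you want for an ICC is exactly the open question.
## GEE with exchangeable working correlation; 'alpha' in the summary is
## the estimated within-patriline correlation and may come out negative
## (a marginal, not conditional, quantity)
library(geepack)
gee1 <- geeglm(trait ~ colony, id = patriline,
               family = binomial(link = "logit"),
               corstr = "exchangeable",
               data = dataset[order(dataset$patriline), ])
summary(gee1)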
_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models