Dear List,
I'm using MCMCglmm to estimate logistic mixed models for social network
data. Estimation works fine until I add a third random effect that
causes MCMC autocorrelations on all parameters to become extremely high.
I'm trying to figure out if there is any way to lower autocorrelation
values while retaining the same specification for the fixed and random
part of the model.
Here is some information about the model:
* Units of analysis are social ties between individuals, i.e. each row
in the data frame is one unique social tie.
* Each tie exists between 2 individuals, an Ego and an Alter. Each Ego
belongs to a Family.
* The binary response is whether the tie is "supportive", i.e. if Alter
provides support to Ego (1=Yes, 0=No).
* The model includes 3 random effects (aka classifications): Egos,
Alters, Families. Egos are nested within Families (each Ego belongs to
one and only one Family). Egos and Alters are cross-classified: the same
Ego has ties with multiple Alters and the same Alter has ties with
multiple Egos. Each tie "belongs" to one Ego, one Family (Ego's Family),
and one Alter, so no multiple membership occurs.
* The data include 1200 ties (1200 dataset rows). There are about 20
ties for each Ego; 40 ties for each Family; and 2 ties for each Alter.
* This is a fairly simple random-intercept logistic model: The intercept
has a random component for Egos, Families, and Alters. So I'm interested
in ego-level, family-level, alter-level variance in the intercept. No
random slope is involved.
* The model includes covariates at the tie, ego, and alter level (i.e.
characteristics of ties, egos, and alters).
This is how I'm estimating the model:
```
# Priors
priors.m4b <- list(R = list(V = 1, fix = 1),
                   G = list(G1 = list(V = 1, nu = 0.002),
                            G2 = list(V = 1, nu = 0.002),
                            G3 = list(V = 1, nu = 0.002)))
# MCMCglmm
m4b <- MCMCglmm(tie.supportive ~ 1 + tie.family + tie.trust +
                  tie.communication + tie.same.sex + tie.same.age +
                  ego.sex + ego.age + alter.sex + alter.age,
                random = ~ egoID + ego.familyID + alterID,
                family = "categorical", data = df, prior = priors.m4b,
                slice = TRUE, nitt = 710000, thin = 700, burnin = 10000)
```
The estimation results have autocorrelation values of over .50,
sometimes as high as .90 for the fixed effects; and over .70 for the
variance components, resulting in uselessly low effective sample sizes.
However, the problem only occurs when adding the Alter random effect
(alterID). The same model with just the Ego and Family random effects is
estimated well, with low autocorrelation values. Unfortunately, the main
aim of the analysis is precisely to examine the model with the Alter
random effect.
Also, the problem seems to be reduced when keeping the Alter random
effect while removing some tie-level (i.e. level-1) covariates, in
particular tie.trust and tie.communication. These are tie-level measures
of how much trust and communication there is on a given tie (i.e. between
that Ego and that Alter), on a 1-to-5 scale; I'm treating them as
centered continuous variables.
The questions on which I wanted to ask your thoughts are the following:
* Am I specifying the model in MCMCglmm in the right way? I have read
Jarrod Hadfield's Course Notes and a number of other tutorials, but
obviously I'm using MCMCglmm on a different type of model and data than
presented in most tutorials, so I'm still not 100% sure.
* What might be the problem with the Alter random effect that causes
autocorrelations to suddenly rocket in this way?
* What might be the problem with tie.trust and tie.communication, such
that autocorrelations are much lower when these two level-1 covariates
are removed?
* Do you have any suggestions on how to deal with such high
autocorrelations? Is changing the model the only solution? I've tried to
increase the nitt and thin (up to nitt=1210000, thin=1200,
burnin=10000), but autocorrelations seem to stay above .50 or .60.
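One thing I've seen suggested in the Course Notes (if I understand them correctly) is parameter expansion for the G priors, which is supposed to improve mixing when the posterior variances are small. Something like the following sketch (I'm not sure the hyperparameter values below are sensible for my data):

```
# Parameter-expanded priors (alpha.mu / alpha.V) for the three random effects;
# the residual variance stays fixed at 1 as required for family="categorical"
priors.px <- list(R = list(V = 1, fix = 1),
                  G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
                           G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
                           G3 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000)))
```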
Thank you very much for any thought, comment or suggestion.
Raffaele Vacca
Department of Sociology and Criminology & Law
University of Florida
Dealing with very high autocorrelations in logistic MCMCglmm
2 messages · Raffaele Vacca, Nicolas Fanin
Dear List,
I need your advice about a data analysis I have been working on for a long time; thank you in advance for your responses.
I'm working on a split plot design with repeated measurements over time.
It consists of 3 Ecosystem types (large, medium and small), each type replicated 10 times (30 ecosystems in total, each with an individual ID = EcosystemCode).
Within each ecosystem, we have 8 Treatments in which we measured plant biomass over 20 years.
I would like to know the effect of Ecosystem type, Treatments (and possibly Time).
Since we measure the same plots over time (so the data are non-independent), with each plot within each ecosystem being a distinct treatment (plots are equivalent for the model):
```
modelfinal1 <- lme(Biomass ~ EcosystemType * Tr,
                   random = ~ 1 | EcosystemCode/Plot, data = XXX)
```
And if I want to know the effect of Year:
```
modelfinal2 <- lme(Biomass ~ EcosystemType * Year * Tr,
                   random = ~ 1 | EcosystemCode/Plot, data = XXX)
```
In the first model, I get a denDF of 27 for EcosystemType (which seems fine given the 30 ecosystems) and a denDF of 149 for Tr and for the Tr x EcosystemType interaction.
In the second model I get the same denDF for EcosystemType and Tr, but a huge denDF of about 2992 for Year and the different interactions involving Year.
What do you think about these degrees of freedom? Are we overstating the degrees of freedom associated with the effect of Time? Do you know of any multivariate approaches with fewer assumptions about the variance/covariance structure that would be a better option than these linear models? (The relationships are mostly linear but sometimes tend towards log10 relationships.)
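For the repeated measurements, I also wondered whether explicitly modelling serial correlation within plots would help, e.g. with an AR(1) structure in nlme (a sketch; I have not checked whether Year is coded here as an integer time index, which corAR1 requires):

```
library(nlme)

# Sketch: AR(1) correlation among yearly measurements within each plot
modelAR1 <- lme(Biomass ~ EcosystemType * Year * Tr,
                random = ~ 1 | EcosystemCode/Plot,
                correlation = corAR1(form = ~ Year),
                data = XXX)
```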
Then, I tried to assess the differences between the slopes representing the evolution of biomass over time in each of the 3 ecosystem types, using mixed models and a contrast matrix with glht:
```
modelfinal4 <- lme(Biomass ~ EcosystemType * Year,
                   random = ~ 1 | EcosystemCode, data = XXX)
contrast.matrix <- rbind(
  `Year:EcosystemTypelarge vs. Year:EcosystemTypemedium`  = c(0, 0, 0, 0, 1, 0),
  `Year:EcosystemTypelarge vs. Year:EcosystemTypesmall`   = c(0, 0, 0, 0, 0, 1),
  `Year:EcosystemTypemedium vs. Year:EcosystemTypesmall`  = c(0, 0, 0, 0, -1, 1))
slopecomps <- glht(modelfinal4, linfct = contrast.matrix)
summary(slopecomps)
```
What do you think about this approach? Do you know any other (and may be more common) to compare slopes using mixed models?
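An alternative I came across is comparing the Year slopes with emmeans (assuming emtrends handles lme fits; the sketch below is untested on my data):

```
library(emmeans)

# Estimate the Year slope within each ecosystem type, then compare them pairwise
slopes <- emtrends(modelfinal4, ~ EcosystemType, var = "Year")
pairs(slopes)  # all pairwise slope differences with adjusted p-values
```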
Finally, I tried to get the marginal and conditional R2 (R2mar and R2cond) of the different models following the paper of Nakagawa and Schielzeth (2013), for example:
```
mF <- lmer(Totalbiomass ~ RealizedSR + (1 | EcosystemCode), data = XXX)
anova(mF)
# Variance explained by the fixed effects
mFX <- model.matrix(mF)
VarF <- var(as.vector(fixef(mF) %*% t(mFX)))
# Marginal R2 (fixed effects only)
VarF / (VarF + VarCorr(mF)$EcosystemCode[1] + attr(VarCorr(mF), "sc")^2)
# Conditional R2 (fixed + random effects)
(VarF + VarCorr(mF)$EcosystemCode[1]) /
  (VarF + VarCorr(mF)$EcosystemCode[1] + attr(VarCorr(mF), "sc")^2)
```
Do you know of any package that can do the calculation directly for many models at the same time?
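I came across the MuMIn package, which seems to implement the Nakagawa & Schielzeth R2 directly; something like the following (untested on my data):

```
library(MuMIn)

# Marginal (fixed effects) and conditional (fixed + random) R2
# following Nakagawa & Schielzeth (2013)
r.squaredGLMM(mF)
```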
Thank you for your responses once again, and I wish you a good weekend,
Best Regards,
Nicolas