
Dealing with very high autocorrelations in logistic MCMCglmm

2 messages · Raffaele Vacca, Nicolas Fanin

#
Dear List,

I'm using MCMCglmm to estimate logistic mixed models for social network 
data. Estimation works fine until I add a third random effect that 
causes MCMC autocorrelations on all parameters to become extremely high. 
I'm trying to figure out if there is any way to lower autocorrelation 
values while retaining the same specification for the fixed and random 
part of the model.

Here is some information about the model:
* Units of analysis are social ties between individuals, i.e. each row 
in the data frame is one unique social tie.
* Each tie exists between 2 individuals, an Ego and an Alter. Each Ego 
belongs to a Family.
* The binary response is whether the tie is "supportive", i.e. if Alter 
provides support to Ego (1=Yes, 0=No).
* The model includes 3 random effects (aka classifications): Egos, 
Alters, Families. Egos are nested within Families (each Ego belongs to 
one and only one Family). Egos and Alters are cross-classified: the same 
Ego has ties with multiple Alters and the same Alter has ties with 
multiple Egos. Each tie "belongs" to one Ego, one Family (Ego's Family), 
and one Alter, so no multiple membership occurs.
* The data include 1200 ties (1200 dataset rows). There are about 20 
ties for each Ego; 40 ties for each Family; and 2 ties for each Alter.
* This is a fairly simple random-intercept logistic model: The intercept 
has a random component for Egos, Families, and Alters. So I'm interested 
in ego-level, family-level, alter-level variance in the intercept. No 
random slope is involved.
* The model includes covariates at the tie, ego, and alter level (i.e. 
characteristics of ties, egos, and alters).
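
For concreteness, the structure described above could look like the following (a hypothetical sketch with made-up values; the column names match those used in the model call):

```r
# Hypothetical sketch of the tie-level data frame: one row per tie.
df <- data.frame(
  tie.supportive = c(1, 0, 1),          # binary response: Alter supports Ego?
  tie.trust      = c(0.5, -1.2, 0.8),   # centered 1-to-5 tie-level covariate
  egoID          = c("e1", "e1", "e2"), # ~20 ties per Ego
  ego.familyID   = c("f1", "f1", "f1"), # Egos nested within Families
  alterID        = c("a1", "a2", "a1")  # ~2 ties per Alter, crossed with Egos
)
```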

This is how I'm estimating the model:
```
# Priors
priors.m4b <- list(R = list(V = 1, fix = 1),
                   G = list(G1 = list(V = 1, nu = 0.002),
                            G2 = list(V = 1, nu = 0.002),
                            G3 = list(V = 1, nu = 0.002)))

# MCMCglmm
MCMCglmm(tie.supportive ~ 1 + tie.family + tie.trust + tie.communication +
           tie.same.sex + tie.same.age + ego.sex + ego.age + alter.sex +
           alter.age,
         random = ~ egoID + ego.familyID + alterID,
         family = "categorical", data = df, prior = priors.m4b,
         slice = TRUE, nitt = 710000, thin = 700, burnin = 10000)
```

The estimation results have autocorrelation values of over .50, 
sometimes as high as .90 for the fixed effects; and over .70 for the 
variance components, resulting in uselessly low effective sample sizes.
However, the problem only occurs when adding the Alter random effect 
(alterID). The same model with just the Ego and Family random effects is 
estimated well, with low autocorrelation values. Unfortunately, the main 
aim of the analysis is precisely to examine the model with the Alter 
random effect.
Also, the problem seems to be reduced when keeping the Alter random 
effect while removing some tie-level (i.e. level-1) covariates, in 
particular tie.trust and tie.communication. These are tie-level measures 
of how much trust and communication there is on that tie (i.e. between 
that Ego and that Alter) on a 1-to-5 scale (I'm treating these as 
continuous and centered variables).
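
For reference, this is how I am checking the autocorrelations and effective sample sizes, using the coda package on the chains returned by MCMCglmm (a sketch; `m4b` stands for the fitted model object):

```r
library(coda)
# Lag-k autocorrelations of the stored samples
autocorr.diag(m4b$Sol)  # fixed effects
autocorr.diag(m4b$VCV)  # variance components
# Effective sample sizes; very low values confirm poor mixing
effectiveSize(m4b$Sol)
effectiveSize(m4b$VCV)
```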

The questions on which I wanted to ask your thoughts are the following:
* Am I specifying the model in MCMCglmm in the right way? I have read 
Jarrod Hadfield's Course Notes and a number of other tutorials, but 
obviously I'm using MCMCglmm on a different type of model and data than 
presented in most tutorials, so I'm still not 100% sure.
* What might be the problem with the Alter random effect that causes 
autocorrelations to suddenly rocket in this way?
* What might be the problem with tie.trust and tie.communication that 
causes autocorrelations to be much lower when these two level-1 
covariates are removed?
* Do you have any suggestions on how to deal with such high 
autocorrelations? Is changing the model the only solution? I've tried to 
increase the nitt and thin (up to nitt=1210000, thin=1200, 
burnin=10000), but autocorrelations seem to stay above .50 or .60.
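
(One remedy I have seen suggested in Hadfield's Course Notes for poorly mixing variance components is parameter expansion, which replaces the inverse-Wishart prior on each variance with a heavier-tailed prior and often improves mixing. A sketch of what I understand that would look like here, keeping the fixed residual variance:)

```r
# Parameter-expanded priors: alpha.mu/alpha.V induce a heavier-tailed
# prior on each variance component than the inverse-Wishart default,
# which often improves mixing for binary (categorical) models.
priors.px <- list(
  R = list(V = 1, fix = 1),  # residual variance fixed (not identified for binary data)
  G = list(
    G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
    G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
    G3 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000)
  )
)
```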

Thank you very much for any thought, comment or suggestion.

Raffaele Vacca
Department of Sociology and Criminology & Law
University of Florida
#
Dear List,


I need your advice about a data analysis I have been working on for a long time; thank you in advance for your responses.


I'm working on a split plot design with repeated measurements over time.

It consists of 3 Ecosystem types (large, medium and small), each ecosystem being repeated 10 times (30 ecosystems in total, each having an individual ID = EcosystemCode).

Within each ecosystem, we have 8 Treatments in which we measured plant biomass during 20 years.

I would like to know the effect of Ecosystem type, Treatment (and possibly Time).


If we consider that we measure the same plots over time (so the data are non-independent), with each plot within each ecosystem being a distinct treatment (plots are equivalent for the model):


modelfinal1 = lme(Biomass ~ EcosystemType*Tr, random = ~1|EcosystemCode/Plot, data = XXX)


And if I want to know the effect of Year:


modelfinal2 = lme(Biomass ~ EcosystemType*Year*Tr, random = ~1|EcosystemCode/Plot, data = XXX)

In the first model, I get a denDF of 27 for EcosystemType (which seems reasonable given the 30 ecosystems) and a denDF of 149 for Tr and the Tr x EcosystemType interaction.
In the second model, I get the same denDF for EcosystemType and Tr, but a huge denDF of about 2992 for Year and the various interactions involving Year.

What do you think about these degrees of freedom? Do we overstate the degrees of freedom associated with the effect of Time? Do you know any multivariate approaches with fewer assumptions about the variance/covariance structure that would be a better approach than these linear models? (The relationships are linear but sometimes tend towards log10 relationships.)
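
(One option I have considered, short of a fully multivariate approach, is to relax the independence assumption by adding a within-plot serial correlation structure in nlme. A sketch, assuming Year is numeric:)

```r
library(nlme)
# AR(1) correlation among the yearly measurements within each plot
modelfinal2.ar1 = lme(Biomass ~ EcosystemType*Year*Tr,
                      random = ~1|EcosystemCode/Plot,
                      correlation = corAR1(form = ~ Year | EcosystemCode/Plot),
                      data = XXX)
# Compare against the independence model by likelihood ratio / AIC
anova(modelfinal2, modelfinal2.ar1)
```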



Then, I tried to assess the differences between the slopes representing the evolution of biomass over time in each of the 3 ecosystem types, using mixed models and a contrast matrix:


modelfinal4 = lme(Biomass ~ EcosystemType*Year, random = ~1|EcosystemCode, data = XXX)

contrast.matrix <- rbind(`Year:EcosystemTypelarge vs. Year:EcosystemTypemedium` = c(0, 0, 0, 0, 1, 0),
                         `Year:EcosystemTypelarge vs. Year:EcosystemTypesmall` = c(0, 0, 0, 0, 0, 1),
                         `Year:EcosystemTypemedium vs. Year:EcosystemTypesmall` = c(0, 0, 0, 0, -1, 1))

slopecomps <- glht(modelfinal4, contrast.matrix)
summary(slopecomps)

What do you think about this approach? Do you know any other (perhaps more common) way to compare slopes using mixed models?
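
(One alternative I am aware of is the emmeans package, which computes the Year slope within each EcosystemType and all pairwise slope differences with a multiplicity adjustment. A sketch:)

```r
library(emmeans)
# Estimated slope of Biomass over Year within each EcosystemType
slopes <- emtrends(modelfinal4, ~ EcosystemType, var = "Year")
slopes
# All pairwise slope differences (Tukey-adjusted by default)
pairs(slopes)
```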


Finally, I tried to get the marginal and conditional R2 (R2mar and R2cond) of the different models following Nakagawa and Schielzeth (2013), for example:


mF = lmer(Totalbiomass ~ RealizedSR + (1|EcosystemCode), data = XXX)

anova(mF)
mFX <- model.matrix(mF)
# Variance explained by the fixed effects
VarF <- var(as.vector(fixef(mF) %*% t(mFX)))
# Marginal R2: fixed-effects variance / total variance
VarF/(VarF + VarCorr(mF)$EcosystemCode[1] + attr(VarCorr(mF), "sc")^2)
# Conditional R2: (fixed + random) variance / total variance
(VarF + VarCorr(mF)$EcosystemCode[1])/(VarF + VarCorr(mF)$EcosystemCode[1] + attr(VarCorr(mF), "sc")^2)


Do you know any package that can do the calculation directly for many variables at the same time?
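
(One package I have come across is MuMIn, whose r.squaredGLMM() implements the Nakagawa & Schielzeth (2013) R2 directly; a sketch of applying it to several fitted models at once:)

```r
library(MuMIn)
# Marginal (fixed effects only) and conditional (fixed + random) R2
r.squaredGLMM(mF)
# Apply to a list of fitted models in one step
models <- list(mF = mF)  # hypothetical list; add further lmer fits here
lapply(models, r.squaredGLMM)
```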

Thank you once again for your responses, and I wish you a good weekend,


Best Regards,


Nicolas