MCMCglmm_2.13 - R-SIG-mixed-models

Wed, Jul 27, 2011 11:18 AM #

Dear Jarrod,

Can I please just confirm what you meant here by "covariance structures for the random effects"? If I understand correctly, this technique might be very useful for quite a lot of things--though I'm also trying to think it through.

Specifically, the nlme package for example allows for error structures like AR1, etc., in the *residuals* of a mixed model. Are you saying that MCMCglmm's "ginverse" function allows for similar sorts of structure to be modelled in the *random effects*? Consider:

library(nlme); library(MCMCglmm)

# models with uncorrelated random effects (16 Rats observed on 11 occasions):
m1 <- lme(weight ~ Time, random = ~ 1 | Rat, BodyWeight)
m2 <- MCMCglmm(weight ~ Time, random = ~ Rat, data=BodyWeight, verbose=F) # m1 and m2 are the same

# same model as the above, but allowing for AR1 residuals errors:
cs1AR1 <- corAR1(0.8, form = ~ Time | Rat)
cs1AR1 <- Initialize(cs1AR1, BodyWeight)
corMatrix(cs1AR1)[[1]] # shows the correlation structure for each Rat (all 16 are the same, since each Rat is measured on the same occasions)
m3 <- lme(weight ~ Time, random = ~ Time | Rat, correlation = cs1AR1, BodyWeight)

# now, for an illustrative experiment, treat *Time* as the random grouping factor, instead of Rat:
m4 <- lme(weight ~ Diet, random = ~ 1 | Time, BodyWeight)
m5 <- MCMCglmm(weight ~ Diet, random=~Time, data=BodyWeight, verbose=F, nitt=53000, pr=T)
# estimates for the variance of the random intercepts is somewhat different, but otherwise m4 and m5 yield similar results

# now, create a "ginverse" matrix, and incorporate it in a call to MCMCglmm:
mat <- as(corMatrix(cs1AR1)[[1]], "sparseMatrix")
rownames(mat) <- unique(BodyWeight$Time)
m6 <- MCMCglmm(weight ~ Diet, random=~Time, ginverse=list(Time=mat), data=BodyWeight, verbose=F, nitt=53000, pr=T)

The variance for the random effects for Time is now substantially reduced. Does this mean that the "estimates" for the U_j terms has taken into account their correlation, whereas m5 treated them as independent? (Maybe I should have inverted "mat" before including it in the call?)

Do you see what I'm getting at? Maybe I've not understood, but this seemed to raise the possibility of allowing for random effects to be structured in many interesting ways (serially in time, spatially, etc.), which I don't think other packages can handle?

Many thanks for any clarification, as always,
Malcolm

Jarrod Hadfield

Thu, Jul 28, 2011 2:20 AM #

Hi Malcolm,

I don't think it's as fancy as you think/hope. The covariance  
structure has to be seperable into a matrix of prameters (V) and a  
fixed matrix (A), whereas in many spatial/time-series models A is also  
a function of some parameter(s). It is flexible in the sense that A  
can be any positive-definite matrix.

I've edited your code below to get MCMCglmm to fit the AR1 model (with  
fixed phi).

Cheers,

Jarrod

On 27 Jul 2011, at 19:18, Malcolm Fairbrother wrote:

# The models below (m4,m5 & m6) are not ideal, so I'll ft a MCMCglmm  
model that is close to m3:

phi<-0.756197

# get phi from m3 - tried to extract this number from m3 but gave up  
looking!?

csEAR1 <- corAR1(phi, form = ~ Time | Rat)
csEAR1 <- Initialize(cs1AR1, BodyWeight)

mat <- as(corMatrix(csEAR1)[[1]], "sparseMatrix")

# generate AR1 covariance matrix

# IMPORTANT: the inverse needs to be passed, rather than the actual  
covariance matrix. This is preferable because the inverse is often  
sparser, and MCMCglmm may not "know" this from the matrix and a brute  
force solve. Your example is a good one, because the inverse of an AR1  
covaraince matrix  (when time points are ordered) only has non-zero  
elements along the diagonal and sub-diagonals.  However, inverting mat  
gives a dense matrix because of numerical inaccuracies:

length(solve(mat)@x)

# gives 121 non-zero elements, but should really be :

dim(mat)[1]+2*(dim(mat)[1]-1)

# 31 non-zero elements. So:

smat<-solve(mat)
smat<-band(smat,c(-1,1), c(1,-1))

# also, it makes more sense for this to define the autocovariance  
structure within each Rat (rather than across Rats) as in m3.  Since  
the deisgn is balanced this is easy to form:

invC<-kronecker(diag(nlevels(BodyWeight$Rat)), smat)
invC<-as(invC,  "CsparseMatrix")

# create a new variable with a unique name for each datum (and have  
the rownames in invC correspond)

rownames(invC) <- 1:dim(BodyWeight)[1]
BodyWeight$timeRat<-as.factor(1:dim(BodyWeight)[1])

# Then fit

prior=list(R=list(V=1, nu=0.02), G=list(G1=list(V=diag(2), nu=2,  
alpha.mu=c(0,0), alpha.V=diag(2)*1000), G2=list(V=1, nu=1, alpha.mu=0,  
alpha.V=1000)))

m3.mcmc <- MCMCglmm(weight ~ Time, random=~us(1+Time):Rat+timeRat,  
ginverse=list(timeRat=invC), data=BodyWeight, verbose=F, nitt=53000,  
pr=T, prior=prior)

# This is almost identical to m3 except the AR1 process is for the  
random effects rather than the residuals. In m3 the covariance  
structure within each Rat is equal to vr*mat where vr is the residual  
variance. In  MCMCglmm  it is  equal to   va*mat +vb*I  where va is  
the variance associated with the timeRat term and vb is the variance  
associated with the units term (residual variance).

# Nevertheless, the lme point estimates correspond very closely to the  
posterior modes for all parameters:

f1<-summary(m3)$coef$fixed[1]
f2<-summary(m3)$coef$fixed[2]
v1<-as.numeric(VarCorr(summary(m3))[1])
v2<-as.numeric(VarCorr(summary(m3))[2])
c12<-as.numeric(VarCorr(summary(m3))[,3][2])*sqrt(v1*v2)
vr<-as.numeric(VarCorr(summary(m3))[3])


par(mfrow=c(1,2))
hist(m3.mcmc$Sol[,1], breaks=30)
abline(v=f1, col="red")
hist(m3.mcmc$Sol[,2], breaks=30)
abline(v=f2, col="red")

par(mfrow=c(2,2))
hist(m3.mcmc$VCV[,1], breaks=30)
abline(v=v1, col="red")
hist(m3.mcmc$VCV[,4], breaks=30)
abline(v=v2, col="red")
hist(m3.mcmc$VCV[,2], breaks=30)
abline(v=c12, col="red")
hist(rowSums(m3.mcmc$VCV[,5:6]), breaks=30)
abline(v=vr, col="red")

# now, for an illustrative experiment, treat *Time* as the random  
grouping factor, instead of Rat:
m4 <- lme(weight ~ Diet, random = ~ 1 | Time, BodyWeight)
m5 <- MCMCglmm(weight ~ Diet, random=~Time, data=BodyWeight,  
verbose=F, nitt=53000, pr=T)
# estimates for the variance of the random intercepts is somewhat  
different, but otherwise m4 and m5 yield similar results

# now, create a "ginverse" matrix, and incorporate it in a call to  
MCMCglmm:
mat <- as(corMatrix(cs1AR1)[[1]], "sparseMatrix")
rownames(mat) <- unique(BodyWeight$Time)
m6 <- MCMCglmm(weight ~ Diet, random=~Time, ginverse=list(Time=mat),  
data=BodyWeight, verbose=F, nitt=53000, pr=T)

The variance for the random effects for Time is now substantially  
reduced. Does this mean that the "estimates" for the U_j terms has  
taken into account their correlation, whereas m5 treated them as  
independent? (Maybe I should have inverted "mat" before including it  
in the call?)

Do you see what I'm getting at? Maybe I've not understood, but this  
seemed to raise the possibility of allowing for random effects to be  
structured in many interesting ways (serially in time, spatially,  
etc.), which I don't think other packages can handle?

Many thanks for any clarification, as always,
Malcolm

Date: Mon, 25 Jul 2011 12:41:43 +0100
From: Jarrod Hadfield <j.hadfield at ed.ac.uk>
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] MCMCglmm_2.13

Hi,

A new version of MCMCglmm (MCMCglmm_2.13) has been submitted to CRAN
and should be on-line soon.

NEW FEATURES:

a) Covariance structures for the random effects of the form
kronecker(V,A) where A is some arbitrary matrix can now be fitted.
Previous versions only allowed A matrices defined by a pedigree or
phylogeny, and even then only single phylogenies/pedigrees could be
passed. The A matrices (actually their sparse inverses) are passed  
via
the argument ginverse.

b) c. 10% increase in speed for phylogenetic and pedigree models.

BUG FIXES:

i) If a column sum of Z was zero the associated term could be
incorrectly omitted from the mixed model equations. This is unlikely
to affect the vast majority of users.

ii) If a prior was specified with too many structures then the
redundant prior structures were ignored. An error message is now  
issued.

iii) MCMCglmm failed with a cryptic error message if the internal
nodes of a phylogeny were labelled but not uniquely. A more helpful
error message is now given.

Cheers,

Jarrod

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Malcolm Fairbrother

Thu, Jul 28, 2011 2:17 PM #

Jarrod,

This is indeed very interesting, and plenty fancy enough. What I'm particularly intrigued by is the possibility of fitting models with *both* time-series and spatial correlation structures. The code below represents my attempt to do that. Does this make sense?


library(nlme); library(MCMCglmm)
cs1AR1 <- corAR1(0.2, form = ~ Time | Rat) # the value chosen here doesn't seem to affect the estimates much
cs1AR1 <- Initialize(cs1AR1, BodyWeight)
corMatrix(cs1AR1)[[1]] # shows the correlation structure for each Rat (all 16 are the same, since each Rat is measured on the same occasions)

# a revised version of what you proposed (which, if I've got this right, proves about 25% faster for this dataset, and is simpler to set up):

smat <- Matrix(band(solve(corMatrix(cs1AR1)[[1]]), -1, 1), sparse=T)
rownames(smat) <- unique(BodyWeight$Time)
BodyWeight$Time2 <- BodyWeight$Time # distinguishing between two different "Time" variables seems to prevent R from crashing when calling MCMCglmm...

prior=list(R=list(V=1, nu=0.02), G=list(G1=list(V=diag(2), nu=2, alpha.mu=c(0,0), alpha.V=diag(2)*1000), G2=list(V=1, nu=1, alpha.mu=0, alpha.V=1000)))

MC0 <- MCMCglmm(weight ~ as.numeric(Time), random= ~ us(1+Time):Rat + Time, data=BodyWeight, prior=prior)
MC1 <- MCMCglmm(weight ~ as.numeric(Time), random= ~ us(1+Time):Rat + Time2, ginverse=list(Time2=smat), data=BodyWeight, prior=prior)
# this approach takes advantage of the fact that the correlation structure is the same for each Rat
# it generates the same results as your m3.mcmc, though it reallocates some of the variance from your "timeRat" level to the units level
# but I think that's not a problem, because the units/residual variance can still be identified using va*mat+vb*I
# make sense?

# anyway, here's the (maybe) interesting bit, combining time-series and spatial autocorrelation:

BodyWeight$x <- sample(1:length(unique(BodyWeight$Rat)))[BodyWeight$Rat]
BodyWeight$y <- sample(1:length(unique(BodyWeight$Rat)))[BodyWeight$Rat]
plot(unique(BodyWeight$x), unique(BodyWeight$y)) # a map

cE <- corExp(form = ~ x + y | Time)
cE <- Initialize(cE, BodyWeight)
corMatrix(cE)[[1]] # a spatial correlation structure (albeit for data without any real spatial pattern)

smat2 <- Matrix(solve(corMatrix(cE)[[1]]), sparse=T) # not actually sparse, unfortunately
rownames(smat2) <- unique(BodyWeight$Rat)
BodyWeight$Rat2 <- BodyWeight$Rat # again, this seems necessary to prevent R from crashing when calling MCMCglmm

MC2 <- MCMCglmm(weight ~ as.numeric(Time), random= ~ us(1+Time):Rat + Time2, ginverse=list(Time2=smat, Rat2=smat2), data=BodyWeight, prior=prior)

Jarrod Hadfield

Fri, Jul 29, 2011 1:33 AM #

Hi Malcolm,

You can combine variance structures as you have done, but given some  
of the parameters of the variance structure (e.g. phi) have to be  
known, I think you're better off fitting these models in lme or ASReml  
where they are estimated.

The first Rat example using Time2 is not equivalent to the model I set  
up (or your lme model m3). In the original models there are  
covariances between observations within Rats, whereas in your  
specification you have covariances between observations within AND  
between Rats (i.e. You have COV(u_{11},u_{12}) =  COV(u_{11},u_{22}),  
where the first subscript is Rat and the second Time2, whereas before  
it COV(u_{11},u_{22})=0).

I'll take a look into the problem you were getting when time was used  
as a fixed covariate and a factor associated with a ginverse.

Cheers,

Jarrod





Quoting Malcolm Fairbrother <m.fairbrother at bristol.ac.uk> on Thu, 28  
Jul 2011 22:17:46 +0100:

Jarrod,

This is indeed very interesting, and plenty fancy enough. What I'm  
particularly intrigued by is the possibility of fitting models with  
*both* time-series and spatial correlation structures. The code  
below represents my attempt to do that. Does this make sense?


library(nlme); library(MCMCglmm)
cs1AR1 <- corAR1(0.2, form = ~ Time | Rat) # the value chosen here  
doesn't seem to affect the estimates much
cs1AR1 <- Initialize(cs1AR1, BodyWeight)
corMatrix(cs1AR1)[[1]] # shows the correlation structure for each  
Rat (all 16 are the same, since each Rat is measured on the same  
occasions)

# a revised version of what you proposed (which, if I've got this  
right, proves about 25% faster for this dataset, and is simpler to  
set up):

smat <- Matrix(band(solve(corMatrix(cs1AR1)[[1]]), -1, 1), sparse=T)
rownames(smat) <- unique(BodyWeight$Time)
BodyWeight$Time2 <- BodyWeight$Time # distinguishing between two  
different "Time" variables seems to prevent R from crashing when  
calling MCMCglmm...

prior=list(R=list(V=1, nu=0.02), G=list(G1=list(V=diag(2), nu=2,  
alpha.mu=c(0,0), alpha.V=diag(2)*1000), G2=list(V=1, nu=1,  
alpha.mu=0, alpha.V=1000)))

MC0 <- MCMCglmm(weight ~ as.numeric(Time), random= ~ us(1+Time):Rat  
+ Time, data=BodyWeight, prior=prior)
MC1 <- MCMCglmm(weight ~ as.numeric(Time), random= ~ us(1+Time):Rat  
+ Time2, ginverse=list(Time2=smat), data=BodyWeight, prior=prior)
# this approach takes advantage of the fact that the correlation  
structure is the same for each Rat
# it generates the same results as your m3.mcmc, though it  
reallocates some of the variance from your "timeRat" level to the  
units level
# but I think that's not a problem, because the units/residual  
variance can still be identified using va*mat+vb*I
# make sense?

# anyway, here's the (maybe) interesting bit, combining time-series  
and spatial autocorrelation:

BodyWeight$x <- sample(1:length(unique(BodyWeight$Rat)))[BodyWeight$Rat]
BodyWeight$y <- sample(1:length(unique(BodyWeight$Rat)))[BodyWeight$Rat]
plot(unique(BodyWeight$x), unique(BodyWeight$y)) # a map

cE <- corExp(form = ~ x + y | Time)
cE <- Initialize(cE, BodyWeight)
corMatrix(cE)[[1]] # a spatial correlation structure (albeit for  
data without any real spatial pattern)

smat2 <- Matrix(solve(corMatrix(cE)[[1]]), sparse=T) # not actually  
sparse, unfortunately
rownames(smat2) <- unique(BodyWeight$Rat)
BodyWeight$Rat2 <- BodyWeight$Rat # again, this seems necessary to  
prevent R from crashing when calling MCMCglmm

MC2 <- MCMCglmm(weight ~ as.numeric(Time), random= ~ us(1+Time):Rat  
+ Time2, ginverse=list(Time2=smat, Rat2=smat2), data=BodyWeight,  
prior=prior)

From the code MC2 *looks like* it's taking spatial correlation into  
account... but is it? I'm not sure whether I've got this right, and  
I'm a bit perplexed about how one might go about simulating data to  
test this (obviously the data here have no spatial autocorrelation).

Basically, is this an AR1 + spatial corExp model? Your  
clarifications are always appreciated.

Thanks again,
Malcolm




On 28 Jul 2011, at 10:20, Jarrod Hadfield wrote:

Hi Malcolm,

I don't think it's as fancy as you think/hope. The covariance  
structure has to be seperable into a matrix of prameters (V) and a  
fixed matrix (A), whereas in many spatial/time-series models A is  
also a function of some parameter(s). It is flexible in the sense  
that A can be any positive-definite matrix.

I've edited your code below to get MCMCglmm to fit the AR1 model  
(with fixed phi).

Cheers,

Jarrod


On 27 Jul 2011, at 19:18, Malcolm Fairbrother wrote:

Dear Jarrod,

Can I please just confirm what you meant here by "covariance  
structures for the random effects"? If I understand correctly,  
this technique might be very useful for quite a lot of  
things--though I'm also trying to think it through.

Specifically, the nlme package for example allows for error  
structures like AR1, etc., in the *residuals* of a mixed model.  
Are you saying that MCMCglmm's "ginverse" function allows for  
similar sorts of structure to be modelled in the *random effects*?  
Consider:

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.