Recovering correlations

Mon, Oct 11, 2021 4:27 AM
I'm trying to recover the correlations in a multivariate dataset. Let me
first start with a trivariate case.

require(MASS); require(lme4)

# trivariate case

r1 <- 0.5               # correlation value to be recovered

ns <- 200               # number of samples

S1 <- matrix(c(1,r1,0,  # correlation structure of trivariate data

               r1,1,0,

               0,0,1), nrow=3, ncol=3)

# simulated trivariate data

dat <- data.frame(f = c(rep(paste0('P',1:ns), 2), paste0('S',1:ns)),

                  y = c(mvrnorm(n=ns, mu=c(0, 0, 0), Sigma=S1)))


Now I can construct the following model


m1 <- lmer(y ~ 1 + (1|f), data=dat)


Then with the variances from the random effects in summary(m1)


summary(m1)


Random effects:

 Groups   Name        Variance Std.Dev.

 f        (Intercept) 0.4996   0.7068

 Residual             0.4907   0.7005


I can recover the correlation r1:


tmp <- unlist(lapply(VarCorr(m1), `[`, 1))

# recover the correlation r1

tmp/(tmp+attr(VarCorr(m1), "sc")^2)


Let's switch to a pentavariate case:

# pentavariate case

r1 <- 0.2; r2 <- 0.8        # correlation value to be recovered

ns <- 200                   # number of samples

S  <- matrix(c(1,r1,0,0,0,  # correlation structure of pentavariate data

               r1,1,0,0,0,  # the first and second variables are correlated

               0,0,1,r2,0,  # the third and fourth variables are correlated

               0,0,r2,1,0,

               0,0,0,0,1), nrow=5,ncol=5)

# simulated data

dat <- data.frame(f = c(rep(paste0('P',1:ns), 2), rep(paste0('T',1:ns), 2),
paste0('S',1:ns)),

                  R=c(rep('P',2*ns), rep('T',2*ns), rep('S', ns)),

                  y = c(mvrnorm(n=ns, mu=rep(0,5), Sigma=S)))

dat$R1 <- as.numeric(dat$R=='P')   # dummy code the first and second
variables (r1)

dat$R2 <- as.numeric(dat$R=='T')   # dummy code the third and fourth
variables (r2)

I have thought about the following models

m2 <- lmer(y ~ 1 + (0+R1|f) + (0+R2|f), data=dat)

m3 <- lmer(y ~ 1 + (1|f) + (0+R2|f), data=dat)

but I've been struggling to figure out a way to recover the correlations r1
and r2 with the variances from the random effects based on the models m2
and m3.

Of course I could use a workaround solution by reducing the
pentavariate data into two trivariate cases:

# workaround solution

m4 <- lmer(y ~ 1 + (1|f), data=dat[dat$R2!=1,]) # recover r1 like model m1
above

m5 <- lmer(y ~ 1 + (1|f), data=dat[dat$R1!=1,]) # recover r2 like model m1
above

However, I would really like to find a way to recover r1 and r2 directly
using models like m2 and m3. Suggestions?

Thanks,
Gang Chen