
[R-meta] confusion point: the various 'correlation' (rho, ρ) in multivariate meta-analytic model

Dear Yefang,

Please be careful when using specialized symbols/formatting in your text, since this is a plain-text mailing list and such symbols/formatting might not display correctly for the recipients (note the question-mark symbols below and how this ends up looking in the archives: https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2022-May/004026.html).

Also, when I get questions via email, I typically redirect them here or places like StackOverflow (https://stackoverflow.com) or CrossValidated (https://stats.stackexchange.com) because the answer I or others provide might be beneficial to more people, not just the person asking. And yes, sometimes I do not have the time to answer.

So, to your question(s):

The V matrix (which we can construct/approximate with vcalc()) is the variance-covariance matrix of the sampling errors of the effect size estimates within each study. To keep things simple, say our 'effect size measure' is simply a raw mean, and we have measured two different things (like cognition and anxiety) in a single group of subjects. So, the raw data in a study would simply be two columns, one for each variable, for the n subjects. Now imagine we repeated this study over and over, each time computing the means of the two variables based on a new sample of subjects drawn from the same population, and then correlated these pairs of means -- that is the within-study correlation. But we have run the study only once, so we cannot correlate pairs of means: a correlation cannot be computed from a single pair of numbers. However, it turns out that the correlation between the raw data (say, r) is an estimate of the correlation between the two means!
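To make this concrete, here is a small simulation sketch in base R. The sample size, the assumed population correlation of 0.6, and the number of replications are all made-up values for illustration. The code 'repeats' a hypothetical study many times, correlates the resulting pairs of means, and compares this with the raw-data correlation r from a single run:

```r
set.seed(1)
n    <- 50     # subjects per study (assumed)
rho  <- 0.6    # assumed population correlation between cognition and anxiety
reps <- 5000   # number of hypothetical replications of the study

# one simulated sample of n subjects measured on two correlated variables
sim_sample <- function(n, rho) {
  z1 <- rnorm(n)
  z2 <- rnorm(n)
  cbind(cognition = z1, anxiety = rho * z1 + sqrt(1 - rho^2) * z2)
}

# means of both variables in each replication (reps x 2 matrix)
means <- t(replicate(reps, colMeans(sim_sample(n, rho))))

# correlation of the pairs of means across replications: the within-study correlation
cor_means <- cor(means[, "cognition"], means[, "anxiety"])

# correlation of the raw data from a single run of the study
one_study <- sim_sample(n, rho)
r <- cor(one_study[, "cognition"], one_study[, "anxiety"])

c(cor_means = cor_means, r = r)
```

Under these assumptions, cor_means should come out close to 0.6, and r from a single study scatters around that same value, which is why r can serve as an estimate of the correlation between the two means.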

This is essentially the same principle as the one used in computing (or rather: estimating) the sampling variance of an effect size measure. In theory, the sampling variance is the variance of the effect size estimates we would obtain if we were to repeat the same study over and over under identical circumstances. But we don't do that; the study was run only once. We have our effect size estimate from that study, and we want to know what its variance would have been if we had repeated the study over and over. For a mean, we can estimate its sampling variance by dividing the variance of the raw data by n. So, we can do this for the cognition mean (variance of the raw cognition values divided by n) and for the anxiety mean (variance of the raw anxiety values divided by n). Those are the sampling variances. The covariance between the two means is then estimated by multiplying the correlation between the raw data by the square root of the product of the two sampling variances (i.e., cov = r * sqrt(sampling_var_1 * sampling_var_2)).

This gives us the 2x2 block of the V matrix for this particular study, which reflects the dependency between the two estimates within this study. So, the 'rho' for vcalc() refers to this within-study correlation ('r' above).
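As a minimal base-R sketch of this computation (the data are simulated and purely hypothetical), the 2x2 block for one study could be built as follows. Note that r * sqrt(v1 * v2) is algebraically identical to cov(x1, x2) / n:

```r
set.seed(2)
n  <- 40                                # hypothetical sample size
x1 <- rnorm(n, mean = 100, sd = 15)     # hypothetical cognition scores
x2 <- 0.5 * x1 + rnorm(n, sd = 10)      # hypothetical anxiety scores, correlated with x1

yi <- c(mean(x1), mean(x2))             # the two effect size estimates (raw means)
v1 <- var(x1) / n                       # sampling variance of the cognition mean
v2 <- var(x2) / n                       # sampling variance of the anxiety mean
r  <- cor(x1, x2)                       # within-study correlation (the 'rho' for vcalc())
cv <- r * sqrt(v1 * v2)                 # sampling covariance of the two means

# 2x2 sampling var-cov block for this study
V_study <- matrix(c(v1, cv,
                    cv, v2), nrow = 2, byrow = TRUE)
V_study
```

This only spells out the arithmetic for a single study; vcalc() is the tool for constructing/approximating V across all studies.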

Sidenote: For other effect size measures that are more commonly used in meta-analyses (instead of simply means) like standardized mean differences, (log) odds/risk ratios, and so on, the equations that need to be used to compute/estimate their sampling variances and the correlation between two estimates computed based on the same sample of subjects are of course different from those used for means.

But in a meta-analysis, we have multiple studies. And for each study, there is a pair of means (or whatever the effect size measure is). Another type of correlation we can ask about is the correlation in the underlying *true* means (assuming that the true means of the two variables are not constant across studies). This is what we can estimate by fitting a model like:

rma.mv(yi, V, mods = ~ outcome, random = ~ outcome | study, struct="UN", data=mydata)

So, yi is the vector with the means of the two outcomes for all of the studies, V is the var-cov matrix we constructed above (block-diagonal with 2x2 blocks), outcome indicates whether a value in yi is a mean for the first or the second outcome, and 'study' is a study identifier.

Sidenote: If a study did not measure both outcomes, then this is no problem -- it just provides one value to 'yi' and its part in the V matrix is just a 1x1 block.
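If it helps to see the structure of V, here is a base-R sketch (the block values are invented for illustration) that stacks per-study blocks into one block-diagonal matrix, including a study that contributes only a 1x1 block. metafor's bldiag() serves the same purpose; the helper block_diag() below is a hypothetical stand-in written just for this example:

```r
# hypothetical helper: place square blocks along the diagonal of one matrix
block_diag <- function(blocks) {
  sizes <- vapply(blocks, nrow, integer(1))
  V     <- matrix(0, sum(sizes), sum(sizes))
  end   <- cumsum(sizes)
  start <- end - sizes + 1
  for (i in seq_along(blocks)) {
    idx <- start[i]:end[i]
    V[idx, idx] <- blocks[[i]]
  }
  V
}

# hypothetical blocks: studies 1 and 3 report both outcomes, study 2 only one
blocks <- list(
  matrix(c(0.10, 0.03, 0.03, 0.12), 2),
  matrix(0.08, 1),
  matrix(c(0.05, 0.02, 0.02, 0.06), 2)
)
V <- block_diag(blocks)
dim(V)   # 5 x 5
```

All entries outside the blocks are 0, i.e., sampling errors from different studies are assumed to be independent.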

By adding a random effect for 'outcome within study' (random = ~ outcome | study) with an 'unstructured variance-covariance matrix' (struct="UN"), we estimate the variance in the *true* means for the first and second outcome (and we allow those two variances to be different) and we estimate the covariance/correlation between the true means. This is the 'rho' for the correlation of the true means.

Note that it is important that we use rma.mv(yi, V, ...) and not just rma.mv(yi, vi, ...), since the latter would assume that the within-study correlations are 0. The effect of this is (typically) that we would overestimate the correlation of the true means.
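A rough way to see why is the following moment-based sketch in base R. This is NOT the REML estimation that rma.mv() carries out, and all parameter values are assumptions chosen for illustration. When the within-study covariance is ignored, the entire covariance of the observed pairs gets attributed to the true means:

```r
set.seed(3)
k        <- 2000          # hypothetical number of studies
tau      <- c(0.7, 0.7)   # SDs of the true means (assumed)
rho_true <- 0.2           # correlation of the true means (assumed)
v        <- c(0.2, 0.2)   # sampling variances, assumed equal in every study
r_within <- 0.6           # within-study correlation (assumed)

# var-cov matrix of the true means (what struct="UN" estimates)
G <- matrix(c(tau[1]^2, rho_true * tau[1] * tau[2],
              rho_true * tau[1] * tau[2], tau[2]^2), 2)
# var-cov matrix of the sampling errors within a study (a 2x2 block of V)
S <- matrix(c(v[1], r_within * sqrt(v[1] * v[2]),
              r_within * sqrt(v[1] * v[2]), v[2]), 2)

# draw true means and sampling errors using Cholesky factors
theta <- matrix(rnorm(2 * k), k) %*% chol(G)
err   <- matrix(rnorm(2 * k), k) %*% chol(S)
y     <- theta + err   # observed pairs of estimates, one row per study

# variances of the true means: observed variance minus sampling variance
tau2_hat <- apply(y, 2, var) - v

# naive: within-study covariance treated as 0
rho_naive <- cov(y[, 1], y[, 2]) / sqrt(prod(tau2_hat))
# corrected: subtract the known within-study covariance first
rho_corr  <- (cov(y[, 1], y[, 2]) - S[1, 2]) / sqrt(prod(tau2_hat))

c(naive = rho_naive, corrected = rho_corr)
```

Under these assumptions, the corrected value should land near the assumed 0.2, while the naive value should be noticeably higher (in expectation about (0.098 + 0.12) / 0.49, i.e., roughly 0.44).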

And finally, yes, there can be other 'rho' types (or more generally, correlations, because what we call these correlations is completely arbitrary). For example, there can be autocorrelations when the same effect size is repeatedly measured over time in a group of subjects (then we can have within-study autocorrelation and also the autocorrelation in the true means).
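For the autocorrelation case, one commonly assumed structure (an assumption here, not the only option) is first-order autoregressive, AR(1), where the correlation between measurements at time points i and j is phi^|i - j|. A base-R sketch of such a correlation matrix, and of turning it into a within-study var-cov block given hypothetical sampling variances, could look like this (ar1_cor() is a hypothetical helper written for this example):

```r
# hypothetical helper: AR(1) correlation matrix for n_times repeated measurements
ar1_cor <- function(n_times, phi) {
  phi^abs(outer(seq_len(n_times), seq_len(n_times), "-"))
}

R  <- ar1_cor(4, 0.5)                    # correlations 1, 0.5, 0.25, 0.125 by lag
vi <- c(0.10, 0.12, 0.11, 0.13)          # hypothetical sampling variances at 4 time points

# within-study var-cov block: scale the correlations by the sampling SDs
V_ar <- diag(sqrt(vi)) %*% R %*% diag(sqrt(vi))
V_ar
```

For the autocorrelation in the true means, rma.mv() offers an autoregressive structure via struct="AR".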

I hope this clarifies things a bit.

Best,
Wolfgang