[R-meta] Meta-analysis of intra class correlation coefficients

It is not ideal to meta-analyze raw correlations (or raw ICC values) if they are so large. In this case, I think the r-to-z transformation is highly advisable.

What you observe is the fact that the variance of a raw correlation coefficient depends on the true correlation. In particular, the large-sample estimate of the variance of a raw correlation coefficient (or ICC based on pairs) is (1-r^2)^2 / (n-1), where r is the observed correlation and n the sample size. Therefore, as r gets close to 1, the variance will get small, as you noted.
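To make this concrete, here is a minimal Python sketch of the large-sample variance formula above. The function name `var_raw_r` and the values of r and n are made up purely for illustration.

```python
def var_raw_r(r, n):
    """Large-sample variance of a raw correlation: (1 - r^2)^2 / (n - 1)."""
    return (1 - r**2) ** 2 / (n - 1)


# Illustrative values only: same sample size, increasingly large correlations.
n = 101
for r in (0.0, 0.5, 0.9, 0.99):
    v = var_raw_r(r, n)
    print(f"r = {r:4.2f}  var = {v:.6f}  se = {v ** 0.5:.4f}")
```

For a fixed n, the printed standard errors shrink toward zero as r approaches 1, which is exactly the dependence between the estimate and its standard error that the Egger test picks up.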

It is therefore no surprise that the Egger regression test is highly significant. Of course, this then says nothing about potential publication bias, since there is an inherent link between the correlation and its variance (and hence standard error). The same issue also arises with other outcome measures / effect sizes (e.g., standardized mean differences).

In the present case, the r-to-z transformation 'solves' this issue, since Var[z_r] =~ 1/(n-3) for Pearson correlations and Var[z_ICC] =~ 1/(n-3/2) for ICC(1) values.
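As a rough sketch of the transformation and the two variance approximations just given (the helper names `r_to_z` and `var_z` and the `icc` flag are my own, chosen for illustration):

```python
import math


def r_to_z(r):
    """Fisher's r-to-z transformation: 0.5 * log((1 + r) / (1 - r))."""
    return math.atanh(r)


def var_z(n, icc=False):
    """Approximate sampling variance of z: 1/(n - 3) for Pearson
    correlations, 1/(n - 3/2) for ICC(1) values based on n pairs."""
    return 1 / (n - 1.5) if icc else 1 / (n - 3)


# Illustrative values only: the variance no longer depends on r.
n = 100
for r in (0.5, 0.9, 0.99):
    print(f"r = {r:4.2f}  z = {r_to_z(r):.4f}  var(z) = {var_z(n, icc=True):.6f}")
```

Unlike the variance of the raw coefficient, var(z) is the same for every r at a given n, so the built-in link between the estimate and its standard error disappears.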

I would then consider doing a bivariate meta-analysis of the z_ICC_mz and z_ICC_dz values. Since they are based on independent samples, their sampling errors are uncorrelated, but the random effects of the bivariate model then account for potential correlation in the underlying true (transformed) ICC values. This is analogous to what people do when pooling sensitivity and specificity values in a diagnostic test meta-analysis and also directly relates to the bivariate model discussed by van Houwelingen et al. (2002):

https://www.metafor-project.org/doku.php/analyses:vanhouwelingen2002

You can then estimate the heritability from this model by back-transforming the pooled estimate for the MZ twins and the pooled estimate from the DZ twins and taking twice the difference. The SE and hence CI for this can then be obtained via the delta method.
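A minimal sketch of that delta-method calculation follows, assuming you already have the two pooled z-scale estimates, their variances, and (optionally) the covariance between them from the fitted bivariate model. The function name `heritability`, its arguments, and the input numbers are all hypothetical and not taken from any real analysis.

```python
import math


def heritability(z_mz, z_dz, var_mz, var_dz, cov=0.0, crit=1.959964):
    """Back-transform pooled z values and return (h2, se, ci) for
    h2 = 2 * (r_mz - r_dz), using the delta method.

    cov is the covariance between the two pooled estimates; it is an
    assumption of this sketch that you take it from the fitted model.
    """
    r_mz, r_dz = math.tanh(z_mz), math.tanh(z_dz)
    h2 = 2 * (r_mz - r_dz)
    # Gradient of h2 with respect to (z_mz, z_dz); d tanh(z)/dz = 1 - tanh(z)^2.
    g_mz = 2 * (1 - r_mz**2)
    g_dz = -2 * (1 - r_dz**2)
    var_h2 = g_mz**2 * var_mz + g_dz**2 * var_dz + 2 * g_mz * g_dz * cov
    se = math.sqrt(var_h2)
    return h2, se, (h2 - crit * se, h2 + crit * se)


# Made-up pooled estimates, for illustration only.
h2, se, ci = heritability(math.atanh(0.8), math.atanh(0.5), 0.01, 0.01)
print(f"h2 = {h2:.3f}, se = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that a Wald-type interval constructed this way is not constrained to the 0-1 range, so it may need to be interpreted with some care.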

This raises interesting questions about the difference between 'pooled(x) - pooled(y)' and 'pooled(x - y)' -- there are papers in the literature that discuss this issue (not in the present context) -- but the latter option doesn't appear sensible to me here anyway.

Best,
Wolfgang