
[R-meta] rma.mv metafor models for genome-wide meta-analyses are (too) conservative

Dear Beate,

Just so I understand this correctly:

1) You have 130 estimates (of something) within each of 23 cohorts, so the input to yi in the model below is of length 130*23=2990, correct?

2) And Vsampoverlap is therefore a 2990x2990 variance-covariance matrix, which is block-diagonal with 23 blocks of size 130x130, correct?

3) Then you use the results from this model to compute 8 million predicted values (for various combinations of COV1, COV2, and COV3) and compute 'z = pred / se', correct?

4) Then you examine the distribution of these 8 million z-values and find that they are too conservative (i.e., they are not quite standard normal, but have an SD that is somewhat larger than 1 and more than 5% of the z-values fall outside +-1.96), correct?
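For reference, the check in step 4 might correspond to something along these lines (just a sketch; 'z' is assumed to be the vector of the 8 million z-values):

```r
# hypothetical diagnostic on the z-values (z = pred / se)
sd(z)                    # should be close to 1 if standard normal
mean(abs(z) >= 1.96)     # should be close to 0.05
qqnorm(z); qqline(z)     # visual check against the standard normal
```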

I hope I got all of this correct. I am not familiar with the specifics of this application, but here are some thoughts.

First of all, it is not entirely clear to me why you would expect the predicted values to be standard normal in the first place. Maybe this is similar to a GWAS (in a single sample), where one would expect the vast majority of SNPs not to be associated with the phenotype of interest, so the p-values should form a roughly uniform distribution; but this may not hold when there is population stratification (or something else) going on, and one can then use the distribution of the p-values to check for this.

You also wrote that the correlation matrix is 'scaled to the SE of each BETA'. So does the diagonal of Vsampoverlap correspond to the standard errors of the estimates or their sampling variances (squared standard errors)? It should be the latter.
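To illustrate what the scaling should look like (a sketch, not part of your actual analysis; 'R' is assumed to be the correlation matrix and 'se' the vector of standard errors):

```r
# the diagonal of the resulting matrix must contain the sampling
# variances (se^2), not the standard errors themselves
V <- diag(se) %*% R %*% diag(se)
# equivalently: V <- R * tcrossprod(se)
all.equal(diag(V), se^2)   # TRUE if scaled correctly
```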

The random effects structure of the model is quite simple, assuming a constant correlation for every pair of estimates and a constant amount of heterogeneity for the various estimates. These may be fairly sensible or rather strict assumptions, depending on the application. Relaxing these assumptions is tricky though, because with 130 different estimates, even allowing the amount of heterogeneity to differ would introduce 129 additional parameters into the model, and let's not even think about a fully unstructured var-cov matrix for the random effects.
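In metafor syntax, the structure described above and its (costly) relaxations might look like this (a sketch; the names 'estid', 'COHORT', and 'dat' are assumptions about your data setup):

```r
library(metafor)

# constant tau^2 and constant correlation (compound symmetry): 2 parameters
model_cs  <- rma.mv(yi, Vsampoverlap, mods = ~ COV1 + COV2 + COV3,
                    random = ~ factor(estid) | COHORT, struct="CS", data=dat)

# heteroscedastic version: 130 tau^2 values plus 1 correlation,
# i.e., 129 additional parameters compared to "CS"
model_hcs <- rma.mv(yi, Vsampoverlap, mods = ~ COV1 + COV2 + COV3,
                    random = ~ factor(estid) | COHORT, struct="HCS", data=dat)

# a fully unstructured var-cov matrix (struct="UN") would require
# 130*131/2 = 8515 parameters
```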

An interesting attempt might be to try:

res <- robust(model_cov, cluster=COHORT)

(or possibly also adding clubSandwich=TRUE) and then using 'res' for obtaining the predicted values, but I am not sure how well cluster-robust inference methods will handle 130 estimates with 'only' 23 cohorts. The idea is that even if Vsampoverlap and/or the random effects structure is misspecified, cluster-robust inference methods should give you asymptotically correct standard errors / tests. But I somewhat doubt that this works with so many estimates relative to the number of cohorts. Still worth a try though.
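Putting this together, the cluster-robust route might look as follows (a sketch; the grid of covariate values is a placeholder, not your actual combinations):

```r
# cluster-robust inference with the CR2 (clubSandwich) adjustment
res <- robust(model_cov, cluster=COHORT, clubSandwich=TRUE)

# hypothetical grid of covariate combinations for the predicted values
grid <- expand.grid(COV1=0:1, COV2=0:1, COV3=0:1)
pred <- predict(res, newmods = as.matrix(grid))
z <- pred$pred / pred$se
```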

Maybe some of these comments will be helpful or at least spark a bit more discussion.

Best,
Wolfgang