-----Original Message-----
From: R-sig-meta-analysis <r-sig-meta-analysis-bounces at r-project.org> On Behalf
Of St Pourcain, Beate via R-sig-meta-analysis
Sent: Friday, January 26, 2024 19:14
To: r-sig-meta-analysis at r-project.org
Cc: St Pourcain, Beate <Beate.StPourcain at mpi.nl>
Subject: [R-meta] rma.mv metafor models for genome-wide meta-analyses are (too)
conservative
Hi,
We are a group of geneticists using meta-regression for genome-wide
meta-analysis and have encountered a subtle, thorny issue.
We use metafor rma.mv models to meta-analyse:
* 130 correlated input statistics
* each statistic has BETAs for 8 million variants (reflecting genetic
association effects)
capturing:
* 70000 individuals
* 400000 repeat observations
from 23 cohorts.
Across the 8 million variants, the following model fits quite well for each
variant, based on a (fairly) well-known phenotypic correlation matrix for sample
overlap (Vsampoverlap) scaled to the SE of each BETA.
try({model_cov <- rma.mv(yi = BETA, V = Vsampoverlap,
                         mods = ~ COV1 + COV2 + COV3,
                         random = list(~ 1 | COHORT/VAR), data = df)},
    silent = FALSE)
where VAR specifies the input statistic, COHORT the cohort, and COV1 to COV3
the fixed-effect predictors.
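To make the setup concrete, here is a minimal sketch of how a correlation matrix could be scaled to the SE of each BETA and passed to rma.mv for a single variant. The toy data, the within-cohort correlation of 0.5, the block-diagonal structure, and the single moderator are assumptions for illustration only, not the actual inputs of the analysis described above.

```r
# Hypothetical toy data for ONE variant: 3 cohorts x 4 input statistics.
set.seed(1)
df <- data.frame(
  COHORT = rep(paste0("C", 1:3), each = 4),
  VAR    = rep(paste0("S", 1:4), times = 3),
  BETA   = rnorm(12, mean = 0, sd = 0.05),
  SE     = runif(12, min = 0.03, max = 0.08),
  COV1   = rep(c(0, 1, 2, 3), times = 3)
)

# Assumed phenotypic correlation of 0.5 between statistics within a cohort
# and 0 between cohorts, giving a block-diagonal correlation matrix.
R_within <- matrix(0.5, nrow = 4, ncol = 4)
diag(R_within) <- 1
R <- kronecker(diag(3), R_within)   # row order matches df (cohort by cohort)

# Scale correlations to sampling (co)variances: V[i, j] = R[i, j] * SE_i * SE_j
Vsampoverlap <- R * outer(df$SE, df$SE)

# Fit the multilevel model only if metafor is installed; try() keeps a
# genome-wide loop running if a single variant fails to converge.
if (requireNamespace("metafor", quietly = TRUE)) {
  model_cov <- try(
    metafor::rma.mv(yi = BETA, V = Vsampoverlap, mods = ~ COV1,
                    random = list(~ 1 | COHORT/VAR), data = df),
    silent = FALSE
  )
}
```

The diagonal of Vsampoverlap equals SE^2, so with an identity correlation matrix the sketch reduces to the usual independent-sampling-error case.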
However, when predicting BETAs for each of the 8 million variants across a grid
of fixed effect predictors (COV1, COV2, COV3) we note that, on average, derived
genome-wide Z-scores based on predicted BETAs and SEs are too conservative and
deviate from the expected null distribution in a quantile-quantile plot,
affecting all predictions. We can also quantify this deviation from the null
distribution across all 8 million variants using the LDSC intercept (i.e. the
linkage disequilibrium score regression intercept from a variant-based
heritability estimation; Bulik-Sullivan et al. 2015:
https://pubmed.ncbi.nlm.nih.gov/25642630/), which should be one if unbiased, but
we observe ~0.9 (SE = 0.005). Note that the intercept would be above one if test
statistics were inflated.
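As a rough illustration of what deflated test statistics look like, the sketch below simulates null Z-scores whose SEs are too large and computes two quick calibration checks, the mean chi-square and the genomic control lambda. The simulated values and the over-correction factor are assumptions chosen to mimic a value near 0.9. In a real analysis `pred` and `se_reported` would come from `predict()` on the fitted model. These checks are not the LDSC intercept, which additionally uses LD scores to separate polygenic signal from bias.

```r
# Simulated null example (assumption: no true genetic effect at any variant).
set.seed(42)
n_variants <- 1e5
true_se <- rep(0.01, n_variants)
pred    <- rnorm(n_variants, mean = 0, sd = true_se)  # stand-in for predicted BETAs

# Hypothetical over-correction: reported SEs too large by a factor 1/sqrt(0.9).
se_reported <- true_se / sqrt(0.9)                    # stand-in for predicted SEs

z     <- pred / se_reported
chisq <- z^2

# Under a well-calibrated null both quantities should be close to 1;
# values below 1 indicate conservative (deflated) test statistics.
mean_chisq <- mean(chisq)
lambda_gc  <- median(chisq) / qchisq(0.5, df = 1)

# Points for a quantile-quantile plot: expected vs observed -log10(p).
p        <- pchisq(chisq, df = 1, lower.tail = FALSE)
observed <- sort(-log10(p))
expected <- sort(-log10(ppoints(n_variants)))
# plot(expected, observed); abline(0, 1)  # points below the line = deflation
```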
Thus, we think we are subtly over-correcting for relatedness and reducing the
power of our analysis. Does anyone in the group have thoughts on how to relax
the adjustment for relatedness in metafor?
Thanks so much for any input,
Beate
PS: Note to random geneticists in the group: If we could, we would replace the
phenotypic correlation matrix with the non-genetic part of the phenotypic
correlation matrix, but such an estimation is unreliable given the small sample
size of most cohorts.
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_Europe.1252  LC_CTYPE=English_Europe.1252
[3] LC_MONETARY=English_Europe.1252 LC_NUMERIC=C
[5] LC_TIME=English_Europe.1252
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] data.table_1.14.2   metafor_4.4-0       numDeriv_2016.8-1.1 metadat_1.2-0
[5] Matrix_1.4-0
Beate St Pourcain, PhD
Senior Investigator & Group Leader
Room A207
Max Planck Institute for Psycholinguistics | Wundtlaan 1 | 6525 XD Nijmegen |
The Netherlands
@bstpourcain
Tel: +31 24 352 1964
Fax: +31 24 352 1213
ORCID: https://orcid.org/0000-0002-4680-3517
Web: https://www.mpi.nl/departments/language-and-genetics/projects/population-variation-and-human-communication/
Further affiliations with:
MRC Integrative Epidemiology Unit | University of Bristol | UK
Donders Institute for Brain, Cognition and Behaviour | Radboud University | The
Netherlands