[R-meta] Interpretation of the Q-test statistic in a multilevel meta-analysis

3 messages · Wolfgang Viechtbauer, Prof. Dr. Martin Brunner

Wed, Sep 11, 2024 1:22 AM

Dear List Members,
We employed the rma.mv function from the metafor package to perform a 
meta-analysis where effect sizes were nested within samples, and samples 
were nested within countries. The total number of effect sizes exceeded 
8,000. Below, I provide a toy example, in which I randomly sampled 626 
effect sizes from 351 samples across 87 countries.
We specified a variance-covariance matrix (vcov_mat) to account for the 
observed effect sizes within each sample. The corresponding code was as 
follows:

M1 <- rma.mv(yi = Corrz, V = vcov_mat, data = tmp_es_dat, random = list(~ 1 
| COUNTRY / SampleID / ESID), sparse = FALSE)

Here are the results:
Multivariate Meta-Analysis Model (k = 626; method: REML)

     logLik    Deviance         AIC         BIC        AICc
   728.1443  -1456.2886  -1448.2886  -1430.5376  -1448.2241

Variance Components:

             estim    sqrt  nlvls  fixed                 factor
sigma^2.1  0.0042  0.0648     87     no                COUNTRY
sigma^2.2  0.0037  0.0610    351     no       COUNTRY/SampleID
sigma^2.3  0.0021  0.0459    626     no  COUNTRY/SampleID/ESID

Test for Heterogeneity:
Q(df = 625) = 23584.2025, p-val < .0001

Model Results:

estimate      se      zval    pval    ci.lb    ci.ub
  -0.2620  0.0085  -30.7263  <.0001  -0.2788  -0.2453  ***

In addition to I^2 and the variance components at various levels (effect 
sizes, samples, and countries), we used the Q-test statistic to assess the 
heterogeneity of effect sizes.
An expert reviewer of our meta-analysis pointed out potential ambiguities in 
how we interpreted the Q-test statistic. Specifically, the reviewer said 
that the Q-test statistic is "the test of the between-clusters variation 
(whatever the clusters are in the model)."
However, I am unsure how to apply this interpretation to the Q-test 
statistic included in the metafor output. I learned from the help section of 
the rma.mv function that the Q "is the generalized/weighted least squares 
extension of Cochran's Q-test, which tests whether the variability in the 
observed effect sizes or outcomes is larger than one would expect based on 
sampling variability (and the given covariances among the sampling errors) 
alone. A significant test suggests that the true effects/outcomes are 
heterogeneous."
In our case, the Q suggests that the observed effect sizes vary 
significantly (p < .0001) around the average effect size (r = -0.26). 
Furthermore, the Q provided by metafor points to statistically significant 
heterogeneity, with heterogeneity referring to the total variance 
encompassing all potential sources of variance, including effect sizes, 
samples, and countries. However, I am unsure whether this is what the 
reviewer meant by interpreting the Q as "between-clusters variation."
I would highly appreciate any help in clarifying the interpretation of the 
Q-test statistic.
Thank you!
Best regards,
Martin

PS: I apologize for the poor formatting of the metafor output, but my email 
program does not support better formatting options.

Wolfgang Viechtbauer

Wed, Sep 11, 2024 3:51 AM

Dear Martin,

First of all: Over 8,000 effect sizes?!? Wow, you might be breaking some kind of record there.

A sidenote: Given the model below, I would suspect that 'sparse=TRUE' would help to speed up model fitting.

Now for your actual question: No, the Q-test does not test for "between-clusters variation" (at least not in the sense that it tests for variation between the units of the highest level in the multilevel structure, which seems to be what the reviewer is implying). The docs, which you read (thanks!), correctly spell out what the Q-test is testing. In essence, it is testing the given model against one without any random effects. In your case, this would be:

M1 <- rma.mv(yi = Corrz, V = vcov_mat, data = tmp_es_dat, random = ~ 1 | COUNTRY / SampleID / ESID)
M0 <- rma.mv(yi = Corrz, V = vcov_mat, data = tmp_es_dat)
anova(M0, M1)

except that this will give you a likelihood ratio test of the random effects, while the Q-test is comparing M0 against a model where every effect size is allowed to have its own fixed effect. So the test statistics are not the same, but conceptually, the two approaches are comparable.
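
To make the point concrete, here is a minimal base R sketch of the GLS form of Cochran's Q for an intercept-only model. The data are made up, the within-sample correlation of 0.5 is an arbitrary assumption, and `gls_q` is a hypothetical helper written for illustration, not a metafor function. The thing to notice is that only the observed effect sizes and V enter the computation; no grouping variable or variance component appears anywhere.

```r
# Made-up toy data: 6 effect sizes, 2 per sample (illustrative values only).
yi        <- c(-0.30, -0.22, -0.41, -0.18, -0.05, -0.27)
vi        <- c(0.010, 0.012, 0.008, 0.015, 0.011, 0.009)
sample_id <- rep(1:3, each = 2)

# Block-diagonal V: sampling errors within the same sample are assumed
# to correlate at rho = 0.5 (an arbitrary choice for this sketch).
rho <- 0.5
k   <- length(yi)
V   <- diag(vi)
for (i in seq_len(k)) {
  for (j in seq_len(k)) {
    if (i != j && sample_id[i] == sample_id[j]) {
      V[i, j] <- rho * sqrt(vi[i] * vi[j])
    }
  }
}

# Hypothetical helper: GLS version of Cochran's Q for an intercept-only model.
gls_q <- function(y, V) {
  W   <- solve(V)                      # weight matrix V^{-1}
  mu  <- sum(W %*% y) / sum(W)         # GLS estimate of the common effect
  res <- y - mu                        # deviations from the common effect
  Q   <- as.numeric(t(res) %*% W %*% res)
  df  <- length(y) - 1
  list(mu = mu, Q = Q, df = df,
       pval = pchisq(Q, df = df, lower.tail = FALSE))
}

out <- gls_q(yi, V)
print(out)
```

With a diagonal V this reduces to the familiar sum of inverse-variance weighted squared deviations from the weighted mean, which is a quick way to check the helper.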

If you want to test for between-country variation, then one can do a LRT comparing model M1 above against one where the country-level variance component is constrained to 0:

M0a <- rma.mv(yi = Corrz, V = vcov_mat, data = tmp_es_dat, random = ~ 1 | COUNTRY / SampleID / ESID, sigma2=c(0,NA,NA))
anova(M0a, M1)

Model M0a assumes that there is no between-country variation, but it does allow for between-sample (within country) variation and between-effect-size (within sample) variation. So this is quite different from what the Q-test does (and hence also from the comparison between M0 and M1).
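
Since vcov_mat and tmp_es_dat are not available here, the same idea can be sketched on the two-level example data that ships with metafor (dat.konstantopoulos2011, schools nested within districts). This is only an illustration on a different dataset; the object names `full` and `no_district` are chosen for this sketch, and the sampling variances are treated as independent.

```r
library(metafor)

# Bundled example data: effect sizes from schools nested within districts.
dat <- dat.konstantopoulos2011

# Full model: variance components for districts and for schools within districts.
full <- rma.mv(yi, vi, random = ~ 1 | district / school, data = dat)

# Reduced model: district-level variance fixed at 0,
# school-level variance still estimated (NA = estimate).
no_district <- rma.mv(yi, vi, random = ~ 1 | district / school, data = dat,
                      sigma2 = c(0, NA))

# Likelihood ratio test of the district-level variance component only.
print(anova(no_district, full))

# The Q statistic does not depend on the random-effects structure,
# so it is the same for both fits.
print(c(full = full$QE, no_district = no_district$QE))
```

The LRT changes with the constraint placed on the variance components, whereas the Q statistic stays the same across both fits, because it is computed from the observed outcomes and their sampling (co)variances alone.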

I hope this clarifies things.

Best,
Wolfgang

Prof. Dr. Martin Brunner

Wed, Sep 11, 2024 4:39 AM

Dear Wolfgang,
thank you so much for this enlightening clarification and the further 
suggestions to test key assumptions of our model.
Best,
Martin
