[R-meta] Interpretation of the Q-test statistic in a multilevel meta-analysis

3 messages · Wolfgang Viechtbauer, Prof. Dr. Martin Brunner

#
Dear List Members,
We employed the rma.mv function from the metafor package to perform a 
meta-analysis where effect sizes were nested within samples, and samples 
were nested within countries. The total number of effect sizes exceeded 
8,000. Below, I provide a toy example, in which I randomly sampled 626 
effect sizes from 351 samples across 87 countries.
We specified a variance-covariance matrix (vcov_mat) to account for the 
observed effect sizes within each sample. The corresponding code was as 
follows:

M1 <- rma.mv(yi = Corrz, V = vcov_mat, data = tmp_es_dat,
             random = list(~ 1 | COUNTRY / SampleID / ESID), sparse = FALSE)

Here are the results:
Multivariate Meta-Analysis Model (k = 626; method: REML)

   logLik    Deviance         AIC         BIC        AICc
 728.1443  -1456.2886  -1448.2886  -1430.5376  -1448.2241

Variance Components:

            estim    sqrt  nlvls  fixed                 factor
sigma^2.1  0.0042  0.0648     87     no                COUNTRY
sigma^2.2  0.0037  0.0610    351     no       COUNTRY/SampleID
sigma^2.3  0.0021  0.0459    626     no  COUNTRY/SampleID/ESID

Test for Heterogeneity:
Q(df = 625) = 23584.2025, p-val < .0001

Model Results:

estimate      se      zval    pval    ci.lb    ci.ub
 -0.2620  0.0085  -30.7263  <.0001  -0.2788  -0.2453  ***

In addition to I^2 and the variance components at various levels (effect 
sizes, samples, and countries), we used the Q-test statistic to assess the 
heterogeneity of effect sizes.
An expert reviewer of our meta-analysis pointed out potential ambiguities in 
how we interpreted the Q-test statistic. Specifically, the reviewer said 
that the Q-test statistic is "the test of the between-clusters variation 
(whatever the clusters are in the model)."
However, I am unsure how to apply this interpretation to the Q-test 
statistic included in the metafor output. I learned from the help section of 
the rma.mv function that the Q "is the generalized/weighted least squares 
extension of Cochran's Q-test, which tests whether the variability in the 
observed effect sizes or outcomes is larger than one would expect based on 
sampling variability (and the given covariances among the sampling errors) 
alone. A significant test suggests that the true effects/outcomes are 
heterogeneous."
In our case, the Q suggests that the observed effect sizes vary 
significantly (p < .0001) around the average effect size (r = -0.26). 
Furthermore, the Q provided by metafor points to statistically significant 
heterogeneity, with heterogeneity referring to the total variance 
encompassing all potential sources of variance, including effect sizes, 
samples, and countries. However, I am unsure whether this is what the 
reviewer meant by interpreting the Q as "between-clusters variation."
I would highly appreciate any help in clarifying the interpretation of the 
Q-test statistic.
Thank you!
Best regards,
Martin

PS: I apologize for the poor formatting of the metafor output, but my email 
program does not support better formatting options.
#
Dear Martin,

First of all: Over 8,000 effect sizes?!? Wow, you might be breaking some kind of record there.

A sidenote: Given the model below, I would suspect that 'sparse=TRUE' would help to speed up model fitting.

Now for your actual question: No, the Q-test does not test for "between-clusters variation" (at least not in the sense that it tests for variation between the units of the highest level in the multilevel structure, which seems to be what the reviewer is implying). The docs, which you read (thanks!), correctly spell out what the Q-test is testing. In essence, it is testing the given model against one without any random effects. In your case, this would be:

M1 <- rma.mv(yi = Corrz, V = vcov_mat, data = tmp_es_dat, random = ~ 1 | COUNTRY / SampleID / ESID)
M0 <- rma.mv(yi = Corrz, V = vcov_mat, data = tmp_es_dat)
anova(M0, M1)

except that this will give you a likelihood ratio test of the random effects, while the Q-test is comparing M0 against a model where every effect size is allowed to have its own fixed effect. So the test statistics are not the same, but conceptually, the two approaches are comparable.
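
To make the point about the Q-test concrete, here is a minimal sketch. It uses the dat.konstantopoulos2011 example data that becomes available when metafor is loaded, not the data discussed in this thread, and the object names (full, none, Q_hand) are chosen just for illustration. Under those assumptions, it shows that the Q statistic is the same whether or not random effects are in the model, and that it can be reproduced by hand from the sampling variances alone:

```r
library(metafor)  # also makes the example data sets from 'metadat' available

dat <- dat.konstantopoulos2011  # example data: schools nested in districts

# multilevel model and a model without any random effects
full <- rma.mv(yi, vi, random = ~ 1 | district/school, data = dat)
none <- rma.mv(yi, vi, data = dat)

# Q by hand: generalized least squares form using only the sampling
# variance-covariance matrix V (diagonal in this example data set)
y <- as.numeric(dat$yi)
V <- diag(as.numeric(dat$vi))
W <- solve(V)
X <- matrix(1, nrow = length(y), ncol = 1)

mu_hat <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)  # weighted average
resid  <- y - X %*% mu_hat
Q_hand <- drop(t(resid) %*% W %*% resid)

# the three values should agree
c(full = full$QE, none = none$QE, by_hand = Q_hand)

# p-value from a chi-square distribution with k - 1 degrees of freedom
pchisq(Q_hand, df = length(y) - 1, lower.tail = FALSE)
```

Since the random effects never enter this computation, Q cannot be attributed to any particular level of the multilevel structure.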

If you want to test for between-country variation, then one can do a LRT comparing model M1 above against one where the country-level variance component is constrained to 0:

M0a <- rma.mv(yi = Corrz, V = vcov_mat, data = tmp_es_dat, random = ~ 1 | COUNTRY / SampleID / ESID, sigma2=c(0,NA,NA))
anova(M0a, M1)

Model M0a assumes that there is no between-country variation, but it does allow for between-sample (within country) variation and between-effect-size (within sample) variation. So this is quite different from what the Q-test does (and hence from the comparison between M0 and M1).
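
As a rough illustration of such a test of a single variance component, here is a sketch on the dat.konstantopoulos2011 example data that becomes available when metafor is loaded (again not the data from this thread; the names full, red, and LRT_hand are hypothetical). It constrains the highest-level (district) variance to 0 and reproduces the statistic reported by anova() from the two restricted log-likelihoods:

```r
library(metafor)  # also makes the example data sets from 'metadat' available

dat <- dat.konstantopoulos2011  # example data: schools nested in districts

# full model and a model with the district-level variance fixed at 0
full <- rma.mv(yi, vi, random = ~ 1 | district/school, data = dat)
red  <- rma.mv(yi, vi, random = ~ 1 | district/school, data = dat,
               sigma2 = c(0, NA))

# likelihood ratio test as computed by metafor
lrt <- anova(red, full)
lrt

# the same statistic by hand: twice the difference in (restricted)
# log-likelihoods, referred to a chi-square distribution with 1 df
LRT_hand <- 2 * (as.numeric(logLik(full)) - as.numeric(logLik(red)))
p_hand   <- pchisq(LRT_hand, df = 1, lower.tail = FALSE)
c(LRT = LRT_hand, pval = p_hand)
```

Both models have the same fixed effects (just an intercept), so comparing them based on REML fits is appropriate here.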

I hope this clarifies things.

Best,
Wolfgang
#
Dear Wolfgang,
thank you so much for this enlightening clarification and the further 
suggestions to test key assumptions of our model.
Best,
Martin

On Wed, 11 Sep 2024 10:51:42 +0000
  Viechtbauer, Wolfgang (NP) <wolfgang.viechtbauer at maastrichtuniversity.nl> 
wrote: