[R-meta] Var-cov structure in multilevel/multivariate meta-analysis

Fri, Mar 22, 2019 3:41 PM

Hi Wolfgang and mailing list,

I would like to follow up on point (3) from this old thread. Quick
refresher of the data structure: We have (approximately) 650 effects from
400 treatment-control comparisons, which come from 325 independent samples,
nested in 275 studies from 200 papers. Many of the samples include more
than one treatment-control comparison and evaluate their effect on more
than one outcome measure, resulting in correlated residuals clustered at
the level of independent samples.

First, we model the hierarchical dependence of the true effects. LRTs
indicate that model fit is improved significantly by adding random effects
for treatment-control comparisons (relative to a single-level model) and
further improved by adding random effects for papers (relative to the
two-level model). Adding random effects for samples and studies did not
improve on the two-level model. So in short, we include random effects for
papers, treatment-control comparisons, and individual estimates, skipping
studies and samples. Second, to deal with the nonindependent residuals, due
to multiple comparisons and multiple outcome measures, we use
cluster-robust variance estimation. In our data, residuals are clustered at
the level of independent samples.

As such, we could fit a model as follows:
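# "Best guess" sampling var-cov matrix: r = 0.7 assumed within each sample
vcv <- clubSandwich::impute_covariance_matrix(vi = data$vi,
                                              cluster = data$sample_id, r = 0.7)
# Random effects for papers, comparisons, and individual estimates
m <- metafor::rma.mv(yi, V = vcv, random = ~ 1 | paper_id/comp_id/es_id,
                     data = data)
# Cluster-robust (CR2) tests, clustering on independent samples
clubSandwich::coef_test(m, cluster = data$sample_id, vcov = "CR2")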

Are there any problems with computing robust standard errors at a level of
clustering (here: samples) that does not correspond to the levels at which
hierarchical dependence of the true effects is modeled (here: papers and
treatment-control comparisons)? If so, what would be a better approach?

Many thanks!
Fabian


On Fri, Oct 5, 2018 at 11:37 AM Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

Hi Fabian,

(very clear description of the structure, thanks!)

1) Your approach sounds sensible.

2) If you are going to use cluster-robust inference methods in the end
anyway, then getting the var-cov matrix of the sampling errors 'exactly
right' is probably not crucial. It can be a huge pain constructing the
var-cov matrix, especially when dealing with complex data structures as you
describe. So, sticking to the "best guess" approach is probably defensible.

3) It is difficult to give general advice, but it is certainly possible to
add random effects for samples, studies, and papers (plus random effects
for the individual estimates) here. One can probably skip a level if the
number of units at a particular level is not much higher than the number of
units at the next level (the two variance components are then hard to
distinguish). So, for example, 200 studies and 180 papers are similar counts,
so one could probably leave out the studies level and only add random
effects for papers (plus for samples and the individual estimates). You can
also run likelihood ratio tests to compare models to see if adding random
effects at the studies level actually improves the model fit significantly.
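
Such a likelihood ratio test could be sketched as follows. This is only an
illustration under stated assumptions: the data are simulated, the column
names (paper, study, es, yi, vi) are hypothetical, the samples level is left
out for brevity, and the sampling errors are treated as independent.

```r
library(metafor)

set.seed(123)

# Simulated, hypothetical data: 120 studies nested in 60 papers,
# each study contributing 1-3 effect size estimates.
n_paper <- 60
n_study <- 120
paper_of_study <- rep(seq_len(n_paper), each = 2)
study <- rep(seq_len(n_study), times = sample(1:3, n_study, replace = TRUE))
dat <- data.frame(paper = paper_of_study[study], study = study,
                  es = seq_along(study))

# True effects vary between papers and between estimates, not between studies.
dat$vi <- runif(nrow(dat), 0.02, 0.08)
dat$yi <- 0.3 + rnorm(n_paper, sd = 0.3)[dat$paper] +
  rnorm(nrow(dat), sd = 0.15) + rnorm(nrow(dat), sd = sqrt(dat$vi))

# Both models have the same fixed effects, so the REML fits can be compared.
full    <- rma.mv(yi, vi, random = ~ 1 | paper/study/es, data = dat)
reduced <- rma.mv(yi, vi, random = ~ 1 | paper/es, data = dat)

# Likelihood ratio test for the study-level variance component.
lrt <- anova(full, reduced)
print(lrt)
```

Because the null value of a variance component lies on the boundary of the
parameter space, the p-value from such a test tends to be conservative.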

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Fabian Schellhaas
Sent: Thursday, 27 September, 2018 23:55
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Var-cov structure in multilevel/multivariate meta-analysis

Dear all,

My meta-analytic database consists of 350+ effect size estimates, drawn
from 240+ samples, which in turn were drawn from 200+ studies, reported in
180+ papers. Papers report results from 1-3 studies each, studies report
results from 1-2 samples each, and samples contribute 1-6 effect sizes
each. Multiple effects per sample are possible due to (a) multiple
comparisons, such that more than one treatment is compared to the same
control group, (b) multiple outcomes, such that more than one outcome is
measured within the same sample, or (c) both. We coded for a number of
potential moderators, which vary between samples, within samples, or both.
I included an example of the data below.

There are two main sources of non-independence: First, there is
hierarchical dependence of the true effects, insofar as effects nested in
the same sample (and possibly those nested in the same study and paper) are
correlated. Second, there is dependence arising from correlated sampling
errors when effect-size estimates are drawn from the same set of
respondents. This is the case whenever a sample contributes more than one
effect, i.e. when there are multiple treatments and/or multiple outcomes.

To model these data, I start by constructing a "best guess" of the var-cov
matrices following James Pustejovsky's approach (e.g.,
https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2017-August/000094.html),
treating samples in my database as independent clusters. Then, I use these
var-cov matrices to construct the multilevel/multivariate meta-analytic
model. To account for the misspecification of the var-cov structure, I
perform all coefficient and moderator tests using cluster-robust variance
estimation. This general approach has also been recommended on this mailing
list and allows me (I think) to use all available data, test all my
moderators, and estimate all parameters with an acceptable degree of
precision.
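
To make the "best guess" structure concrete, here is a minimal base-R sketch
of what such a matrix looks like. The toy values, the variable names, and
the common correlation r = 0.7 are all assumptions for illustration: within
a sample, each covariance is set to r * sqrt(v_i * v_j), and effects from
different samples are treated as independent.

```r
# Hypothetical toy data: 3 samples contributing 1, 2, and 3 effects.
vi     <- c(0.04, 0.05, 0.06, 0.03, 0.04, 0.05)
sample <- c(1, 2, 2, 3, 3, 3)
r      <- 0.7  # assumed common correlation between sampling errors

sd_i <- sqrt(vi)
V <- r * outer(sd_i, sd_i)           # r * sqrt(v_i * v_j) for every pair
V[outer(sample, sample, "!=")] <- 0  # zero covariance across samples
diag(V) <- vi                        # sampling variances on the diagonal

round(V, 4)
```

The result is block-diagonal, with one block per sample. In practice, a
helper such as clubSandwich::impute_covariance_matrix() can build this kind
of matrix directly from vi, a cluster id, and r.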

My questions:

1. Is this approach advisable, given the nature of my data? Any problems I
missed?

2. Most manuscripts don't report the correlations between multiple
outcomes, thus preventing the precise calculation of covariances for this
type of dependent effect size. By contrast, it appears to be fairly
straightforward to calculate the covariances between multiple-treatment
effects (i.e., those sharing a control group), as per Gleser and Olkin
(2009). Given my data, is there a practical way to construct the var-cov
matrices using a combination of "best guesses" (when correlations cannot be
computed) and precise computations (when they can be computed via Gleser
and Olkin)? I should note that I'd be happy to just stick with the "best
guess" approach entirely, but as Wolfgang Viechtbauer pointed out (
https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2017-August/000131.html),
only a better approximation of the var-cov structure can improve precision
of the fixed-effects estimates. That's why I'm exploring this option.

3. How would I best determine for which hierarchical levels to specify
random effects? I certainly expect the true effects within the same set of
respondents to be correlated, so would at least add a random effect for
sample. Beyond that (i.e., study, paper, and so forth) I'm not so sure.
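
To illustrate the contrast in question 2, here is a rough base-R sketch for
a single sample with two treatments sharing one control group. It assumes
the effects are standardized mean differences (the effect size metric is not
stated above) and uses the large-sample covariance 1/n_c + d1*d2/(2N) as I
understand it from Gleser and Olkin (2009). All numbers and names are
hypothetical.

```r
# Hypothetical summary data for one sample: two treatment arms, one control.
d   <- c(0.4, 0.2)   # standardized mean differences vs. the shared control
n_t <- c(50, 50)     # treatment group sizes
n_c <- 50            # shared control group size
N   <- sum(n_t) + n_c

# Large-sample variances and covariance for SMDs sharing a control group
# (assumed form of the Gleser and Olkin formulas).
V_go <- 1 / n_c + outer(d, d) / (2 * N)
diag(V_go) <- 1 / n_t + 1 / n_c + d^2 / (2 * N)

# Correlation between the two sampling errors implied by these formulas.
r_go <- cov2cor(V_go)[1, 2]

# "Best guess" alternative for the same pair, with an assumed r = 0.7.
V_guess <- 0.7 * outer(sqrt(diag(V_go)), sqrt(diag(V_go)))
diag(V_guess) <- diag(V_go)

round(V_go, 4)
round(V_guess, 4)
```

With equal group sizes the implied correlation is close to 0.5, so a blanket
guess of 0.7 would overstate the dependence for this kind of pair. Computed
blocks like V_go could in principle replace the matching entries of a
best-guess matrix, with the remaining entries left as guesses.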

Cheers,

Fabian

### Database example:

Paper 1 contributes two studies - one containing just one sample, the other
containing two samples - evaluating the effect of treatment vs. control on
one outcome. Paper 2 contributes one study containing one sample,
evaluating the effect of two treatments (relative to the same control) on
two separate outcomes each. The first moderator varies between samples, the
second moderator varies both between and within samples.

paper  study  sample  comp  es  yi   vi   mod1  mod2
1      1      1       1     1   0.x  0.x  A     A
1      2      2       2     2   0.x  0.x  B     B
1      2      3       3     3   0.x  0.x  A     B
2      3      4       4     4   0.x  0.x  B     A
2      3      4       4     5   0.x  0.x  B     C
2      3      4       5     6   0.x  0.x  B     A
2      3      4       5     7   0.x  0.x  B     C

---
Fabian Schellhaas | Ph.D. Candidate | Department of Psychology | Yale
University
