
[R-meta] multi-level meta-analysis with dependent effect sizes

3 messages · Andrew Guerin, James Pustejovsky

#
Hi James,


Thanks for the rapid response.


The study is looking at how a preparation technique (acid treatment) affects particular measurements (stable isotope signatures) in sample materials. Experiments compare the isotope signatures of replicates with and without acid treatment - so it is the differences, rather than the absolute values, which are of interest (so of your two suggestions, it would have to be option 1).


The hierarchical dependence arises as you have guessed. All the included studies are explicit tests of acid treatment. While some studies look at the effects on just one type of sample material (say, crab muscle or fish scales), others might test a selection of materials from various sources. So there can be multiple values of 'sampletype' within a study. The other moderators (A-H) concern potentially influential differences in the experimental methods used (e.g. type of acid, method of application, how samples were treated before acid exposure, etc.). 'Sampletype' is the main moderator of interest - the idea is to be able to say whether or not acid treatment is desirable/necessary for a particular type of material (an overall estimate of effect size for all types of sample combined is not very interesting). The other moderators are less fundamentally interesting; they are there mainly to account for differences in the methods used in the studies included in the analysis, though of course it is interesting to be able to report whether they make any difference.


In terms of variation in sampletype and the other moderators, you could put the studies into 4 groups:


1) Single sample type, fixed protocol (all moderators A-H constant; 10 studies)

2) Single sample type, multiple protocols (generally one or two moderators will vary, others will remain constant; 2 studies)

3) Multiple sample types, fixed protocol (the most common - 25 studies)

4) Multiple sample types, multiple protocols (such studies are a lot of work, so they are generally limited to 4 or fewer sample types, and a small number of variations in one or two moderators; 7 studies)


So there is a bit of both between-study and within-study variation in the moderators. For the two most common types of study (1 and 3), since there is only one protocol, there will be multiple independent treatment/control pairs, so no 'multiple treatments' type dependence (though still potential hierarchical dependence). For type 2 there will be one group of treatments per study, but all compared to a single control. Studies in group 4 will presumably exhibit both types of dependence.


Cheers,

Andrew
#
Andrew,

Your description of the data structure is very clear. (Bravo for that! Many
meta-analyses that I have reviewed---even quite recent ones---don't do a
great job of describing the hierarchical/multivariate structure of the
data, even though it's quite important.) Based on your responses, I think
the best thing to do would be to use the Gleser & Olkin approach to try and
approximate the covariances among effect size estimates that share a common
control condition. The reason I think it's best is that you should have all
the information you need in order to calculate the covariances (when the
effect sizes are differences in means, all that is needed are means, SDs,
and Ns per cell), and so there's not a strong rationale for using other
approximations. For example, using impute_covariance_matrix() with r = 0.7
is likely to be a bit too high. Consider an example where two treatment
cells A and B, and the control cell C all have equal sample sizes n, and
the SDs are also all equal to S. Then

Var(A - C) =  S^2 * 2 / n
Var(B - C) = S^2 * 2 / n
Cov(A - C, B - C) = S^2 / n
And so Cor(A - C, B - C) = 0.5, rather than 0.7.
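The algebra can be checked numerically; here is a minimal sketch, with n and S set to arbitrary illustration values:

```r
# Shared-control correlation under equal cell sizes n and equal SDs S.
# The values of n and S below are arbitrary; the ratio is always 0.5.
n <- 10
S <- 2
var_AC <- S^2 * 2 / n            # Var(A - C)
var_BC <- S^2 * 2 / n            # Var(B - C)
cov_AB <- S^2 / n                # Cov(A - C, B - C), from the shared control
cov_AB / sqrt(var_AC * var_BC)   # correlation = 0.5
```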

A further reason for going this route is that using good estimates of
sampling covariances is the only way I know of to get defensible,
interpretable estimates of the random effects variance components of the
model. If you use something along the lines of your model D,

nmtD <- rma.mv(yi = md, V = vcvndat,
               mods = ~ factor(sampletype) + factor(A) ... + factor(H),
               random = ~ 1 | sid/contrast, data = ndat, method = "REML")

(but with vcvndat based on Gleser & Olkin), then you would be able to
interpret the random effects variances on sid and contrast. You noted that
one of the variance components (not sure which one is sigma2.1 versus
sigma2.2) does seem to be sensitive to the assumed r, so Gleser & Olkin
would be a way to pin that down. It would also let you explore other
potential specifications of the random effects structure, such as allowing
variance components to differ across levels of sampletype (or other
moderators).
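As an illustration of that last point, here is a hedged sketch of one such specification, fit to simulated stand-in data (the variable names echo this thread, but the numbers are made up), using metafor's struct = "DIAG" option to give each sample type its own variance component:

```r
library(metafor)

# Simulated stand-in data: 10 studies, each contributing effect sizes
# for two sample types (values are arbitrary, for illustration only).
set.seed(1)
dat <- data.frame(
  sid        = rep(1:10, each = 4),
  contrast   = 1:40,
  sampletype = rep(c("plant", "fmusc"), 20),
  yi         = rnorm(40),
  vi         = runif(40, 0.05, 0.2)
)

# Separate between-study variance component for each sample type:
fit <- rma.mv(yi, vi,
              mods = ~ 0 + sampletype,
              random = ~ sampletype | sid, struct = "DIAG",
              data = dat, method = "REML")
fit$tau2   # one estimated variance component per sample type
```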

All that said, computing the full Gleser & Olkin vcov matrix is a bit
inconvenient because you have to work with block diagonal matrices. If you
send a smallish example dataset, I can try to provide some example code for
how to do these calculations.

James

On Tue, Sep 11, 2018 at 10:54 AM Andrew Guerin <Andrew.Guerin at newcastle.ac.uk> wrote:
9 days later
#
Andrew,

You mentioned that you've figured out how to do this, but I figured that I
would share my approach anyways in case others find it useful.

Cheers,
James

# Function to compute the full variance-covariance matrix of a raw mean
# difference:

V_mean_diff <- function(cluster, se_t, se_c,
                        return_list = identical(as.factor(cluster),
                                                sort(as.factor(cluster)))) {

  se_t_list <- split(se_t, cluster)
  se_c_list <- split(se_c, cluster)

  vcov_list <- Map(function(se_A, se_B) se_B^2 + diag(se_A^2, nrow = length(se_A)),
                   se_A = se_t_list, se_B = se_c_list)

  if (return_list) {
    return(vcov_list)
  } else {
    vcov_mat <- metafor::bldiag(vcov_list)
    cluster_index <- order(order(cluster))
    return(vcov_mat[cluster_index, cluster_index])
  }
}

# Here's an example dataset (based on the one you sent me off-list):

example_data <- data.frame(
  stringsAsFactors = FALSE,
  sid = c(1L, 1L, 1L, 1L, 1L, 4L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
          10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
          10L, 10L),
  contrast = c("1A", "1B", "1B", "1C", "1D", "4A", "7A", "7A", "7B", "7B",
               "7C", "7C", "7D", "7D", "10A", "10B", "10C", "10E", "10F",
               "10G", "10H", "10I", "10J", "10K", "10L", "10N", "10O", "10P"),
  sampletype = c("invmusc", "invmusc", "invmusc", "crwhole", "plant",
                 "vbone", "fmusc", "fmusc", "fmusc", "fmusc", "crwhole",
                 "crwhole", "crwhole", "crwhole", "plant", "plant", "plant",
                 "plant", "plant", "plant", "plant", "plant", "plant",
                 "plant", "plant", "plant", "plant", "plant"),
  nc = c(3L, 10L, 10L, 10L, 6L, 10L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
         2L, 2L, 2L, 2L, 4L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 4L, 2L),
  mc = c(9, 4.345, 4.345, 4.353, 1.58, 16.6, 11.995, 11.995, 10.564,
         10.564, 11.461, 11.461, 10.392, 10.392, 6.3, 1.3, 1.3, 4.4,
         4.7, 7.1, 5.4, 4.5, 5.9, -2.3, 3, 1.7, -5, 1.3),
  sdc = c(0.872066511, 0.256523964, 0.256523964, 0.276130085, 0.75934182,
          0.1, 0.296984848, 0.296984848, 0.042751676, 0.042751676,
          0.122824448, 0.122824448, 0.241830519, 0.241830519, 0.494974747,
          0.919238816, 1.767766953, 0.070710678, 2.303, 0.070710678,
          1.728, 0.989949494, 0.353553391, 0.070710678, 0.494974747,
          0.070710678, 1.7, 0.212132034),
  ns = c(3L, 10L, 10L, 10L, 6L, 10L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
         2L, 2L, 2L, 2L, 4L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 4L, 2L),
  ms = c(11.9, 4.811, 5.355, 4.457, -0.24, 16.6, 11.764, 11.835, 10.87,
         10.714, 11.38, 11.439, 10.246, 10.31, 6.1, 1, 1.1, 4.1, 4.9,
         7.5, 4.9, 4.4, 6.1, -2.9, 1.6, 1.6, -4.4, 1.5),
  sds = c(0.916787871, 0.762108916, 0.752622083, 0.695701085, 0.881816307,
          0.1, 0.165462987, 0.195161472, 0.296984848, 0.357796031,
          0.277185858, 0.144249783, 0.371938167, 0.370523953, 1.202081528,
          0.919238816, 2.786000718, 0.070710678, 2.254, 0.636396103,
          1.47, 1.060660172, 0.212132034, 0.424264069, 1.13137085,
          0.070710678, 1.144, 1.060660172)
)

# The variables in the data are as follows:

# sid - study identifier

# contrast - grouping variable identifying effect sizes that share a control

# sampletype - type of sample

# nc/mc/sdc - n/mean/standard deviation for control

# ns/ms/sds - n/mean/standard deviation for acid treated sample


# fix the sort-order of the grouping variable

example_data$contrast <- factor(example_data$contrast,
                                levels = unique(example_data$contrast))


# compute mean difference

example_data$md <- with(example_data, ms - mc)


# compute variance of mean difference, to check against vcov matrix

example_data$V_md <- with(example_data, sdc^2 / nc + sds^2 / ns)


# compute vcov as block-diagonal matrix list

vcov_list <- with(example_data, V_mean_diff(cluster = contrast,
                                            se_c = sdc / sqrt(nc),
                                            se_t = sds / sqrt(ns)))


# compute vcov as full matrix

vcov_mat <- with(example_data, V_mean_diff(cluster = contrast,
                                           se_c = sdc / sqrt(nc),
                                           se_t = sds / sqrt(ns),
                                           return_list = FALSE))


# check that results are consistent with hand calculation

all.equal(example_data$V_md, diag(vcov_mat))



# fit multi-level meta-analysis model

library(metafor)


rma.mv(md ~ 0 + factor(sampletype),
       V = vcov_list,
       random = ~ 1 | sid / contrast,
       data = example_data,
       method = "REML")
On Tue, Sep 11, 2018 at 1:47 PM James Pustejovsky <jepusto at gmail.com> wrote: