
[R-meta] multi-level meta-analysis with dependent effect sizes

3 messages · Andrew Guerin, James Pustejovsky

#
Hi James,


Thanks for the rapid response.


The study is looking at how a preparation technique (acid treatment) affects particular measurements (stable isotope signatures) in sample materials. Experiments compare the isotope signatures of replicates with and without acid treatment - so it is the differences, rather than the absolute values, which are of interest (so of your two suggestions, it would have to be option 1).


The hierarchical dependence arises as you have guessed. All the included studies are explicit tests of acid treatment. While some studies look at the effects on just one type of sample material (say, crab muscle or fish scales), others might test a selection of materials from various sources. So there can be multiple values of 'sampletype' within a study. The other moderators (A-H) concern potentially influential differences in the experimental methods used (e.g. type of acid, method of application, how samples were treated before acid exposure, etc.). 'Sampletype' is the main moderator of interest - the idea is to be able to say whether or not acid treatment is desirable/necessary for a particular type of material (an overall estimate of effect size for all types of sample combined is not very interesting). The other moderators are less fundamentally interesting; they are there mainly to account for differences in the methods used in the studies included in the analysis, though of course it is interesting to be able to report whether they make any difference.


In terms of variation in sampletype and the other moderators, you could put the studies into 4 groups:


1) Single sample type, fixed protocol (all moderators A-H constant; 10 studies)

2) Single sample type, multiple protocols (generally one or two moderators will vary, others will remain constant; 2 studies)

3) Multiple sample types, fixed protocol (the most common - 25 studies)

4) Multiple sample types, multiple protocols (such studies are a lot of work, so they are generally limited to 4 or fewer sample types, and a small number of variations in one or two moderators; 7 studies)


So there is a bit of both between-study and within-study variation in the moderators. For the two most common types of study (1 and 3), since there is only one protocol, there will be multiple independent treatment/control pairs, so no 'multiple treatments' type dependence (though still potential hierarchical dependence). For type 2 there will be one group of treatments per study, but all compared to a single control. Studies in group 4 will presumably exhibit both types of dependence.


Cheers,

Andrew
#
Andrew,

Your description of the data structure is very clear. (Bravo for that! Many
meta-analyses that I have reviewed---even quite recent ones---don't do a
great job of describing the hierarchical/multivariate structure of the
data, even though it's quite important.) Based on your responses, I think
the best thing to do would be to use the Gleser & Olkin approach to try and
approximate the covariances among effect size estimates that share a common
control condition. The reason I think it's best is that you should have all
the information you need in order to calculate the covariances (when the
effect sizes are differences in means, all that is needed are means, SDs,
and Ns per cell), and so there's not a strong rationale for using other
approximations. For example, using impute_covariance_matrix() with r = 0.7
is likely to be a bit too high. Consider an example where two treatment
cells A and B, and the control cell C all have equal sample sizes n, and
the SDs are also all equal to S. Then

Var(A - C) =  S^2 * 2 / n
Var(B - C) = S^2 * 2 / n
Cov(A - C, B - C) = S^2 / n
And so Cor(A - C, B - C) = 0.5, rather than 0.7.
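The algebra can be checked numerically; here is a minimal sketch, with n and S set to arbitrary illustration values:

```r
# Shared-control correlation under equal cell sizes n and equal SDs S.
# The values of n and S below are arbitrary; the ratio is always 0.5.
n <- 10
S <- 2
var_AC <- S^2 * 2 / n            # Var(A - C)
var_BC <- S^2 * 2 / n            # Var(B - C)
cov_AB <- S^2 / n                # Cov(A - C, B - C), from the shared control
cov_AB / sqrt(var_AC * var_BC)   # correlation = 0.5
```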

A further reason for going this route is that using good estimates of
sampling covariances is the only way I know of to get defensible,
interpretable estimates of the random effects variance components of the
model. If you use something along the lines of your model D,

nmtD <- rma.mv(yi = md, V = vcvndat,
               mods = ~ factor(sampletype) + factor(A) ... + factor(H),
               random = ~ 1 | sid/contrast, data = ndat, method = "REML")

(but with vcvndat based on Gleser & Olkin), then you would be able to
interpret the random effects variances on sid and contrast. You noted that
one of the variance components (not sure which one is sigma2.1 versus
sigma2.2) does seem to be sensitive to the assumed r, so Gleser & Olkin
would be a way to pin that down. It would also let you explore other
potential specifications of the random effects structure, such as allowing
variance components to differ across levels of sampletype (or other
moderators).
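As an illustration of that last point, here is a hedged sketch of one such specification, fit to simulated stand-in data (the variable names echo this thread, but the numbers are made up), using metafor's struct = "DIAG" option to give each sample type its own variance component:

```r
library(metafor)

# Simulated stand-in data: 10 studies, each contributing effect sizes
# for two sample types (values are arbitrary, for illustration only).
set.seed(1)
dat <- data.frame(
  sid        = rep(1:10, each = 4),
  contrast   = 1:40,
  sampletype = rep(c("plant", "fmusc"), 20),
  yi         = rnorm(40),
  vi         = runif(40, 0.05, 0.2)
)

# Separate between-study variance component for each sample type:
fit <- rma.mv(yi, vi,
              mods = ~ 0 + sampletype,
              random = ~ sampletype | sid, struct = "DIAG",
              data = dat, method = "REML")
fit$tau2   # one estimated variance component per sample type
```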

All that said, computing the full Gleser & Olkin vcov matrix is a bit
inconvenient because you have to work with block diagonal matrices. If you
send a smallish example dataset, I can try to provide some example code for
how to do these calculations.

James

On Tue, Sep 11, 2018 at 10:54 AM Andrew Guerin <Andrew.Guerin at newcastle.ac.uk> wrote:
9 days later
#
Andrew,

You mentioned that you've figured out how to do this, but I figured that I
would share my approach anyways in case others find it useful.

Cheers,
James

# Function to compute the full variance-covariance matrix of a raw mean
# difference:

V_mean_diff <- function(cluster, se_t, se_c,
                        return_list = identical(as.factor(cluster),
                                                sort(as.factor(cluster)))) {

  se_t_list <- split(se_t, cluster)
  se_c_list <- split(se_c, cluster)

  vcov_list <- Map(function(se_A, se_B) se_B^2 + diag(se_A^2, nrow = length(se_A)),
                   se_A = se_t_list, se_B = se_c_list)

  if (return_list) {
    return(vcov_list)
  } else {
    vcov_mat <- metafor::bldiag(vcov_list)
    cluster_index <- order(order(cluster))
    return(vcov_mat[cluster_index, cluster_index])
  }
}

# Here's an example dataset (based on the one you sent me off-list):

example_data <- data.frame(
  stringsAsFactors = FALSE,
  sid = c(1L, 1L, 1L, 1L, 1L, 4L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
          10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
          10L, 10L),
  contrast = c("1A", "1B", "1B", "1C", "1D", "4A", "7A", "7A", "7B", "7B",
               "7C", "7C", "7D", "7D", "10A", "10B", "10C", "10E", "10F",
               "10G", "10H", "10I", "10J", "10K", "10L", "10N", "10O", "10P"),
  sampletype = c("invmusc", "invmusc", "invmusc", "crwhole", "plant",
                 "vbone", "fmusc", "fmusc", "fmusc", "fmusc", "crwhole",
                 "crwhole", "crwhole", "crwhole", "plant", "plant", "plant",
                 "plant", "plant", "plant", "plant", "plant", "plant",
                 "plant", "plant", "plant", "plant", "plant"),
  nc = c(3L, 10L, 10L, 10L, 6L, 10L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
         2L, 2L, 2L, 2L, 4L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 4L, 2L),
  mc = c(9, 4.345, 4.345, 4.353, 1.58, 16.6, 11.995, 11.995, 10.564,
         10.564, 11.461, 11.461, 10.392, 10.392, 6.3, 1.3, 1.3, 4.4,
         4.7, 7.1, 5.4, 4.5, 5.9, -2.3, 3, 1.7, -5, 1.3),
  sdc = c(0.872066511, 0.256523964, 0.256523964, 0.276130085, 0.75934182,
          0.1, 0.296984848, 0.296984848, 0.042751676, 0.042751676,
          0.122824448, 0.122824448, 0.241830519, 0.241830519, 0.494974747,
          0.919238816, 1.767766953, 0.070710678, 2.303, 0.070710678,
          1.728, 0.989949494, 0.353553391, 0.070710678, 0.494974747,
          0.070710678, 1.7, 0.212132034),
  ns = c(3L, 10L, 10L, 10L, 6L, 10L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
         2L, 2L, 2L, 2L, 4L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 4L, 2L),
  ms = c(11.9, 4.811, 5.355, 4.457, -0.24, 16.6, 11.764, 11.835, 10.87,
         10.714, 11.38, 11.439, 10.246, 10.31, 6.1, 1, 1.1, 4.1, 4.9,
         7.5, 4.9, 4.4, 6.1, -2.9, 1.6, 1.6, -4.4, 1.5),
  sds = c(0.916787871, 0.762108916, 0.752622083, 0.695701085, 0.881816307,
          0.1, 0.165462987, 0.195161472, 0.296984848, 0.357796031,
          0.277185858, 0.144249783, 0.371938167, 0.370523953, 1.202081528,
          0.919238816, 2.786000718, 0.070710678, 2.254, 0.636396103,
          1.47, 1.060660172, 0.212132034, 0.424264069, 1.13137085,
          0.070710678, 1.144, 1.060660172)
)

# The variables in the data are as follows:

# sid - study identifier

# contrast - grouping variable identifying effect sizes that share a control

# sampletype - type of sample

# nc/mc/sdc - n/mean/standard deviation for control

# ns/ms/sds - n/mean/standard deviation for acid treated sample


# fix the sort-order of the grouping variable

example_data$contrast <- factor(example_data$contrast,
                                levels = unique(example_data$contrast))


# compute mean difference

example_data$md <- with(example_data, ms - mc)


# compute variance of mean difference, to check against vcov matrix

example_data$V_md <- with(example_data, sdc^2 / nc + sds^2 / ns)


# compute vcov as block-diagonal matrix list

vcov_list <- with(example_data, V_mean_diff(cluster = contrast,
                                            se_c = sdc / sqrt(nc),
                                            se_t = sds / sqrt(ns)))


# compute vcov as full matrix

vcov_mat <- with(example_data, V_mean_diff(cluster = contrast,
                                           se_c = sdc / sqrt(nc),
                                           se_t = sds / sqrt(ns),
                                           return_list = FALSE))


# check that results are consistent with hand calculation

all.equal(example_data$V_md, diag(vcov_mat))



# fit multi-level meta-analysis model

library(metafor)


rma.mv(md ~ 0 + factor(sampletype),
       V = vcov_list,
       random = ~ 1 | sid / contrast,
       data = example_data,
       method = "REML")
On Tue, Sep 11, 2018 at 1:47 PM James Pustejovsky <jepusto at gmail.com> wrote: