[R-meta] Background on large meta analysis with RCT and single-arm studies

Hi David,

Let's distinguish three types of designs using the notation of Campbell and Stanley (1963):

1) Posttest-Only Control Group Design

Trt   R  X  O
Ctrl  R     O

(R = randomization, X = treatment, O = observation)

For this design, we can compute the usual standardized mean difference of the form 

d = (m_post_trt - m_post_ctrl) / sd_post,

where sd_post is the post-test SD pooled across the two groups. This is also known as Cohen's d, or Hedges' g when the bias correction is applied. This is measure "SMD" in metafor.
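
To make the arithmetic concrete, here is a minimal sketch in Python (not R, and not metafor's own code). The function name smd() and the input numbers are made up for illustration. It uses the common approximation 1 - 3/(4*df - 1) for the bias correction and a standard large-sample formula for the sampling variance; I am assuming these are close enough to what a package such as metafor reports, not claiming they are identical to it.

```python
import math

def smd(m_trt, m_ctrl, sd_trt, sd_ctrl, n_trt, n_ctrl, correct=True):
    """Standardized mean difference from post-test summary statistics.

    Hypothetical helper for illustration only.
    """
    df = n_trt + n_ctrl - 2
    # Pool the two post-test SDs, weighting each variance by its df.
    sd_pooled = math.sqrt(
        ((n_trt - 1) * sd_trt**2 + (n_ctrl - 1) * sd_ctrl**2) / df
    )
    d = (m_trt - m_ctrl) / sd_pooled
    if correct:
        # Approximate small-sample bias correction (Cohen's d -> Hedges' g).
        d *= 1 - 3 / (4 * df - 1)
    # Large-sample approximation of the sampling variance.
    var = 1 / n_trt + 1 / n_ctrl + d**2 / (2 * (n_trt + n_ctrl))
    return d, var

# Made-up summary statistics, not taken from any real study.
d, var = smd(m_trt=24.0, m_ctrl=20.0, sd_trt=8.0, sd_ctrl=8.0,
             n_trt=30, n_ctrl=30)
print(round(d, 3), round(var, 4))
```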

2) Pretest-Posttest Control Group Design

Trt   O  R  X  O
Ctrl  O  R     O

For this design, we can compute the standardized mean change within each group and the difference thereof as our effect size measure, so:

d = (m_post_trt - m_pre_trt) / sd_pre_trt - (m_post_ctrl - m_pre_ctrl) / sd_pre_ctrl.

Importantly, within each group, we standardize based on either the pre- or the post-test SD, but NOT the SD of the change scores. This can be accomplished in metafor by using measure "SMCR" (for the 'standardized mean change with raw score standardization'), once for the treatment and once for the control group, and then taking the difference of the two values. Since the two groups are independent, the sampling variance of this difference is the sum of the two sampling variances. This is explained in detail here:

https://www.metafor-project.org/doku.php/analyses:morris2008
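
As a rough illustration of the computation (a sketch in Python, not the code from the page linked above), the following works through the 'difference of two standardized mean changes' by hand. The helper smcr() and all input numbers are made up. I am assuming pre-test SD standardization, the approximate bias correction 1 - 3/(4*df - 1) with df = n - 1, and the large-sample variance 2(1 - r)/n + d^2/(2n), where r is the pre-post correlation.

```python
def smcr(m_pre, m_post, sd_pre, n, r):
    """Standardized mean change with raw score (pre-test SD) standardization.

    Hypothetical helper; r is the pre-post correlation within the group.
    """
    df = n - 1
    j = 1 - 3 / (4 * df - 1)          # approximate bias correction
    d = j * (m_post - m_pre) / sd_pre
    var = 2 * (1 - r) / n + d**2 / (2 * n)
    return d, var

# Made-up summary statistics for one study with two arms.
d_trt, v_trt = smcr(m_pre=20.0, m_post=26.0, sd_pre=8.0, n=30, r=0.6)
d_ctrl, v_ctrl = smcr(m_pre=20.0, m_post=22.0, sd_pre=8.0, n=30, r=0.6)

# Effect size for the study: difference of the two standardized changes.
d_diff = d_trt - d_ctrl
# The two arms are independent groups, so their variances add.
v_diff = v_trt + v_ctrl

print(round(d_diff, 3), round(v_diff, 4))
```

Note that r enters only the sampling variance, not the d-value itself, which is why an unknown pre-post correlation is a problem for the weights but not for the estimate.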

For randomized studies, the d-values obtained from designs 1 and 2 are directly comparable. Any pre-treatment differences must be, by definition, random, and hence could in principle even be ignored. So, we could also treat this as design 1, computing the standardized mean difference only using the post-test information. This might be an option when the pre-post correlation is not known, since this correlation is needed to compute the sampling variance of measure "SMCR".

It is NOT appropriate to use measure "SMCC" (i.e., the 'standardized mean change with change score standardization') within each group, since the d-value computed for design 1 uses raw score standardization and so only using "SMCR" will give a d-value for design 2 that is comparable to that of design 1.
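
A small numerical sketch (Python, made-up numbers, no bias correction) may help to show why the two standardizations are not interchangeable. The SD of the change scores depends on the pre-post correlation r, so the same raw mean change yields different d-values under change score standardization as r varies, while the raw score version stays put.

```python
import math

# Made-up summary statistics for a single group.
m_pre, m_post = 20.0, 24.0
sd_pre, sd_post = 8.0, 8.0

def change_sd(sd_pre, sd_post, r):
    """SD of the change scores implied by the two SDs and correlation r."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)

# Raw score standardization: does not depend on r.
d_raw = (m_post - m_pre) / sd_pre

# Change score standardization: varies with r.
results = {}
for r in (0.2, 0.5, 0.8):
    results[r] = (m_post - m_pre) / change_sd(sd_pre, sd_post, r)
    print(f"r = {r}: raw = {d_raw:.3f}, change = {results[r]:.3f}")
```

With equal pre- and post-test SDs, the two versions coincide only at r = 0.5; for higher correlations the change score version is larger, for lower ones it is smaller.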

3) One-Group Pretest-Posttest Design

O X O

So here we have observed a single group, once before and once after a treatment. Campbell and Stanley (1963) discuss in detail the various sources of invalidity that are not controlled in such a design and hence could lead to incorrect conclusions about the 'effect' of treatment X. An obvious one is that we have no idea whether the change from pre- to post-treatment could also have happened in the absence of X (for other reasons, such as 'maturation').

Leaving this aside for now, for this design, we can compute

d = (m_post - m_pre) / sd_pre,

that is, measure "SMCR". We can think of the pre-treatment observation as the 'control' observation and the post-treatment observation as the 'treatment' observation. In that sense, this d-value is comparable to that from designs 1 and 2. Again, using raw score standardization is crucial.

As noted above, there are all kinds of issues with design 3 that make it a much weaker design than 1 and 2 (again, see Campbell and Stanley, 1963). To what extent these issues affect the d-values in any particular case is difficult to say. However, given enough d-values from design 3 and the other designs, we can also approach this issue empirically. That is, we code as a moderator the design type and then examine in a meta-regression analysis to what extent there are systematic differences between d-values obtained from the various designs.
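
To sketch what such a comparison looks like numerically, here is a deliberately simplified version in Python. It is NOT a full meta-regression: it assumes a fixed-effect (equal-effects) model, in which a meta-regression with design type as the only moderator reduces to comparing inverse-variance weighted means per design. In practice one would fit a mixed-effects model that allows for residual heterogeneity (e.g., with metafor's rma() function and a design-type moderator). The study values and design labels below are invented purely for illustration.

```python
import math
from collections import defaultdict

# Invented (design, d, sampling variance) triples, one per study.
studies = [
    ("posttest_only", 0.45, 0.07),
    ("posttest_only", 0.30, 0.05),
    ("pre_post_control", 0.50, 0.06),
    ("pre_post_control", 0.38, 0.04),
    ("one_group", 0.80, 0.03),
    ("one_group", 0.65, 0.04),
]

def pooled_by_design(studies):
    """Inverse-variance weighted mean and its variance for each design."""
    sum_w = defaultdict(float)
    sum_wd = defaultdict(float)
    for design, d, var in studies:
        w = 1 / var
        sum_w[design] += w
        sum_wd[design] += w * d
    return {g: (sum_wd[g] / sum_w[g], 1 / sum_w[g]) for g in sum_w}

pooled = pooled_by_design(studies)
ref_est, ref_var = pooled["posttest_only"]   # reference design

# Contrast each other design with the reference design.
contrasts = {}
for design, (est, var) in pooled.items():
    if design == "posttest_only":
        continue
    diff = est - ref_est
    se = math.sqrt(var + ref_var)   # the design groups share no studies
    contrasts[design] = (diff, se)
    print(f"{design}: diff = {diff:.3f}, se = {se:.3f}, z = {diff / se:.2f}")
```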

One has to be cautious when doing this exercise, since the results from such a moderator analysis are 'observational' in themselves. So there could be all kinds of other differences between studies using different designs, unrelated to the sources of invalidity discussed by Campbell and Stanley, that could lead to systematic differences in the d-values between different design types. But at least it is a somewhat more principled approach to addressing the question of the extent to which d-values from this design can be combined with those from the other designs.

I hope this addresses your question. I wrote this up in some detail, since this is definitely a FAQ and I hope to refer people to this post in the future whenever this question comes up again.

Best,
Wolfgang