-----Original Message-----
From: David Pedrosa [mailto:pedrosac at staff.uni-marburg.de]
Sent: Thursday, 10 March, 2022 0:01
To: Viechtbauer, Wolfgang (SP)
Subject: Re: [R-meta] Background on large meta analysis with RCT and single-arm
studies
Hi Wolfgang,
wow, that's a marvellous answer which helps me a lot and gives me something to
brood over, especially the second part with your thoughts on meta-regression
across the different study types.
One thing I have been wondering all along is whether it is valid to compare
different forms of effect sizes. There are some studies that I preferred to
discard, since only "adjusted mean differences" with their post-test SDs were
reported and the authors were reluctant to share their other data. As I
understand it, it would not be reasonable to compare SMD with, say, SMCR (which
I personally find more intuitive). Is that correct?
Best and thanks again for the quick reply and this whole package with all that
goes with it.
Best,
David
On 08.03.2022 at 22:17, Viechtbauer, Wolfgang (SP) wrote:
Hi David,
Let's distinguish three types of designs using the notation of Campbell and
Stanley (1963):
1) Posttest-Only Control Group Design
Trt R X O
Ctrl R O
(R = randomization, X = treatment, O = observation)
For this design, we can compute the usual standardized mean difference of the
form
d = (m_post_trt - m_post_control) / sd_post,
also known as Cohen's d or Hedges' g (when the bias-correction is applied). This
is measure "SMD" in metafor.
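For concreteness, here is a small numeric sketch of this computation (in Python, with made-up numbers; within metafor, escalc() with measure "SMD" does this for you, including the bias correction):

```python
from math import sqrt

def smd(m1, m2, sd1, sd2, n1, n2):
    """Standardized mean difference: Cohen's d with the approximate
    small-sample bias correction applied, i.e. Hedges' g."""
    # pooled post-test standard deviation
    sd_pool = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pool                 # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)     # approximate bias-correction factor
    return j * d                            # Hedges' g

# hypothetical posttest-only trial (all numbers made up)
g = smd(m1=24, m2=20, sd1=8, sd2=8, n1=25, n2=25)
```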
2) Pretest-Posttest Control Group Design
Trt O R X O
Ctrl O R O
For this design, we can compute the standardized mean change within each group
and the difference thereof as our effect size measure, so:
d = (m_post_trt - m_pre_trt) / sd_pre_trt - (m_post_ctrl - m_pre_ctrl) /
sd_pre_ctrl.
Importantly, within each group, we standardize based on either the pre- or the
post-test SD, but NOT the SD of the change scores. This can be accomplished in
metafor by using measure "SMCR" (for the 'standardized mean change with raw score
standardization'), once for the treatment and once for the control group and then
taking the difference of the two values (and we sum up their sampling variances).
This is explained in detail here:
https://www.metafor-project.org/doku.php/analyses:morris2008
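A sketch of this computation with entirely hypothetical summary statistics (in metafor, one would call escalc() with measure "SMCR" once per group, as described above; the variance formula below is the usual large-sample approximation):

```python
def smcr(m_post, m_pre, sd_pre, n, r):
    """Standardized mean change with raw-score standardization ("SMCR"),
    plus its approximate sampling variance, which requires the
    pre-post correlation r."""
    j = 1 - 3 / (4 * (n - 1) - 1)           # approximate bias correction
    d = j * (m_post - m_pre) / sd_pre       # standardize by the PRE-test SD
    v = 2 * (1 - r) / n + d**2 / (2 * n)
    return d, v

# hypothetical pretest-posttest control group study (all numbers made up)
d_trt, v_trt = smcr(m_post=26, m_pre=20, sd_pre=8, n=30, r=0.7)
d_ctl, v_ctl = smcr(m_post=21, m_pre=20, sd_pre=8, n=30, r=0.7)

yi = d_trt - d_ctl   # difference of the two standardized mean changes
vi = v_trt + v_ctl   # sampling variances of the independent groups add up
```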
For randomized studies, the d-values obtained from designs 1 and 2 are directly
comparable. Any pre-treatment differences must be, by definition, random, and
hence could in principle even be ignored. So, we could also treat this as design
1, computing the standardized mean difference only using the post-test
information. This might be an option when the pre-post correlation is not known,
since this correlation is needed to compute the sampling variance of measure
"SMCR".
It is NOT appropriate to use measure "SMCC" (i.e., the 'standardized mean change
with change score standardization') within each group, since the d-value computed
for design 1 uses raw score standardization and so only using "SMCR" will give a
d-value for design 2 that is comparable to that of design 1.
3) One-Group Pretest-Posttest Design
O X O
So here we have observed a single group, once before and once after a treatment.
Campbell and Stanley (1963) discuss in detail the various sources of invalidity
that are not controlled in such a design and hence could lead to incorrect
conclusions one might draw about the 'effect' of treatment X. An obvious one is
that we have no idea whether the change from the pre- to the post-treatment could
also have happened in the absence of X (for other reasons, such as 'maturation').
Leaving this aside for now, for this design, we can compute
d = (m_post - m_pre) / sd_pre,
that is, measure "SMCR". We can think of the pre-treatment observation as the
'control' observation and the post-treatment observation as the 'treatment'
observation. In that sense, this d-value is comparable to that from designs 1 and
2. Again, using raw score standardization is crucial.
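In code, the one-group case is a single application of the same formula (repeated here with hypothetical numbers so the snippet stands alone; in metafor, escalc() with measure "SMCR" and the pre-post correlation supplied via ri does this):

```python
def smcr(m_post, m_pre, sd_pre, n, r):
    """Standardized mean change, raw-score standardization (measure "SMCR")."""
    j = 1 - 3 / (4 * (n - 1) - 1)           # approximate bias correction
    d = j * (m_post - m_pre) / sd_pre       # standardize by the pre-test SD
    v = 2 * (1 - r) / n + d**2 / (2 * n)    # needs the pre-post correlation r
    return d, v

# hypothetical single-arm study: 40 patients, assumed pre-post correlation 0.6
d, v = smcr(m_post=25, m_pre=20, sd_pre=10, n=40, r=0.6)
```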
As noted above, there are all kinds of issues with design 3 that make it a much
weaker design than 1 and 2 (again, see Campbell and Stanley, 1963). To what
extent these issues affect the d-values in any particular case is difficult to
say. However, given enough d-values from design 3 and the other designs, we can
also approach this issue empirically. That is, we code as a moderator the design
type and then examine in a meta-regression analysis to what extent there are
systematic differences between d-values obtained from the various designs.
One has to be cautious when doing this exercise, since the results from such a
moderator analysis are 'observational' in themselves. So there could be all kinds
of other differences between studies using different designs, unrelated to the
sources of invalidity discussed by Campbell and Stanley, that could lead to
systematic differences in the d-values between different design types. But at
least it is a somewhat more principled approach to the question of the extent to
which d-values from this design can be combined with those from the other
designs.
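A minimal fixed-effect sketch of such a moderator analysis, with entirely made-up d-values and simple dummy coding (in metafor, this would be along the lines of rma(yi, vi, mods = ~ factor(design)), typically with random effects added):

```python
import numpy as np

# made-up d-values, sampling variances, and design codes
yi = np.array([0.49, 0.61, 0.35, 0.52, 0.80, 0.74])
vi = np.array([0.04, 0.05, 0.03, 0.06, 0.09, 0.08])
design = np.array([1, 1, 2, 2, 3, 3])  # 1 = post-only, 2 = pre-post ctrl, 3 = one-group

# model matrix: intercept plus dummies for designs 2 and 3 (design 1 = reference)
X = np.column_stack([np.ones_like(yi),
                     (design == 2).astype(float),
                     (design == 3).astype(float)])
W = np.diag(1 / vi)  # inverse-variance weights

# weighted least squares: b = (X'WX)^{-1} X'Wy
XtWX = X.T @ W @ X
b = np.linalg.solve(XtWX, X.T @ W @ yi)
se = np.sqrt(np.diag(np.linalg.inv(XtWX)))
# b[2] (with se[2]) gauges whether design-3 d-values differ
# systematically from design-1 ones
```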
I hope this addresses your question. I wrote this up in some detail, since this
is definitely a FAQ, and I hope to refer people to this post in the future
whenever the question comes up again.
Best,
Wolfgang
-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On
Behalf Of David Pedrosa
Sent: Tuesday, 08 March, 2022 19:20
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Background on large meta analysis with RCT and single-arm
studies
Dear list,
in our group we have performed an extensive search on treatment options
for Parkinson's disease and have encountered a large number of different
trials and study types. We have managed to get reasonable comparisons
for all RCTs providing mean differences or before-after designs, and we
have ultimately used the SMD as our metric. What is left is the
relatively large number of pre-post studies with single-arm
interventions and the non-randomised controlled trials. While the latter
are comparatively easy to understand and to model, we are really not
sure if and how to include single-arm studies. We have tried to look
through the usual book chapters and scientific papers, and we have also
looked through the metafor documentation, but we were not very
successful in understanding what the pitfalls would be and, especially,
what an implementation could look like. If anyone could guide us a bit
or provide some useful links, that would be very helpful.
Best wishes,
David