[R-meta] Best choice of effect size

Hi Luke,
Responses inline below.
James

On Sun, Oct 3, 2021 at 3:16 PM Luke Martinez <martinezlukerm at gmail.com>
wrote:
Dear James,

Thank you for the thorough and thought-provoking response. Here are my
two takeaways:

 1- Your insightful advice seems to be a criticism of SMDs in general,
due to the use of some form of SD in the denominator, and not just of
their use in my situation (i.e., studies reporting the M and SD of
proportions and/or counts), right?

I would not go quite that far. The concerns I raised with the SMD are more
salient when dealing with outcomes that are proportions or counts.

 2- When using SMDs, one has to keep an eye on reliability estimates and
the factors affecting them (e.g., time provided for the test) in the
studies, and possibly control for them in the analysis, right?

Yes. Although, I would add that using an effect measure that is invariant
(or at least relatively robust) to such factors is preferable to trying to
account for the factors using meta-regression.

I also wanted to clarify two things:

First, by log-transformed response ratio, you mean "ROM" or "ROMC" as
represented in metafor::escalc?

Yes.

Second, by reference group, you simply mean the mean for each
treatment group as denoted by M_t in (M_t - M_c) / Pooled_SD?

I had in mind the control groups (M_c), although my comment would apply
equally to the treatment groups.

Respectfully,
Luke

On Sun, Oct 3, 2021 at 11:31 AM James Pustejovsky <jepusto at gmail.com>
wrote:
Hi Luke,

Based on your responses, I think the response ratio could be an
appropriate effect measure and further that there could be drawbacks
to using the standardized mean difference. Let me note potential
drawbacks first.

* Variation in the number of possible errors (and perhaps also in the
length of the time provided for the test?) suggests that the measures
from different studies may have varying degrees of reliability.
Varying reliability introduces heterogeneity in the SMD (because the
denominator is inflated or shrunk by the degree of reliability).

* A relationship between the M and SD of the proportions for a given
group suggests that the distribution of the individual-level outcomes
might also exhibit mean-variance relationships. (I say "suggests"
rather than implies because there's an ecological inference here,
i.e., assuming something about individual-level variation on the basis
of group-level variation). If this supposition is reasonable, then
that introduces a further potential source of heterogeneity in the
SMDs (study-to-study variation in the M for the reference group
influences the SD of the reference group, thereby inflating or
shrinking the SMDs).
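
A minimal R sketch of the two points above, using made-up numbers apart
from the three group means quoted from Luke's example. It assumes
classical test theory (observed variance = true-score variance /
reliability) for the first point and, for the second, that each
participant's score is a binomial proportion out of a hypothetical
`n_items` items. Both are illustrative assumptions, not properties of
the studies in this synthesis.

```r
# Hypothetical values, for illustration only.
true_diff   <- 0.10               # same raw mean difference in every study
true_sd     <- 0.20               # SD of error-free scores
reliability <- c(0.9, 0.7, 0.5)   # three studies with different reliability

# Classical test theory: observed variance = true variance / reliability,
# so lower reliability inflates the SD in the SMD's denominator.
obs_sd <- true_sd / sqrt(reliability)
smd    <- true_diff / obs_sd      # = (true_diff / true_sd) * sqrt(reliability)
round(smd, 3)

# Mean-variance relationship under a simple binomial model with n_items
# per person (ignoring between-person variation in ability).
n_items   <- 20                      # assumed number of items per passage
mean_prop <- c(0.13, 0.18, 0.39)     # reference-group means from Luke's example
sd_prop   <- sqrt(mean_prop * (1 - mean_prop) / n_items)
round(true_diff / sd_prop, 3)        # same raw difference, different SMDs
```

In both cases the raw difference is held fixed, yet the SMD changes from
study to study, which would show up as heterogeneity.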

The response ratio does not have these same concerns because it is a
function of the group means alone. (The standard error of the response
ratio involves the SD of each group, but the effect size metric itself
does not.) Further, you noted that the group means are not too near
the extremes of the scale, so the (log-transformed) response ratio
should be reasonably "well-behaved" in terms of its sampling
distribution.
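
To make the parenthetical concrete, here is a small base-R sketch using
the proportion example quoted elsewhere in this thread (group1 vs.
group2; which one is the control group is not stated, so the direction
of the ratio is an assumption). The variance formula is the usual
large-sample (delta-method) approximation. As far as I know it matches
what metafor::escalc(measure="ROM") computes by default, but that
correspondence is worth verifying rather than taking on trust.

```r
# Summary statistics from the proportion example in this thread.
m1 <- 0.45; sd1 <- 0.17; n1 <- 20   # group1
m2 <- 0.17; sd2 <- 0.11; n2 <- 19   # group2

# Log response ratio: a function of the two group means only.
lrr <- log(m1 / m2)

# Large-sample sampling variance: this is the only place the SDs enter.
v_lrr <- sd1^2 / (n1 * m1^2) + sd2^2 / (n2 * m2^2)

# Approximate 95% confidence interval, back-transformed to the ratio scale.
ci <- exp(lrr + c(-1, 1) * qnorm(0.975) * sqrt(v_lrr))

c(lrr = lrr, var = v_lrr, ratio = exp(lrr),
  ci_lower = ci[1], ci_upper = ci[2])
```

Changing sd1 or sd2 alters the variance and the interval, but leaves
the point estimate untouched.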

In light of the above, here's how I might proceed if I were conducting
this analysis:
1. Calculate *both* SMDs and log-transformed response ratios for the
full set of studies.
2. Examine the distribution of effect size estimates for each metric
(using histograms or funnel plots). If one of the distributions is
skewed or has extreme outliers, take that as an indication that the
metric might not be appropriate.
3. Fit meta-analytic models to summarize the distribution of effect
sizes in each metric, using a model that appropriately describes the
dependence structure of the estimates. Calculate I-squared statistics
and give preference to the metric with the lower I-squared.
4. If (2) and (3) don't lead to a clearly preferable metric, then
choose between the SMD and the response ratio based on whichever will
make the synthesis results easier to explain to people.
5. (Optional/extra credit) Whichever metric you choose, repeat your
main analyses using the other metric and stuff all those results in
supplementary materials, to satisfy any inveterate statistical
curmudgeons who might review/read your synthesis.
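
Steps 1-3 could be sketched roughly as follows in base R. The five
studies in `dat` are made-up numbers for illustration only, and the
helper `i_squared()` is a hypothetical name. It computes I-squared from
Cochran's Q, which treats the estimates as independent; that is a
simplification of step 3, which calls for a model that reflects the
actual dependence structure, and model-based I-squared values from a
package such as metafor may differ somewhat.

```r
# Made-up summary data for five studies (illustration only).
dat <- data.frame(
  m1  = c(0.45, 0.52, 0.38, 0.60, 0.41),
  sd1 = c(0.17, 0.20, 0.15, 0.22, 0.18),
  n1  = c(20, 35, 28, 40, 25),
  m2  = c(0.17, 0.40, 0.30, 0.35, 0.33),
  sd2 = c(0.11, 0.19, 0.14, 0.18, 0.16),
  n2  = c(19, 33, 30, 38, 27)
)

# Step 1a: SMD with the usual small-sample correction (Hedges' g).
smd_es <- with(dat, {
  sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  J  <- 1 - 3 / (4 * (n1 + n2 - 2) - 1)
  yi <- J * (m1 - m2) / sp
  vi <- 1 / n1 + 1 / n2 + yi^2 / (2 * (n1 + n2))
  data.frame(yi, vi)
})

# Step 1b: log response ratio with its large-sample variance.
lrr_es <- with(dat, data.frame(
  yi = log(m1 / m2),
  vi = sd1^2 / (n1 * m1^2) + sd2^2 / (n2 * m2^2)
))

# Step 2: inspect the distributions (uncomment in an interactive session).
# hist(smd_es$yi, main = "SMD"); hist(lrr_es$yi, main = "log response ratio")

# Step 3 (simplified): Q-based I-squared, in percent.
i_squared <- function(yi, vi) {
  wi <- 1 / vi
  Q  <- sum(wi * (yi - sum(wi * yi) / sum(wi))^2)
  df <- length(yi) - 1
  max(0, 100 * (Q - df) / Q)
}

c(SMD = i_squared(smd_es$yi, smd_es$vi),
  lnRR = i_squared(lrr_es$yi, lrr_es$vi))
```

With real data, the comparison of the two I-squared values is only one
input to the choice; the distribution checks in step 2 and
interpretability in step 4 still apply.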

James

On Oct 1, 2021, at 12:39 AM, Luke Martinez <martinezlukerm at gmail.com>
wrote:
Dear James,

Thank you for the insightful comments. Here are my answers inline:

1- Is the total number possible the same for the groups being
compared within a given study?
Not necessarily.

2- Did some studies use passages with many possible errors to be
corrected while other studies used passages with just a few errors?
Yes, that's correct. Passage characteristics are fully coded for as
potential moderators.

3- Did the difficulty of the passages differ from study to study?
Yes, that's correct. Studies with more advanced students used more
difficult passages.

4- Were there very low or very high mean proportions in any studies?
No, the means were never very close to 0 or 1.

5- Does there seem to be a relationship between the means and the
variances of the proportions of a given group?
Assuming you mean the following, yes:

group1_M_prop = c(.39, .18, .13)
group1_SD_prop = c(.25, .16, .13)

plot(group1_M_prop, group1_SD_prop^2)

Thanks,
Luke

On Thu, Sep 30, 2021 at 10:17 PM James Pustejovsky <jepusto at gmail.com>
wrote:
Hi Luke,

To add to Wolfgang's comments, I would suggest that you could also
consider other effect measures besides the SMD. For example, the response
ratio is also a scale-free metric that could work with the proportion
outcomes that you've described, and would also be appropriate for raw
frequency counts as long as the total number possible is the same for the
groups being compared within a given study.

Whether the response ratio would be more appropriate than the SMD is
hard to gauge. One would need to know more about how the proportions were
assessed and how the assessment procedures varied from study to study. For
instance, did some studies use passages with many possible errors to be
corrected while other studies used passages with just a few errors? Did the
difficulty of the passages differ from study to study? Were there very low
or very high mean proportions in any studies? Does there seem to be a
relationship between the means and the variances of the proportions of a
given group?

James

On Thu, Sep 30, 2021 at 2:22 AM Luke Martinez <martinezlukerm at gmail.com>
wrote:
Dear Wolfgang,

Thank you so much for your response and also the references.

I will compute an SMD from the means and SDs of all types of proportions
and the raw counts reported in the papers.

Instead of a moderator, I thought I would add a random effect for the
variation in these types of proportions and raw counts, crossed with
studies (I think), because true effects can be correlated (?) due to
sharing a study as well as sharing one of these types of proportions or
raw counts, right?

proportion_type1 = # of corrected items / all items needing correction

proportion_type2 = # of corrected items /
                   (all items needing correction + all wrongly corrected items)

raw_counts = # of corrected items

On Thu, Sep 30, 2021, 1:33 AM Viechtbauer, Wolfgang (SP)
<wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

Hi Luke,

Yes, treating the mean proportions as means is ok -- after all, they are
means. As long as n is not too small (and the true mean proportion not
too close to 0 or 1), then the CLT will also ensure that the sampling
distribution of a mean proportion is approximately normal.

We have analyzed such mean proportions in these articles:

McCurdy, M. P., Viechtbauer, W., Sklenar, A. M., Frankenstein, A. N., &
Leshikar, E. D. (2020). Theories of the generation effect and the impact
of generation constraint: A meta-analytic review. Psychonomic Bulletin &
Review, 27(6), 1139-1165. https://doi.org/10.3758/s13423-020-01762-3

Vachon, H., Viechtbauer, W., Rintala, A., & Myin-Germeys, I. (2019).
Compliance and retention with the experience sampling method over the
continuum of severe mental disorders: Meta-analysis and recommendations.
Journal of Medical Internet Research, 21(12), e14475.
https://doi.org/10.2196/14475

In these articles, we did not compute standardized mean differences
based on the mean proportions, but one could do so.

For the data below:

library(metafor)
escalc(measure="SMD", m1i=0.45, m2i=0.17, sd1i=0.17, sd2i=0.11,
       n1i=20, n2i=19)

If I understand you correctly, the second type are means of counts
(i.e., there is a count for each subject and for example 4.5 is the mean
of those counts). Again, while an individual count might have other
distributional properties (e.g., Poisson or negative binomial), once you
take the mean, it's a mean and the CLT 'kicks in'. So I would again say:
yes, you can treat these as 'regular' means and compute SMDs based on
them.

For the data below:

escalc(measure="SMD", m1i=4.5, m2i=4.7, sd1i=1.12, sd2i=1.59,
       n1i=17, n2i=18)

I might be inclined to code a moderator that distinguishes these
different types, to see if there is some systematic difference between
them.

Best,
Wolfgang

-----Original Message-----
From: Luke Martinez [mailto:martinezlukerm at gmail.com]
Sent: Thursday, 30 September, 2021 0:32
To: R meta
Cc: Viechtbauer, Wolfgang (SP); James Pustejovsky
Subject: Re: Best choice of effect size

Dear All,

To further clarify, the proportion types (my previous email) are used to
score each study participant's performance on the text. Then, each study
reports the "mean" and "sd" of a proportion type for control and
experimental groups (to then compare them with t-tests and ANOVAs).

For example, a study using proportion_type1 (see my previous email)
can provide the following for effect size calculation:

Group    Mean   SD     n
group1   0.45   0.17   20
group2   0.17   0.11   19

The same is true for studies that use raw frequencies to score each
study participant's performance on the text. Such studies often report
the "mean" and "sd" of the # of corrected items (the numerator of the
proportions in my previous email) for control and experimental groups
(to then compare them with t-tests and ANOVAs).

For example, a study using (raw) # of corrected items can provide the
following for effect size calculation:

Group    Mean   SD     n
group1   4.5    1.12   17
group2   4.7    1.59   18

My question is: can I calculate SMDs across all such studies, given that
their intent is to measure the same thing?

Thank you,
Luke

On Wed, Sep 29, 2021 at 12:12 PM Luke Martinez <martinezlukerm at gmail.com>
wrote:
Dear All,

I'm doing a meta-analysis where the papers report only "mean" and "sd"
of some form of proportion and/or "mean" and "sd" of corresponding raw
frequencies. (For context, the papers ask students to read, find, and
correct the wrong words in a text.)

By some form of proportion, I mean that some papers report actual
proportions:

proportion_type1 = # of corrected items / all items needing correction

Some papers report a modified version of proportions:

proportion_type2 = # of corrected items /
                   (all items needing correction + all wrongly corrected items)

There are other versions of proportions and corresponding raw
frequencies as well. But my question is: given that all these studies
only report "mean" and "sd", can I simply use an SMD effect size?

Many thanks,
Luke


_______________________________________________
R-sig-meta-analysis mailing list
R-sig-meta-analysis at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
