Respectfully,
Luke
On Sun, Oct 3, 2021 at 11:31 AM James Pustejovsky <jepusto at gmail.com>
wrote:
Hi Luke,
Based on your responses, I think the response ratio could be an
appropriate effect measure and further that there could be drawbacks
to using the standardized mean difference. Let me note potential
drawbacks first.
* Variation in the number of possible errors (and perhaps also in the
length of the time provided for the test?) suggests that the measures
from different studies may have varying degrees of reliability.
Varying reliability introduces heterogeneity in the SMD (because the
denominator is inflated or shrunk by the degree of reliability).
* A relationship between the M and SD of the proportions for a given
group suggests that the distribution of the individual-level outcomes
might also exhibit mean-variance relationships. (I say "suggests"
rather than implies because there's an ecological inference here,
i.e., assuming something about individual-level variation on the basis
of group-level variation). If this supposition is reasonable, then
that introduces a further potential source of heterogeneity in the
SMDs (study-to-study variation in the M for the reference group
influences the SD of the reference group, thereby inflating or
shrinking the SMDs).
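To make the first point concrete, here is a minimal sketch under classical test theory assumptions (observed score = true score + independent error). The function name, the "true" SMD of 0.50, and the reliability values are all hypothetical and chosen only for illustration; none of them come from the studies discussed here.

```r
# Sketch, assuming classical test theory:
#   reliability = var(true scores) / var(observed scores)
# so the observed SD equals the true-score SD divided by sqrt(reliability),
# and an SMD computed on observed scores is shrunk by sqrt(reliability).
attenuated_smd <- function(d_true, reliability) {
  stopifnot(all(reliability > 0), all(reliability <= 1))
  d_true * sqrt(reliability)
}

d_true <- 0.50                      # hypothetical SMD on the true-score scale
reliabilities <- c(0.6, 0.8, 1.0)   # hypothetical study-level reliabilities
d_obs <- attenuated_smd(d_true, reliabilities)

# Same underlying effect, three different observed SMDs
print(round(d_obs, 3))
```

Under these assumptions, studies with identical underlying effects would still report different SMDs simply because their measures differ in reliability.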
The response ratio does not have these same concerns because it is a
function of the group means alone. (The standard error of the response
ratio involves the SD of each group, but the effect size metric itself
does not.) Further, you noted that the group means are not too near
the extremes of the scale, so the (log-transformed) response ratio
should be reasonably "well-behaved" in terms of its sampling
distribution.
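As a rough illustration of the parenthetical above, the log response ratio and its usual delta-method sampling variance can be computed by hand. The helper name below is made up for this sketch; the input values are the proportion example quoted later in this thread (group1: M = 0.45, SD = 0.17, n = 20; group2: M = 0.17, SD = 0.11, n = 19). I would expect escalc(measure="ROM", ...) in metafor to give the same quantities, but I have not verified that here.

```r
# Sketch: log response ratio and its large-sample (delta-method) variance.
# The point estimate uses only the two means; the SDs enter only the variance.
log_response_ratio <- function(m1, sd1, n1, m2, sd2, n2) {
  stopifnot(m1 > 0, m2 > 0)  # the log ratio is undefined for non-positive means
  yi <- log(m1 / m2)
  vi <- sd1^2 / (n1 * m1^2) + sd2^2 / (n2 * m2^2)
  c(yi = yi, vi = vi, sei = sqrt(vi))
}

lrr <- log_response_ratio(m1 = 0.45, sd1 = 0.17, n1 = 20,
                          m2 = 0.17, sd2 = 0.11, n2 = 19)
print(round(lrr, 4))
```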
In light of the above, here's how I might proceed if I were conducting
this analysis:
1. Calculate *both* SMDs and log-transformed response ratios for the
full set of studies.
2. Examine the distribution of effect size estimates for each metric
(using histograms or funnel plots). If one of the distributions is
skewed or has extreme outliers, take that as an indication that the
metric might not be appropriate.
3. Fit meta-analytic models to summarize the distribution of effect
sizes in each metric, using a model that appropriately describes the
dependence structure of the estimates. Calculate I-squared statistics
and give preference to the metric with the lower I-squared.
4. If (2) and (3) don't lead to a clearly preferable metric, then
choose between SMD and RR based on whichever will make the synthesis
results easier to explain to people.
5. (Optional/extra credit) Whichever metric you choose, repeat your
main analyses using the other metric and stuff all those results in
supplementary materials, to satisfy any inveterate statistical
curmudgeons who might review/read your synthesis.
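The mechanics of steps 1 and 3 can be sketched in base R as follows. This is a simplified illustration, not the analysis itself: it uses only the two example rows quoted later in this thread, it uses the common approximation J = 1 - 3/(4*dof - 1) for the small-sample correction, and it computes I-squared from Cochran's Q under an assumption of independent estimates. A real analysis would use the full data set and a model that handles dependent estimates (for example, a multilevel model fitted with rma.mv() in metafor).

```r
# The two example rows from this thread (a proportion outcome, a raw-count outcome)
dat <- data.frame(
  study = c("proportion example", "raw-count example"),
  m1 = c(0.45, 4.5), sd1 = c(0.17, 1.12), n1 = c(20, 17),
  m2 = c(0.17, 4.7), sd2 = c(0.11, 1.59), n2 = c(19, 18)
)

# Step 1a: bias-corrected SMD and its large-sample variance
dof <- dat$n1 + dat$n2 - 2
sd_pooled <- sqrt(((dat$n1 - 1) * dat$sd1^2 + (dat$n2 - 1) * dat$sd2^2) / dof)
J <- 1 - 3 / (4 * dof - 1)
dat$smd   <- J * (dat$m1 - dat$m2) / sd_pooled
dat$smd_v <- 1 / dat$n1 + 1 / dat$n2 + dat$smd^2 / (2 * (dat$n1 + dat$n2))

# Step 1b: log response ratio and its delta-method variance
dat$lrr   <- log(dat$m1 / dat$m2)
dat$lrr_v <- dat$sd1^2 / (dat$n1 * dat$m1^2) + dat$sd2^2 / (dat$n2 * dat$m2^2)

# Step 3 (simplified): I-squared from Cochran's Q, assuming independent estimates
i_squared <- function(yi, vi) {
  wi <- 1 / vi
  Q <- sum(wi * (yi - sum(wi * yi) / sum(wi))^2)
  if (Q <= 0) return(0)
  100 * max(0, (Q - (length(yi) - 1)) / Q)
}

i2 <- c(SMD  = i_squared(dat$smd, dat$smd_v),
        lnRR = i_squared(dat$lrr, dat$lrr_v))
print(round(i2, 1))
```

For step 2, hist(dat$smd) and hist(dat$lrr) give a quick look at each distribution once the full set of studies is in the data frame. With only two rows, the I-squared values above show the computation and nothing more.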
James
On Oct 1, 2021, at 12:39 AM, Luke Martinez <martinezlukerm at gmail.com>
wrote:
Thank you for the insightful comments. Here are my answers inline:
1- Is the total number possible, the same for the groups being
compared within a given study?
2- Did some studies use passages with many possible errors to be
corrected while other studies used passages with just a few errors?
Yes, that's correct. Passage characteristics are fully coded for as
potential moderators.
3- Did the difficulty of the passages differ from study to study?
Yes, that's correct. Studies with more advanced students used more
difficult passages.
4- Were there very low or very high mean proportions in any studies?
No, the means were never very close to 0 or 1.
5- Does there seem to be a relationship between the means and the
variances of the proportions of a given group?
Assuming you mean the following, yes:
group1_M_prop = c(.39, .18, .13)
group1_SD_prop = c(.25, .16, .13)
plot(group1_M_prop, group1_SD_prop^2)
Thanks,
Luke
On Thu, Sep 30, 2021 at 10:17 PM James Pustejovsky <jepusto at gmail.com>
wrote:
Hi Luke,
To add to Wolfgang's comments, I would suggest that you could also
consider other effect measures besides the SMD. For example, the response
ratio is also a scale-free metric that could work with the proportion
outcomes that you've described, and would also be appropriate for raw
frequency counts as long as the total number possible is the same for the
groups being compared within a given study.
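A small sketch of why the "same total number possible" condition matters. The counts and totals below are hypothetical (chosen so that the proportions match the 0.45 vs. 0.17 example elsewhere in this thread); they are not taken from any study.

```r
# Hypothetical: both groups read a passage with K = 20 errors to correct
K <- 20
mean_counts <- c(treatment = 9.0, control = 3.4)  # illustrative mean counts
mean_props  <- mean_counts / K                    # 0.45 and 0.17

# With a common K, the divisor cancels in the ratio, so counts and
# proportions give the same log response ratio
lrr_counts <- log(mean_counts[["treatment"]] / mean_counts[["control"]])
lrr_props  <- log(mean_props[["treatment"]]  / mean_props[["control"]])
print(isTRUE(all.equal(lrr_counts, lrr_props)))

# If the groups had different totals (hypothetical K of 20 vs. 10), the
# ratio of raw counts would no longer match the ratio of proportions
K_treat <- 20
K_ctrl  <- 10
lrr_mismatch <- log((mean_counts[["treatment"]] / K_treat) /
                    (mean_counts[["control"]]   / K_ctrl))
print(round(c(counts = lrr_counts, mismatched_props = lrr_mismatch), 3))
```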
Whether the response ratio would be more appropriate than the SMD is
hard to gauge. One would need to know more about how the proportions were
assessed and how the assessment procedures varied from study to study. For
instance, did some studies use passages with many possible errors to be
corrected while other studies used passages with just a few errors? Did the
difficulty of the passages differ from study to study? Were there very low
or very high mean proportions in any studies? Does there seem to be a
relationship between the means and the variances of the proportions of a
given group?
On Thu, Sep 30, 2021 at 2:22 AM Luke Martinez <
martinezlukerm at gmail.com> wrote:
Dear Wolfgang,
Thank you so much for your response and also the references.
I will compute an SMD from the means and sds of all types of
proportions and the raw counts reported in the papers.
Instead of a moderator, I thought I would add a random effect for the
variation in these types of proportions and raw counts, which will be
crossed with studies (I think), because true effects can be correlated (?)
due to sharing a study as well as sharing one of these types of
proportions or counts, right?
proportion_type1 = # of corrected items / all items needing correction
proportion_type2 = # of corrected items / (all items needing
correction + all wrongly corrected items)
raw_counts = # of corrected items
On Thu, Sep 30, 2021, 1:33 AM Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
Hi Luke,
Yes, treating the mean proportions as means is ok -- after all, they are
means. As long as n is not too small (and the true mean proportion is not
too close to 0 or 1), then the CLT will also ensure that the sampling
distribution of a mean proportion is approximately normal.
We have analyzed such mean proportions in these articles:
McCurdy, M. P., Viechtbauer, W., Sklenar, A. M., Frankenstein, A. N., &
Leshikar, E. D. (2020). Theories of the generation effect and the impact of
generation constraint: A meta-analytic review. Psychonomic Bulletin &
Review, 27(6), 1139-1165.
Vachon, H., Viechtbauer, W., Rintala, A., & Myin-Germeys, I. (2019).
Compliance and retention with the experience sampling method over the
continuum of severe mental disorders: Meta-analysis and recommendations.
Journal of Medical Internet Research, 21(12), e14475.
https://doi.org/10.2196/14475
In these articles, we did not compute standardized mean differences
on the mean proportions, but one could do so.
For the data below:
escalc(measure="SMD", m1i=0.45, m2i=0.17, sd1i=0.17, sd2i=0.11,
n1i=20, n2i=19)
If I understand you correctly, the second type are means of counts (i.e.,
there is a count for each subject and for example 4.5 is the mean of those
counts). Again, while an individual count might have other distributional
properties (e.g., Poisson or negative binomial), once you take the mean,
it's a mean and the CLT 'kicks in'. So I would again say: yes, you can
treat these as 'regular' means and compute SMDs based on them.
For the data below:
escalc(measure="SMD", m1i=4.5, m2i=4.7, sd1i=1.12, sd2i=1.59,
n1i=17, n2i=18)
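For readers without metafor at hand, the following base-R sketch shows roughly what the two escalc() calls above compute, with the group1 sample sizes (20 and 17) taken from the tables quoted further down in this thread. The helper name is made up, and the sketch uses the common approximation J = 1 - 3/(4*dof - 1) for the small-sample correction, so the results may differ slightly in later decimals from what escalc() reports.

```r
# Sketch: bias-corrected standardized mean difference (Hedges' g)
# and its large-sample sampling variance, computed by hand.
hedges_g <- function(m1, sd1, n1, m2, sd2, n2) {
  dof <- n1 + n2 - 2
  sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / dof)
  J <- 1 - 3 / (4 * dof - 1)          # approximate small-sample correction
  g <- J * (m1 - m2) / sd_pooled
  v <- 1 / n1 + 1 / n2 + g^2 / (2 * (n1 + n2))
  c(yi = g, vi = v)
}

# Mean proportions example
g_prop <- hedges_g(m1 = 0.45, sd1 = 0.17, n1 = 20,
                   m2 = 0.17, sd2 = 0.11, n2 = 19)

# Mean counts example
g_count <- hedges_g(m1 = 4.5, sd1 = 1.12, n1 = 17,
                    m2 = 4.7, sd2 = 1.59, n2 = 18)

print(round(rbind(proportions = g_prop, counts = g_count), 3))
```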
I might be inclined to code a moderator that distinguishes these two
types, to see if there is some systematic difference between them.
Best,
Wolfgang
-----Original Message-----
From: Luke Martinez [mailto:martinezlukerm at gmail.com]
Sent: Thursday, 30 September, 2021 0:32
To: R meta
Cc: Viechtbauer, Wolfgang (SP); James Pustejovsky
Subject: Re: Best choice of effect size
Dear All,
To further clarify, the proportion types (my previous email) are used
to score each study participant's performance on the text. Then, each
study reports the "mean" and "sd" of a proportion type for control and
experimental groups (to then compare them with t-tests and ANOVAs).
For example, a study using proportion_type1 (see my previous email)
can provide the following for effect size calculation:
        Mean  SD    n
group1  0.45  0.17  20
group2  0.17  0.11  19
The same is true for studies that use raw frequencies to score each
study participant's performance on the text. Such studies report the
"mean" and "sd" of the # of corrected items (numerator of the
proportions in my previous email) for control and experimental groups
(to then compare them with t-tests and ANOVAs).
For example, a study using (raw) # of corrected items can provide the
following for effect size calculation:
        Mean  SD    n
group1  4.5   1.12  17
group2  4.7   1.59  18
My question is: can I calculate SMDs across all such studies if
their intent is to measure the same thing?
Thank you,
Luke
On Wed, Sep 29, 2021 at 12:12 PM Luke Martinez <
martinezlukerm at gmail.com> wrote:
Dear All,
I'm doing a meta-analysis where the papers report only "mean" and "sd"
of some form of proportion and/or "mean" and "sd" of raw
frequencies. (For context, the papers ask students to read, find, and
correct the wrong words in a text.)
By some form of proportion, I mean that some papers report actual
proportions:
proportion_type1 = # of corrected items / all items needing correction
Some papers report a modified version of proportions:
proportion_type2 = # of corrected items / (all items needing
correction + all wrongly corrected items)
There are other versions of proportions and corresponding raw
frequencies as well. But my question is: given that all these papers
only report "mean" and "sd", can I simply use an SMD effect size?
Many thanks,
Luke