[R-meta] Selection models from *reported p-values*
3 messages · Seetahul, Yashvin, Wolfgang Viechtbauer, James Pustejovsky

Dear R meta-analysis community,

I have a question regarding selection models based on p-values. Is it possible to fit a selection model using the p-values reported in the primary studies directly, rather than the p-values calculated from the effect size and SE? In many cases, meta-analyses require transformations, or sometimes corrections. However, if we assume that the selection process in publishing papers operates on p-values, it would make more sense to consider the p-values that are actually reported in the papers, would it not? How would one proceed to do this?

I believe the selmodel() function in metafor works with objects fitted with the rma() function, so the p-values are re-calculated from the effect size and SE. Assuming I have the reported p-values (to three decimals) for all the studies included in my meta-analysis, is it possible to test for selection of studies based on these reported p-values and then correct the effect size?

I hope my question makes sense. Thank you for your help,

Yashvin Seetahul
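[For context, a minimal sketch of the standard workflow the question refers to, using the dat.hackshaw1998 example dataset that ships with metafor (the dataset and cutpoint are chosen purely for illustration): selmodel() takes a fitted rma() object and derives the p-values it needs internally from yi/sqrt(vi), rather than from anything reported in the primary studies.]

```r
library(metafor)

# example data bundled with metafor: log odds ratios (yi) and
# sampling variances (vi) from studies on passive smoking
dat <- dat.hackshaw1998

# standard random-effects model; selmodel() requires an rma() fit
res <- rma(yi, vi, data=dat)

# step-function selection model with a cutpoint at p = .025;
# the p-values entering the weight function are one-sided Wald-type
# p-values computed internally as pnorm(yi/sqrt(vi), lower.tail=FALSE)
sel <- selmodel(res, type="stepfun", steps=c(0.025, 1))
sel  # adjusted estimate plus a likelihood ratio test for selection
```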
Dear Yashvin,

I haven't thought this all the way through, but the problem is that p-values enter the model in two different ways. There are indeed the actually observed p-values of the studies, but in the integration step (which is needed to compute the log likelihood), we also need to compute p-values. Those are not fixed, but arise from integrating over the density (assumed to be normal) of the effect size estimates. These p-values (which then enter the weight function) are computed as a function of yi/sqrt(vi).

If we use one way of computing the observed p-values and a different way of computing the p-values in this integration step, then there is a mismatch, and I am not sure about the consequences of that. So for consistency, one should also compute the p-values in the integration step in a corresponding manner, but this would be very case-, measure-, and test-specific, and trying to fine-tune this for every specific measure and way of testing it becomes extremely difficult implementation-wise.

We can see a bit of this in Iyengar and Greenhouse (1988), where the weight function is based on a t instead of a normal distribution (analogous to a z- versus a t-test). But this leads to the extra, headache-inducing complexities in their appendix.

I (and others) decided to avoid all of this by making the simplifying assumption that the p-values are always computed based on Wald-type tests of the form 'estimate / SE'. This should not be too far off in many cases, especially if the sample sizes within studies are not small. For example, the difference between pnorm(2, lower.tail=FALSE) and pt(2, df=100, lower.tail=FALSE) is of little practical consequence.

Also, selection models are really rough approximations to a much more complex data-generating mechanism anyway, so trying to fine-tune this part of the model is like taking a ruler to align something to millimeter accuracy before taking a sledgehammer to smash it. A bit like the bias correction for d-values: whether you put d=0.53 or g=0.52 into your model makes so little difference compared to all the other inaccuracies and infidelities we accept in putting together our meta-analytic datasets in the first place.

But those are just my two cents.

Best,
Wolfgang
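[Wolfgang's pnorm()/pt() comparison, and his d-versus-g point, can be checked directly in R; the df = 62 in the last line corresponds to a made-up two-group study with n = 32 per group, chosen only so the numbers match his example.]

```r
# Wald-type (normal) vs. t-based one-sided p-value for the same statistic:
pnorm(2, lower.tail=FALSE)        # ~0.0228
pt(2, df=100, lower.tail=FALSE)   # ~0.0241

# Hedges' small-sample correction factor applied to d = 0.53:
0.53 * (1 - 3 / (4 * 62 - 1))     # ~0.52
```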
Yashvin,

This is an interesting question, which highlights a potential limitation of existing meta-analytic selection models (at least those that I'm aware of). Just to add a thought to Wolfgang's response: the reason it would be difficult to modify existing selection models to work with observed p-values is that current models assume the p-value is a direct function of the effect size estimate and its standard error, and the effect size estimates are the _outcomes_ in the model. So the model implies a _distribution_ of p-values based on the data-generating process, and we need to know what that distribution is. In particular, to work with an observed p-value, we would need to know how it is functionally related to the effect size estimate, and this will depend on many details of the effect size metric, study design, and analytic methods (your method of calculating the effect size estimate and the authors' method of calculating p-values).

For some types of transformations, I think the discrepancies will be quite small (both cases below are illustrated numerically in the sketch at the end of this message):

* For example, say that an author reports a p-value for an untransformed correlation coefficient, but you meta-analyze the results based on the Fisher z-transformation. For r near zero, the SE of the untransformed coefficient is quite close to the SE of the z-transformed coefficient, so using one or the other makes hardly any difference.

* For another example, say that you apply a multiplicative reliability correction to a correlation coefficient. In this case, the SE of the corrected coefficient should also be multiplied by the reliability correction (that is, if we treat the correction as a fixed constant), so the ratio of the corrected correlation to the corrected SE equals the ratio of the uncorrected correlation to the uncorrected SE, and the p-value is the same in both cases.

Finally, here's a potentially more problematic/controversial counter-example. Say that you are meta-analyzing standardized mean differences from randomized experiments with pre-test and post-test data, and for the sake of uniformity you use a difference-in-differences estimate for the numerator. But some of the primary studies use ANCOVA for their analysis, so your effect size estimate, SE, and p-value will differ from those based on the analysis reported in the primary study. Your analysis is less precise than the primary-study analysis, so your p-value will tend to be larger than the primary-study p-value. Further, if you assume a pre/post correlation rather than inferring it from the primary-study data, this introduces a further discrepancy.

Personally, I don't have a sense of how large a discrepancy in p-values you can get in this situation. I think it's an interesting question that would be worth looking into (and perhaps carrying through to the implications for the performance of meta-analytic selection models). But pragmatically, the discrepancy could be resolved by using the information from the primary analytic approach (ANCOVA) to calculate the effect size estimate and its standard error, at least to the extent that the statistics reported in the primary study allow it.

Best,
James
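[A small numerical sketch of James's first two bullet points; n = 100, r = .10, and reliability = .80 are made-up values, and the SE formulas are standard large-sample approximations.]

```r
n <- 100
r <- 0.10

# (1) raw correlation vs. Fisher z: for r near zero, the test
# statistics (and hence the Wald-type p-values) are nearly identical
se.r <- (1 - r^2) / sqrt(n - 1)   # approximate large-sample SE of r
se.z <- 1 / sqrt(n - 3)           # SE of the Fisher z transform
2 * pnorm(abs(r / se.r), lower.tail=FALSE)         # ~0.31
2 * pnorm(abs(atanh(r) / se.z), lower.tail=FALSE)  # ~0.32

# (2) multiplicative reliability correction: dividing both the
# correlation and its SE by sqrt(reliability) leaves the test
# statistic, and therefore the p-value, unchanged
rel <- 0.80
(r / sqrt(rel)) / (se.r / sqrt(rel))  # same as r / se.r: ~1.005
r / se.r
```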