
[R-meta] Calculation of p values in selmodel

This thread intermingles several issues, so let me add my 2 cents to each one:

This thread started out with Will's criticism that the selection models implemented in selmodel() compute p-values based on a standard Wald-type test of zi = yi / sqrt(vi). This issue was also recently raised in this post:

https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2024-March/005098.html

to which James and I provided the following replies:

https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2024-March/005099.html
https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2024-March/005100.html

Let me reiterate: p-values are not just computed for the observed effect sizes, but also as part of an additional step that involves integrating over the weighted density of the effect size estimates. The weights are a function of the p-values, which in turn are a function of the variable being integrated over. So, if one wants to use p-values that are not computed based on a standard Wald-type test, one doesn't just have to change how the p-values of the observed effect sizes are computed, but also how the p-values in this integration step are computed. And this is not automatically solved by supplying degrees of freedom and then using a t-distribution. What if a study author used a Mann-Whitney U test instead of a t-test? What if they used a permutation test? The p-values are then not coming from a t-test, so those dfs are irrelevant. For other effect size measures (e.g., odds ratios), the authors may have used a Pearson chi^2 test (with or without a continuity correction), or maybe the odds ratio came from a logistic regression model with additional predictors. If one wants to use the p-values as reported, then the p-values in the integration step would have to be computed in the same manner. Implementing this for all of these possibilities is essentially impossible.
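To make this concrete, here is a minimal base-R sketch of one study's log-likelihood contribution under a one-step selection model. This is purely illustrative (the function name and setup are mine, not metafor's actual implementation), but it shows the point: the Wald-type p-value is recomputed inside the integrand for every candidate value of y, not just once for the observed estimate.

```r
# Illustrative sketch (base R only; NOT metafor's implementation) of one
# study's log-likelihood contribution in a one-step selection model with
# weight function w(p) = 1 if p < alpha and w(p) = delta otherwise.
loglik_i <- function(yi, vi, mu, tau2, delta, alpha = 0.025) {
  wfun <- function(y) {
    # Wald-type one-sided p-value, recomputed for every value of y
    p <- pnorm(y / sqrt(vi), lower.tail = FALSE)
    ifelse(p < alpha, 1, delta)
  }
  num <- wfun(yi) * dnorm(yi, mean = mu, sd = sqrt(vi + tau2))
  # normalizing constant: integrate the weighted density over all possible y
  den <- integrate(function(y) wfun(y) * dnorm(y, mean = mu, sd = sqrt(vi + tau2)),
                   lower = -Inf, upper = Inf)$value
  log(num / den)
}

loglik_i(yi = 0.5, vi = 0.04, mu = 0.3, tau2 = 0.02, delta = 0.5)
```

With delta = 1 (no selection), the normalizing constant is 1 and this reduces to the ordinary normal log density. Changing how the p-value of the observed estimate is computed would leave the p-value computation inside `wfun()` untouched, which is exactly the inconsistency described above.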

Also, if we want to be so precise, then one should not assume that yi is normally distributed, but use the exact distribution of the estimator for the density and the integration step. For d, this would be a scaled and shifted non-central t-distribution; for r, the exact distribution of a Pearson correlation coefficient; for raw proportions, a binomial distribution; and so on and so on -- this will be different for every single effect size measure out there. Again, implementing this is essentially impossible.
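For illustration, here is what the exact density would look like in the simplest case: a two-sample standardized mean difference (Cohen's d, ignoring the small-sample bias correction used for Hedges' g) follows a scaled non-central t-distribution. This base-R sketch is again illustrative only, not something selmodel() does.

```r
# Illustrative sketch: exact density of a two-sample standardized mean
# difference d (Cohen's d, no bias correction), which is a scaled
# non-central t-distribution, in contrast to the normal approximation.
d_density <- function(d, n1, n2, delta) {
  m  <- n1 * n2 / (n1 + n2)   # scaling factor: t = d * sqrt(m)
  df <- n1 + n2 - 2
  sqrt(m) * dt(d * sqrt(m), df = df, ncp = delta * sqrt(m))
}

# sanity check: the density should integrate to 1
integrate(d_density, -Inf, Inf, n1 = 10, n2 = 10, delta = 0.5)$value
```

Even this "simplest" case already requires the non-central t; every other measure would need its own exact distribution in the density and the integration step.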

Will also asked: "Is this issue likely to make any real difference to the performance of selmodel with meta-analyses of realistic small-sample studies?" The answer depends on many factors (which effect size measure? how many studies? what are their sample sizes? what method was used to compute the actual p-values, which could differ across studies? which selection model is being used? what is the true selection function?) and, as far as I know, there are no simulation studies addressing this question in the first place. So the answer is: Nobody knows.

Based on this, I did a small simulation study for standardized mean differences and found no noteworthy difference between using Wald-type tests and the 'proper' t-tests with k=50 studies and per-study sample sizes between 20 and 60. But there are many factors one could manipulate here, so this might not generalize.
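To give a sense of the per-study computations such a simulation compares, here is a hedged sketch (the setup and numbers are illustrative, not the actual simulation code) contrasting the Wald-type p-value with the 'proper' one-sided t-test p-value for one simulated two-group study:

```r
# Illustrative sketch: Wald-type vs. t-test p-value for one simulated study
set.seed(42)
n  <- 30                                    # per-group sample size
x1 <- rnorm(n, mean = 0.5)                  # treatment group
x2 <- rnorm(n)                              # control group
sp <- sqrt(((n - 1) * var(x1) + (n - 1) * var(x2)) / (2 * n - 2))
d  <- (mean(x1) - mean(x2)) / sp            # standardized mean difference
vi <- 2 / n + d^2 / (4 * n)                 # large-sample variance of d (equal n's)
p_wald <- pnorm(d / sqrt(vi), lower.tail = FALSE)
p_t    <- t.test(x1, x2, alternative = "greater", var.equal = TRUE)$p.value
c(wald = p_wald, ttest = p_t)
```

With sample sizes in this range, the two p-values tend to be quite close, which is consistent with the lack of a noteworthy difference in the simulation described above.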

Also, as I mentioned before, this concern is probably a relatively minor issue compared to the fact that selection models are really rough approximations to a much more complex data generating mechanism anyway. The moment some study authors deviate in their behavior from the assumed selection model (which is guaranteed to be the case), this is likely to have a much greater impact on the performance of selection models.

However, based on this discussion, I have now implemented the option to supply p-values directly to selmodel() via the *undocumented* 'pval' argument. To illustrate with Will's dataset:

library(metafor)

dat <- structure(list(Sim = c(448L, 448L, 448L, 448L, 448L, 448L, 448L, 448L,
448L, 448L, 448L, 448L, 448L, 448L), StudID = c(1L, 6L, 11L, 21L, 26L, 31L,
36L, 10L, 14L, 17L, 30L, 38L, 39L, 40L), Sex = c("Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male",
"Male", "Male", "Male"), SSize = c(10L, 10L, 10L, 28L, 12L, 22L, 10L, 18L,
10L, 13L, 10L, 10L, 10L, 28L), Ydelta = c(3.72, 3.08, 4.49, 4.95, 3.82, 2.13,
3.27, 4.46, 3.2, 4.32, 1.16, 3.61, 2.49, 1.92), YdelSE = c(0.684, 0.901,
0.926, 0.777, 1.25, 0.991, 1.13, 2.29, 1.64, 1.97, 0.467, 1.24, 0.828, 0.602),
tOrig = c(5.44, 3.42, 4.85, 6.37, 3.06, 2.15, 2.89, 2.03, 1.98, 2.19, 2.48,
2.91, 3.01, 3.19), tNew = c(5.44, 3.42, 4.85, 6.37, 3.06, 2.15, 2.89, 1.95,
1.95, 2.19, 2.48, 2.91, 3.01, 3.19), pValue = c(0.000413, 0.00766, 0.000906,
8.08e-07, 0.0109, 0.0433, 0.0177, 0.0578, 0.0795, 0.049, 0.0348, 0.0175,
0.0148, 0.0036)), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14"))

res <- rma(Ydelta, sei=YdelSE, data=dat, method="ML")
res

sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.025, 1))
sel

sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.025, 1), pval=dat$pValue)
sel

To be clear, the p-values in the integration step are still computed based on the Wald-type test, and this still assumes normally distributed effect size estimates. The impact of doing this is unclear and it might be horribly wrong. Use at your own discretion.

The other issue that was raised concerns empty intervals in step-function selection models. This is really a separate issue (even if it arose in one particular example because the p-values were computed by different methods in the studies themselves than in the selection model). James nicely explained why empty intervals cause problems for ML estimation. Let me illustrate this with an example (and some features of the selmodel() function):

library(metafor)

dat <- dat.hackshaw1998

res <- rma(yi, vi, data=dat, method="ML")
res

sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.025, .05, 1), verbose=TRUE)

Another undocumented argument that was already implemented is the option to skip the interval check:

sel <- selmodel(res, type="stepfun", alternative="greater",
                steps=c(.025, .05, 1), skipintcheck=TRUE)
sel

As one can see, the estimate of delta[2] drifts to 0 (which then causes issues with inverting the Hessian -- yet another separate issue).

To illustrate the opposite:

sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.001, 1))
sel

Now delta[2] wants to drift to infinity, which it cannot, because 100 is the default upper limit for the delta values (bounds are imposed for numerical stability). This limit can be adjusted:

sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.001, 1),
                control=list(delta.max=Inf))
sel

It doesn't quite get to infinity, but close enough I would say.

I had considered automatically collapsing empty intervals (or at least making this an option), but decided against it. I leave it up to users to implement such steps themselves if they like.
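For users who do want to collapse empty intervals themselves, a small helper along these lines might work (this is my own sketch, not part of metafor; the function name and logic are purely illustrative):

```r
# Illustrative helper (NOT part of metafor): drop step boundaries whose
# p-value interval contains no observed (Wald-type, one-sided) p-values.
collapse_steps <- function(steps, pvals) {
  bounds <- c(0, steps)
  counts <- table(cut(pvals, breaks = bounds, include.lowest = TRUE))
  steps[counts > 0 | steps == 1]  # always keep the final boundary at 1
}

# the (.025, .05] interval is empty here, so the .05 boundary is dropped:
collapse_steps(c(.025, .05, 1), pvals = c(.01, .02, .30, .70))
```

The collapsed vector could then be passed to the 'steps' argument of selmodel(). Of course, whether collapsing is a sensible thing to do in a given application is a substantive question that such a helper cannot answer.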

Best,
Wolfgang