This thread intermingles several issues, so let me add my 2 cents to each one:
This thread started out with Will's criticism that the selection models implemented in selmodel() compute p-values based on a standard Wald-type test of zi = yi / sqrt(vi). This issue was also recently raised in this post:
https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2024-March/005098.html
to which I and James provided the following replies:
https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2024-March/005099.htmlhttps://stat.ethz.ch/pipermail/r-sig-meta-analysis/2024-March/005100.html
Let me reiterate: p-values are not just computed for the observed effect sizes, but also as part of an additional step that involves integrating over the weighted density of the effect size estimates. The weights are a function of the p-values, which in turn are a function of the integrand. So, if one wants to use p-values that are not computed based on a standard Wald-type test, one doesn't just have to change how the p-values of the observed effect sizes are computed, but also how the p-values in this integration step are computed. And this is not automatically solved by supplying degrees of freedom and then using a t-distribution. What if a study author has used a Mann?Whitney U test instead of a t-test? What if they used a permutation test? The p-values are then not coming from a t-test, so those dfs are irrelevant. For other effect size measures (e.g., odds ratios), the authors may have used a Pearson chi^2 test (with or without a continuity correction) or maybe the odds ratio came from a logistic regression model with additional predictors. If one wants to use the p-values as reported, then the p-values in the integration step would have to be computed in the same manner. Implementing this for all these possibilities is essentially impossible.
Also, if we want to be so precise, then one should not assume that yi is normally distributed, but use the exact distribution for the density and the integration step. So for d it would be a scaled and shifted non-central t-distribution, for r it would be the exact distribution of a Pearson correlation coefficient, for raw proportions it would be a binomial distribution, and so on and so on -- this will be different for every single effect size measure out there. Implementing this again is essentially impossible.
Will also asked: "Is this issue likely to make any real difference to the performance of selmodel with meta-analyses of realistic small-sample studies?" An answer to this depends on many factors (what effect size measure? how many studies? what are their sample sizes? what method was used to compute the actual p-values (could be different for the different studies)? which selection model is being used? what is the true selection function?) and as far as I know, there are no simulation studies addressing this question in the first place, so the answer is: Nobody knows.
Based on this, I did a small simulation study for standardized mean differences and found no noteworthy difference between using Wald-type tests instead of the 'proper' t-tests with k=50 studies and sample sizes between 20 to 60. But there are many factors one could manipulate here so this might not generalize.
Also, as I also mentioned before, this concern is probably a relatively minor issue in comparison to the fact that selection models are really rough approximations to a much more complex data generating mechanism anyway. The moment some study authors are deviating in their behavior from the assumed selection model (which is guaranteed to be case), then this is likely to have a much greater impact on the performance of selection models.
However, based on this discussion, I have now implemented the possibility to supply p-values directly to selmodel() via the *undocumented* 'pval' argument. To illustrate with Will's dataset:
library(metafor)
dat <- structure(list(Sim = c(448L, 448L, 448L, 448L, 448L, 448L, 448L, 448L,
448L, 448L, 448L, 448L, 448L, 448L), StudID = c(1L, 6L, 11L, 21L, 26L, 31L,
36L, 10L, 14L, 17L, 30L, 38L, 39L, 40L), Sex = c("Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male",
"Male", "Male", "Male"), SSize = c(10L, 10L, 10L, 28L, 12L, 22L, 10L, 18L,
10L, 13L, 10L, 10L, 10L, 28L), Ydelta = c(3.72, 3.08, 4.49, 4.95, 3.82, 2.13,
3.27, 4.46, 3.2, 4.32, 1.16, 3.61, 2.49, 1.92), YdelSE = c(0.684, 0.901,
0.926, 0.777, 1.25, 0.991, 1.13, 2.29, 1.64, 1.97, 0.467, 1.24, 0.828, 0.602),
tOrig = c(5.44, 3.42, 4.85, 6.37, 3.06, 2.15, 2.89, 2.03, 1.98, 2.19, 2.48,
2.91, 3.01, 3.19), tNew = c(5.44, 3.42, 4.85, 6.37, 3.06, 2.15, 2.89, 1.95,
1.95, 2.19, 2.48, 2.91, 3.01, 3.19), pValue = c(0.000413, 0.00766, 0.000906,
8.08e-07, 0.0109, 0.0433, 0.0177, 0.0578, 0.0795, 0.049, 0.0348, 0.0175,
0.0148, 0.0036)), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14"))
res <- rma(Ydelta, sei=YdelSE, data=dat, method="ML")
res
sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.025, 1))
sel
sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.025, 1), pval=dat$pValue)
sel
To be clear, the p-values in the integration step are still computed based on the Wald-type test and this still assumes normally distributed effect size estimates. The impact of doing this are unclear and this might be horribly wrong. Use at your own discretion.
The other issue that was raised is empty intervals in step-function selection models. This is really a separate issue (even if it arose in one particular example due to different methods being used in the studies themselves and the selection model for computing the p-values). James nicely explained why empty intervals cause problems for ML estimation. In connection with this, I will illustrate this with an example (and some features of the selmodel() function):
library(metafor)
dat <- dat.hackshaw1998
res <- rma(yi, vi, data=dat, method="ML")
res
sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.025, .05, 1), verbose=TRUE)
Another undocumented argument that was already implemented is the possibility to skip the interval check:
sel <- selmodel(res, type="stepfun", alternative="greater",
steps=c(.025, .05, 1), skipintcheck=TRUE)
sel
As one can see, the estimate of delta[2] drifts to 0 (which then causes issues with inverting the Hessian, which is yet another separate issue).
To illustrate the opposite:
sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.001, 1))
sel
Now delta[2] wants to drift to infinity, which it can't due to 100 being the default limit for the delta values (bounds are imposed for numerical stability issues). This can be adjusted:
sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.001, 1),
control=list(delta.max=Inf))
sel
It doesn't quite get to infinity, but close enough I would say.
I had considered the option to automatically collapse empty intervals (or at least, optionally), but decided not to do this. I leave it up to the user to implement such steps if they like.
Best,
Wolfgang
-----Original Message-----
From: R-sig-meta-analysis <r-sig-meta-analysis-bounces at r-project.org> On Behalf
Of Will Hopkins via R-sig-meta-analysis
Sent: Sunday, March 17, 2024 21:06
To: 'R Special Interest Group for Meta-Analysis' <r-sig-meta-analysis at r-
project.org>
Cc: Will Hopkins <willthekiwi at gmail.com>
Subject: Re: [R-meta] Calculation of p values in selmodel
Thanks for the suggestion of a Bayesian approach, James. I want to avoid priors,
if possible, and go as far as I can with the selmodel approaches, for now. And I
don't want to move the p-value threshold around, since it's the studies with
p>0.05 that are less likely to get published. The 3-parameter selection model,
with one step at 0.025, works brilliantly in the simulations when there isn't
too much publication bias, including, importantly, when there is none, where it
works much better than the PEESE method. Of course, you don't know how much
publication bias there is, so it's important to use a method that works across
the possible range of none through lots, including 100% failure to publish non-
significant effects. That's why it's so disappointing that the 3PSM doesn't work
with no non-significant effects.
When I looked at the data I showed in my last message, you could get the
impression that the problem is simply that selmodel needs at least one non-
significant study estimate for each level of the factor Sex in the model. But it
isn't so. There are plenty of sims where there are no non-significant estimates
for the females and just one for the males. For example, one sim has 11 study
estimate consisting of 5 significant females, 5 significant males, and one non-
significant male (p=0.58). No problem. So maybe the error message is misleading.
For about 5% of the simulations I get the warning message "Error when trying to
invert Hessian", but it still produces adjusted point estimate for the fixed
effects and tau2, so that's not the problem. The problem is the occasional sim
(about 1 in 300, with the current simulation) where the error message "One or
more intervals do not contain any observed p-values" is wrong, and where it then
crashes out of the list processing.
Will
-----Original Message-----
From: R-sig-meta-analysis <r-sig-meta-analysis-bounces at r-project.org> On Behalf
Of James Pustejovsky via R-sig-meta-analysis
Sent: Monday, March 18, 2024 5:47 AM
To: R Special Interest Group for Meta-Analysis <r-sig-meta-analysis at r-
project.org>
Cc: James Pustejovsky <jepusto at gmail.com>
Subject: Re: [R-meta] Calculation of p values in selmodel
This is an issue with maximum likelihood estimation of the step function
selection model generally (rather than a problem with the software
implementation).
The step function model assumes that there are different selection probabilities
for effect size estimates with p-values that fall into different intervals. For
a 3-parameter model, the intervals are [0, .025] and (.025, 1], with the first
interval fixed to have selection probability
1 and the second interval having selection probability lambda > 0 (an unknown
parameter of the model). If there are no observed ES estimates in the first
interval, then the ML estimate of lambda is infinite. If there are no observed
ES estimates in the second interval, then the ML estimate of lambda is zero,
outside of the parameter space.
In some of my work, I've implemented an ad hoc fix for the issue by moving the
p-value threshold around so that there are at least three ES estimates in each
interval. This isn't based on any principle in particular, although Jack Vevea
once suggested to me that this might be the sort of thing an analyst might do
just to get the model to converge.
A more principled way to fix the issue would be to use penalized likelihood or
Bayesian methods with an informative prior on lambda. See the publipha package
(https://cran.r-project.org/package=publipha) for one implementation.
James
On Sat, Mar 16, 2024 at 10:23?PM Will Hopkins via R-sig-meta-analysis < r-sig-
meta-analysis at r-project.org> wrote:
No-one has responded to this issue. It's now causing a problem in my
simulations when I am analyzing for publication bias arising from
deletion of 90% of nonsignificant study estimates and ending up with
small numbers
(10-30) of included studies. See below (and attached as an
easier-to-read text file) for an example. Two of the 14 study
estimates (Row 8 and 9) were non-significant, but the original t value
(tOrig) would have made them significant in selmodel(?, type = "step",
steps = (0.025)). So I processed any observations with non-significant
p values and t>1.96 by replacing the standard error (YdelSE) with
Ydelta/1.95. The resulting new t vslues (tNew) are 1.95 for both those
observations, whereas all the other t values are unchanged. So they
should be non-significant in selmodel, right? But I still get this error
message:
Error in selmodel.rma.uni(x, type = "step", steps = (0.025)) :
One or more intervals do not contain any observed p-values (use
'verbose=TRUE' to see which).
I must be doing something idiotic, but what? Help, please!
Oh, and thanks again to Tobias Saueressig for his help with
list-processing of the objects created by rma, selmodel and confint.
My original for-loop approach fell over when the values of the Sim
variable were not consecutive integers (for example, when I had
generated the sims and then deleted any lacking non-significant study
estimates), but separate processing of the lists as suggested by
Tobias worked perfectly. It stops working when it crashes out with the
above error, but hopefully someone will solve that problem.
Will
*From:* Will Hopkins <willthekiwi at gmail.com>
*Sent:* Friday, March 15, 2024 8:39 AM
*To:* 'R Special Interest Group for Meta-Analysis' <
r-sig-meta-analysis at r-project.org>
*Subject:* Calculation of p values in selmodel
According to your documentation, Wolfgang, the selection models in
selmodel are based on the p values of the study estimates, but these
are computed by assuming the study estimate divided by its standard
error has a normal distribution, whereas significance in the original
studies of mean effects of continuous variables would have been based on a t
distribution.
It could make a difference when sample sizes in the original studies
are
~10 or so, because some originally non-significant effects would be
treated as significant by selmodel. For example, with a sample size of
10, a mean change has 9 degrees of freedom, so a p value of 0.080
(i.e., non-significant, p>0.05) in the original study will be given a
p value of
0.049 (i.e., significant, p<0.05) by selmodel. Is this issue likely to
make any real difference to the performance of selmodel with
meta-analyses of realistic small-sample studies? I guess that only a
small (negligible?) proportion of p values will fall between 0.05 and
0.08, in the worst-case scenario of a true effect close to the
critical value and with only 9 degrees of freedom for the SE. If it is
an issue, you could include the SE's degrees of freedom in the rma object that