[R-meta] Calculation of p values in selmodel
Thanks for the suggestion of a Bayesian approach, James. I want to avoid priors, if possible, and go as far as I can with the selmodel approaches for now. And I don't want to move the p-value threshold around, since it's the studies with p > 0.05 that are less likely to get published.

The 3-parameter selection model, with one step at 0.025, works brilliantly in the simulations when there isn't too much publication bias, including, importantly, when there is none, where it works much better than the PEESE method. Of course, you don't know how much publication bias there is, so it's important to use a method that works across the possible range of none through lots, including 100% failure to publish non-significant effects. That's why it's so disappointing that the 3PSM doesn't work with no non-significant effects.

When I looked at the data I showed in my last message, you could get the impression that the problem is simply that selmodel needs at least one non-significant study estimate for each level of the factor Sex in the model. But it isn't so. There are plenty of sims where there are no non-significant estimates for the females and just one for the males. For example, one sim has 11 study estimates, consisting of 5 significant females, 5 significant males, and one non-significant male (p = 0.58). No problem. So maybe the error message is misleading.

For about 5% of the simulations I get the warning message "Error when trying to invert Hessian", but selmodel still produces adjusted point estimates for the fixed effects and tau2, so that's not the problem. The problem is the occasional sim (about 1 in 300, with the current simulation) where the error message "One or more intervals do not contain any observed p-values" is wrong, and where it then crashes out of the list processing.
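To stop that occasional error from killing the whole run, one option is to wrap the selmodel() call in tryCatch() so a failed sim returns NULL instead of aborting the list processing. This is only a sketch, not tested on your data; `fit_selmodel_safe` is a hypothetical helper name.

```r
# Sketch: wrap selmodel() in tryCatch() so the rare
# "One or more intervals do not contain any observed p-values" error
# returns NULL for that sim instead of crashing the whole run.
library(metafor)

fit_selmodel_safe <- function(res, steps = 0.025) {
  tryCatch(
    selmodel(res, type = "step", steps = steps),
    error = function(e) {
      message("selmodel failed for this sim: ", conditionMessage(e))
      NULL  # lapply() carries on with the next sim
    }
  )
}

# e.g. sel_fits <- lapply(rma_fits, fit_selmodel_safe)
#      failed   <- vapply(sel_fits, is.null, logical(1))
```

The failed sims can then be counted or inspected afterwards rather than lost mid-loop.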
Will

-----Original Message-----
From: R-sig-meta-analysis <r-sig-meta-analysis-bounces at r-project.org> On Behalf Of James Pustejovsky via R-sig-meta-analysis
Sent: Monday, March 18, 2024 5:47 AM
To: R Special Interest Group for Meta-Analysis <r-sig-meta-analysis at r-project.org>
Cc: James Pustejovsky <jepusto at gmail.com>
Subject: Re: [R-meta] Calculation of p values in selmodel

This is an issue with maximum likelihood estimation of the step function selection model generally (rather than a problem with the software implementation). The step function model assumes that there are different selection probabilities for effect size estimates with p-values that fall into different intervals. For a 3-parameter model, the intervals are [0, .025] and (.025, 1], with the first interval fixed to have selection probability 1 and the second interval having selection probability lambda > 0 (an unknown parameter of the model). If there are no observed ES estimates in the first interval, then the ML estimate of lambda is infinite. If there are no observed ES estimates in the second interval, then the ML estimate of lambda is zero, outside of the parameter space.

In some of my work, I've implemented an ad hoc fix for the issue by moving the p-value threshold around so that there are at least three ES estimates in each interval. This isn't based on any principle in particular, although Jack Vevea once suggested to me that this might be the sort of thing an analyst might do just to get the model to converge. A more principled way to fix the issue would be to use penalized likelihood or Bayesian methods with an informative prior on lambda. See the publipha package (https://cran.r-project.org/package=publipha) for one implementation.

James
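The degenerate cases James describes can be anticipated before fitting: count how many one-sided p-values, computed with the normal approximation selmodel uses, fall in each interval. A minimal base-R sketch, with illustrative yi and vi:

```r
# Sketch: check the 3PSM preconditions before calling selmodel().
# If either interval count is zero, the ML estimate of lambda
# degenerates (infinite or on the boundary), as described above.
yi <- c(0.8, 0.6, 0.9, 0.2)      # illustrative effect size estimates
vi <- c(0.04, 0.05, 0.06, 0.05)  # illustrative sampling variances

p1 <- pnorm(yi / sqrt(vi), lower.tail = FALSE)  # one-sided p, normal approx
table(cut(p1, breaks = c(0, 0.025, 1), include.lowest = TRUE))
```

A sim could then be flagged or skipped whenever one of the two counts is zero.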
On Sat, Mar 16, 2024 at 10:23 PM Will Hopkins via R-sig-meta-analysis <r-sig-meta-analysis at r-project.org> wrote:
No-one has responded to this issue. It's now causing a problem in my simulations when I am analyzing for publication bias arising from deletion of 90% of non-significant study estimates and ending up with small numbers (10-30) of included studies. See below (and attached as an easier-to-read text file) for an example. Two of the 14 study estimates (Rows 8 and 9) were non-significant, but the original t value (tOrig) would have made them significant in selmodel(x, type = "step", steps = (0.025)). So I processed any observations with non-significant p values and t > 1.96 by replacing the standard error (YdelSE) with Ydelta/1.95. The resulting new t values (tNew) are 1.95 for both those observations, whereas all the other t values are unchanged. So they should be non-significant in selmodel, right? But I still get this error message:
Error in selmodel.rma.uni(x, type = "step", steps = (0.025)) :
One or more intervals do not contain any observed p-values (use
'verbose=TRUE' to see which).
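For reference, the SE replacement described above can be sketched as follows. The example rows are Rows 8 and 9 of the table below, with the original SEs back-calculated from tOrig (so the exact values are a reconstruction); `dat` is a hypothetical data frame with the same column names.

```r
# Sketch of the SE adjustment: for any observation that was
# non-significant in the original study (pValue > 0.05) but has
# Ydelta/YdelSE > 1.96, rescale the SE so the ratio becomes 1.95.
dat <- data.frame(Ydelta = c(4.46, 3.2),
                  YdelSE = c(2.20, 1.62),   # original SEs, back-calculated from tOrig
                  pValue = c(0.0578, 0.0795))

nonsig <- dat$pValue > 0.05 & abs(dat$Ydelta / dat$YdelSE) > 1.96
dat$YdelSE[nonsig] <- abs(dat$Ydelta[nonsig]) / 1.95

dat$Ydelta / dat$YdelSE  # both ratios are now 1.95
```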
I must be doing something idiotic, but what? Help, please!

Oh, and thanks again to Tobias Saueressig for his help with list-processing of the objects created by rma, selmodel and confint. My original for-loop approach fell over when the values of the Sim variable were not consecutive integers (for example, when I had generated the sims and then deleted any lacking non-significant study estimates), but separate processing of the lists as suggested by Tobias worked perfectly. It stops working when it crashes out with the above error, but hopefully someone will solve that problem.
Will
    Sim StudID Sex    SSize Ydelta YdelSE tOrig  tNew      pValue
 1  448      1 Female    10   3.72  0.684  5.44  5.44 0.000413
 2  448      6 Female    10   3.08  0.901  3.42  3.42 0.00766
 3  448     11 Female    10   4.49  0.926  4.85  4.85 0.000906
 4  448     21 Female    28   4.95  0.777  6.37  6.37 0.000000808
 5  448     26 Female    12   3.82  1.25   3.06  3.06 0.0109
 6  448     31 Female    22   2.13  0.991  2.15  2.15 0.0433
 7  448     36 Female    10   3.27  1.13   2.89  2.89 0.0177
 8  448     10 Male      18   4.46  2.29   2.03  1.95 0.0578
 9  448     14 Male      10   3.2   1.64   1.98  1.95 0.0795
10  448     17 Male      13   4.32  1.97   2.19  2.19 0.049
11  448     30 Male      10   1.16  0.467  2.48  2.48 0.0348
12  448     38 Male      10   3.61  1.24   2.91  2.91 0.0175
13  448     39 Male      10   2.49  0.828  3.01  3.01 0.0148
14  448     40 Male      28   1.92  0.602  3.19  3.19 0.0036
From: Will Hopkins <willthekiwi at gmail.com>
Sent: Friday, March 15, 2024 8:39 AM
To: 'R Special Interest Group for Meta-Analysis' <r-sig-meta-analysis at r-project.org>
Subject: Calculation of p values in selmodel
According to your documentation, Wolfgang, the selection models in selmodel are based on the p values of the study estimates, but these are computed by assuming the study estimate divided by its standard error has a normal distribution, whereas significance in the original studies of mean effects of continuous variables would have been based on a t distribution. It could make a difference when sample sizes in the original studies are ~10 or so, because some originally non-significant effects would be treated as significant by selmodel. For example, with a sample size of 10, a mean change has 9 degrees of freedom, so a p value of 0.080 (i.e., non-significant, p > 0.05) in the original study will be given a p value of 0.049 (i.e., significant, p < 0.05) by selmodel.

Is this issue likely to make any real difference to the performance of selmodel with meta-analyses of realistic small-sample studies? I guess that only a small (negligible?) proportion of p values will fall between 0.05 and 0.08, in the worst-case scenario of a true effect close to the critical value and with only 9 degrees of freedom for the SE. If it is an issue, you could include the SE's degrees of freedom in the rma object that gets passed to selmodel.
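The worst-case example above can be checked directly in base R; this is just a sketch of the arithmetic, nothing more.

```r
# A t statistic whose two-sided p-value is 0.080 on 9 df (as in the
# original study) falls below 0.05 under the normal approximation
# that selmodel applies to the estimate/SE ratio.
tstat  <- qt(0.96, df = 9)                           # chosen so 2-sided t p-value = 0.08
p_t    <- 2 * pt(tstat, df = 9, lower.tail = FALSE)  # 0.08 by construction
p_norm <- 2 * pnorm(tstat, lower.tail = FALSE)       # ~0.049, i.e. "significant"
```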
Will
_______________________________________________ R-sig-meta-analysis mailing list @ R-sig-meta-analysis at r-project.org To manage your subscription to this mailing list, go to: https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis