Incorrect p value for binom.test?
4 messages · Michael Grant, Peter Dalgaard, Albyn Jones +1 more
Michael Grant wrote:
I believe the binom.test procedure is producing one tailed p values rather than the two tailed value implied by the alternative hypothesis language. A textbook and SAS both show 2*9.94e-07 = 1.988e-06 as the two tailed value. As does the R summation syntax from R below. It looks to me like the alternative hypothesis language should be revised to something like " ... greater than or equal to ..." Am I mistaken?
Yes. Or maybe, it is a matter of definition. The problem is that

> (0:25)[dbinom(0:25,25,.061) <= dbinom(10,25,.061)]
 [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

so with R's definition of "more extreme", all such values are in the upper tail. Actually, if you look at the actual distribution, I think you'll agree that it is rather difficult to define a lower tail with positive probability that corresponds to X >= 10.

> round(dbinom(0:25,25,.061),6)
 [1] 0.207319 0.336701 0.262476 0.130726 0.046708 0.012744
 [7] 0.002760 0.000487 0.000071 0.000009 0.000001 0.000000
[13] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[19] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[25] 0.000000 0.000000

In any case, you would be hard pressed to find a subset of 0:25 that has the probability that SAS and your textbook claim as the p value.
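The definition of "more extreme" described above can be written out directly. The following is a minimal sketch, not the source of binom.test: `minlike_p` is a hypothetical helper name, and the relative tolerance `rel_tol` is an assumed guard against floating-point ties. It sums the null probabilities of every outcome that is no more probable than the observed count and compares the result with the p-value that binom.test reports.

```r
# Sketch: two-sided p-value as the total null probability of all outcomes
# that are at most as probable as the observed one.
# minlike_p is a hypothetical helper, not a base R function.
minlike_p <- function(x, n, p, rel_tol = 1e-7) {
  d     <- dbinom(0:n, n, p)   # null probability of each outcome 0..n
  d_obs <- dbinom(x, n, p)     # null probability of the observed count
  # assumed tolerance so that exact ties are not dropped by rounding error
  sum(d[d <= d_obs * (1 + rel_tol)])
}

p_minlike <- minlike_p(10, 25, 0.061)
p_builtin <- binom.test(10, 25, 0.061)$p.value

print(c(minlike = p_minlike, binom.test = p_builtin))
```

For these data both values should agree with the 9.94e-07 shown in the binom.test output quoted in this thread, because every outcome selected by the rule lies in 10:25.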
M.C.Grant
2*sum(dbinom(c(10:25),25,0.061))
[1] 1.987976e-06
binom.test(10,25,0.061)
Exact binomial test
data: 10 and 25
number of successes = 10, number of trials = 25, p-value = 9.94e-07
alternative hypothesis: true probability of success is not equal to 0.061
95 percent confidence interval:
0.2112548 0.6133465
sample estimates:
probability of success
0.4
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
The computation 2*sum(dbinom(c(10:25),25,0.061)) does not correspond
to any reasonable definition of p-value. For a symmetric
distribution, it is fine to use 2 times the tail area of one tail.
For an asymmetric distribution, this is silly.
The standard definition given in elementary texts is usually something like
   the probability of observing a test statistic at least as
   extreme as the observed value
or more formally as
   the smallest significance level at which the observed result would
   lead to rejection of the null hypothesis
Either definition requires further decisions (what does "at least as
extreme" mean?). In an asymmetric distribution, "at least as far from
E(X|H0)" is not a good interpretation, since deviations in one direction
may be much less probable than deviations in the other direction.
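To make the "at least as far from E(X|H0)" reading concrete, here is a small sketch for the data in this thread. The variable names are illustrative only. It lists the outcomes whose distance from the null mean is at least the observed distance and sums their null probabilities.

```r
# Sketch: "at least as far from E(X | H0)" as a definition of extreme.
n     <- 25
p0    <- 0.061
x_obs <- 10

mu       <- n * p0             # E(X | H0) = 1.525
dist_obs <- abs(x_obs - mu)    # observed distance from the mean = 8.475

x   <- 0:n
far <- x[abs(x - mu) >= dist_obs]   # outcomes at least as far from the mean
p_far <- sum(dbinom(far, n, p0))    # their total null probability

print(far)
print(p_far)
```

A matching lower-tail outcome would have to satisfy x <= 1.525 - 8.475, which is negative, so only upper-tail counts are selected. For these particular data the distance rule therefore picks the same set, 10:25, as the "at least as improbable" rule.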
Peter's interpretation corresponds both to the interpretation of "at
least as extreme" as "at least as improbable", and also to the
"smallest significance level" interpretation for the test implemented
in binom.test, i.e. the Clopper-Pearson "exact" test. 2 times the upper
tail area corresponds to neither. The fact that it is implemented in
SAS and appears in a text does not rescue it from that fundamental
failure to make sense.
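Since the Clopper-Pearson construction is mentioned here, a short sketch may help show where the confidence interval in the quoted binom.test output comes from. It uses the usual beta-quantile form of the Clopper-Pearson limits. The variable names are illustrative, and the code is meant as a check against binom.test rather than a copy of its source.

```r
# Sketch: Clopper-Pearson limits for x successes in n trials,
# written with beta quantiles.
x    <- 10
n    <- 25
conf <- 0.95
alpha <- 1 - conf

# The edge cases x == 0 and x == n pin the limits at 0 and 1.
lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
ci <- c(lower, upper)

print(ci)
print(binom.test(x, n, 0.061)$conf.int)
```

The two printed intervals should match the 0.2112548 to 0.6133465 reported in the output quoted in this thread. The interval does not depend on the null value 0.061.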
albyn
On Thu, Feb 05, 2009 at 09:48:11PM +0100, Peter Dalgaard wrote:
[quoted text of the two messages above omitted]
On Thu, 5 Feb 2009, Albyn Jones wrote:
The computation 2*sum(dbinom(c(10:25),25,0.061)) does not correspond to any reasonable definition of p-value. For a symmetric distribution, it is fine to use 2 times the tail area of one tail. For an asymmetric distribution, this is silly.
"Silly" is much too strong. There is a perfectly good reason to compare 2*sum(dbinom(c(10:25),25,0.061)) to a two-sided test threshold.
The argument is that what we are really doing in usual two-sided location tests is two one-sided tests at alpha/2 rather than one two-sided test at alpha. The null hypothesis is being compared to two different alternatives (better or worse vs same) and the decisions about the future would be different depending on which tail we ended up using.
This argument says that we should compare a one-sided tail area such as sum(dbinom(c(10:25),25,0.061)) to alpha/2; equivalently that we should compare 2*sum(dbinom(c(10:25),25,0.061)) to alpha [or to informal standards for strength of evidence or whatever you typically do with p-values]. I'm not saying that this is the only sensible way to handle and interpret p-values in two-sided tests, but I really don't think it can be dismissed as 'silly'.
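The "two one-sided tests at alpha/2" reading can be sketched with the one-sided options of binom.test. The names below are illustrative. Capping the doubled value at 1 is an assumed convention and not something stated in this thread, and alpha = 0.05 is only an example threshold.

```r
# Sketch: two one-sided exact tests, each judged at alpha / 2,
# which is the same as comparing the doubled smaller tail to alpha.
n     <- 25
p0    <- 0.061
x_obs <- 10

p_upper <- binom.test(x_obs, n, p0, alternative = "greater")$p.value  # P(X >= 10)
p_lower <- binom.test(x_obs, n, p0, alternative = "less")$p.value     # P(X <= 10)

# assumed convention: cap the doubled value at 1
p_doubled <- min(1, 2 * min(p_upper, p_lower))

alpha <- 0.05  # example threshold only
print(c(p_upper = p_upper, p_lower = p_lower, p_doubled = p_doubled))
print(c(one_sided_vs_half_alpha = p_upper <= alpha / 2,
        doubled_vs_alpha        = p_doubled <= alpha))
```

For these data the doubled value should reproduce the 1.987976e-06 from 2*sum(dbinom(c(10:25),25,0.061)). The two comparisons printed at the end always give the same decision, since they are the same inequality scaled by 2.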
-thomas
Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle
PS: Daniel Dennett has described as an occupational hazard for philosophers the tendency to go from "I can't imagine X" to "No one can imagine X" to "X is inconceivable". The transition from "I can't imagine how X would be used" to "X is useless" is somewhat similar, as is the Extreme Bayesian transition from "X wasn't derived by a formal consideration of posterior expected loss" to "X can't be derived by a formal consideration of posterior expected loss" to "X is incoherent". Why, yes, I am grumpy about a reviewer. How did you guess?