
Incorrect p value for binom.test?

4 messages · Michael Grant, Peter Dalgaard, Albyn Jones +1 more

#
Michael Grant wrote:
Yes. Or maybe, it is a matter of definition. The problem is that

 > (0:25)[dbinom(0:25,25,.061) <= dbinom(10,25,.061)]
  [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

so with R's definition of "more extreme", all such values are in the 
upper tail.

Actually, if you look at the actual distribution, I think you'll agree 
that it is rather difficult to define a lower tail with positive 
probability that corresponds to X >= 10.

 > round(dbinom(0:25,25,.061),6)
  [1] 0.207319 0.336701 0.262476 0.130726 0.046708 0.012744
  [7] 0.002760 0.000487 0.000071 0.000009 0.000001 0.000000
[13] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[19] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[25] 0.000000 0.000000

In any case, you would be hard pressed to find a subset of 0:25 that has 
the probability that SAS and your textbook claim as the p value.
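Peter's two computations can be reproduced outside R; the sketch below is in Python, with the binomial pmf hand-rolled via math.comb to stand in for R's dbinom (the constants 25, 0.061, and 10 are from the thread):

```python
from math import comb

def dbinom(x, n, p):
    # binomial pmf, mirroring R's dbinom(x, n, p)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p0, obs = 25, 0.061, 10

# "More extreme" in binom.test's sense: outcomes at most as probable
# under H0 as the observed count of 10
extreme = [x for x in range(n + 1) if dbinom(x, n, p0) <= dbinom(obs, n, p0)]
print(extreme)  # 10 through 25 -- every such outcome is in the upper tail

# Total probability of that set: the p-value binom.test reports
p_value = sum(dbinom(x, n, p0) for x in extreme)
print(p_value)
```

With the mode of the distribution at 1 and the pmf falling off steeply, every count from 10 upward is less probable than every count below 10, which is why the "more extreme" set is purely an upper tail.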

#
The computation 2*sum(dbinom(c(10:25),25,0.061)) does not correspond
to any reasonable definition of p-value.  For a symmetric
distribution, it is fine to use 2 times the tail area of one tail.
For an asymmetric distribution, this is silly.

The standard definition given in elementary texts is usually something like

     the probability of observing a test statistic at least as 
      extreme as the observed value

or more formally as 

      the smallest significance level at which the observed result would
      lead to rejection of the null hypothesis

Either definition requires further decisions (what does "at least as
extreme" mean?).  In an asymmetric distribution, "at least as far from
E(X|H0)"  is not a good interpretation, since deviations in one direction 
may be much less probable than deviations in the other direction.  

Peter's interpretation corresponds both to the interpretation of "at
least as extreme" as "at least as improbable", and also to the
"smallest significance level" interpretation for the test implemented
in binom.test, i.e. the Clopper-Pearson "exact" test.  Twice the upper
tail area corresponds to neither.  The fact that it is implemented in
SAS and appears in a textbook does not rescue it from that fundamental
failure to make sense.
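The contrast can be made concrete. Below is a Python sketch (pmf hand-rolled with math.comb in place of R's dbinom) computing both quantities: the "at least as improbable" p-value that binom.test uses, and the doubled upper tail:

```python
from math import comb

def dbinom(x, n, p):
    # binomial pmf, mirroring R's dbinom(x, n, p)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p0, obs = 25, 0.061, 10

# binom.test's p-value: total probability of outcomes at least as
# improbable under H0 as the observed count
p_minlike = sum(dbinom(x, n, p0) for x in range(n + 1)
                if dbinom(x, n, p0) <= dbinom(obs, n, p0))

# The SAS/textbook value: twice the upper tail area
p_doubled = 2 * sum(dbinom(x, n, p0) for x in range(obs, n + 1))

print(p_minlike, p_doubled)
```

In this particular example the "at least as improbable" set is exactly the upper tail 10:25, so the doubled figure is literally twice the exact p-value; in a distribution with a heavier opposite tail the two definitions would include different outcomes.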

albyn
On Thu, Feb 05, 2009 at 09:48:11PM +0100, Peter Dalgaard wrote:
#
On Thu, 5 Feb 2009, Albyn Jones wrote:

            
"Silly" is much too strong. There is a perfectly good reason to compare 2*sum(dbinom(c(10:25),25,0.061)) to a two-sided test threshold.

The argument is that what we are really doing in usual two-sided location tests is two one-sided tests at alpha/2 rather than one two-sided test at alpha. The null hypothesis is being compared to two different alternatives (better or worse vs same) and the decisions about the future would be different depending on which tail we ended up using.

This argument says that we should compare a one-sided tail area such as sum(dbinom(c(10:25),25,0.061)) to alpha/2; equivalently, that we should compare 2*sum(dbinom(c(10:25),25,0.061)) to alpha [or to informal standards for strength of evidence, or whatever you typically do with p-values]. I'm not saying that this is the only sensible way to handle and interpret p-values in two-sided tests, but I really don't think it can be dismissed as 'silly'.
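The decision-rule equivalence Thomas describes can be written out; a Python sketch (pmf hand-rolled via math.comb, alpha = 0.05 chosen purely for illustration):

```python
from math import comb

def dbinom(x, n, p):
    # binomial pmf, mirroring R's dbinom(x, n, p)
    return comb(n, x) * p**x * (1 - p)**(n - x)

alpha = 0.05  # illustrative two-sided level, not from the thread

# One-sided upper tail area for the observed count of 10
upper_tail = sum(dbinom(x, 25, 0.061) for x in range(10, 26))

# One-sided test against the upward alternative at level alpha/2 ...
reject_split = upper_tail <= alpha / 2

# ... is the same decision as comparing the doubled tail to alpha
reject_doubled = 2 * upper_tail <= alpha

print(reject_split, reject_doubled)  # identical by construction
```

So the doubled tail area is not itself a probability of any event; it is a monotone transformation of the one-sided p-value, rescaled so it can be compared directly to the usual two-sided alpha.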


      -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


PS: Daniel Dennett has described as an occupational hazard for philosophers the tendency to go from "I can't imagine X" to "No one can imagine X" to "X is inconceivable".  The transition from "I can't imagine how X would be used" to "X is useless" is somewhat similar, as is the Extreme Bayesian transition from "X wasn't derived by a formal consideration of posterior expected loss" to "X can't be derived by a formal consideration of posterior expected loss" to "X is incoherent". Why, yes, I am grumpy about a reviewer. How did you guess?