pbinom with size argument 0 (PR#8560)

7 messages · uht@dfu.min.dk, (Ted Harding), Peter Dalgaard +2 more

Fri, Feb 3, 2006 3:55 AM

Full_Name: Uffe Høgsbro Thygesen
Version: 2.2.0
OS: linux
Submission from: (NULL) (130.226.135.250)


Hello all.

  pbinom(q=0,size=0,prob=0.5)

returns the value NaN. I had expected the result 1. In fact any value for q
seems to give an NaN. Note that

  dbinom(x=0,size=0,prob=0.5)

returns the value 1.

Cheers,

Uffe

Fri, Feb 3, 2006 6:34 AM

On 03-Feb-06 uht at dfu.min.dk wrote:

Well, "NaN" can make sense since "q=0" refers to a single sampled
value, and there is no value which you can sample from "size=0";
i.e. sampling from "size=0" is a non-event. I think the probability
of a non-event should be NaN, not 1! (But maybe others might argue
that if you try to sample from an empty urn you necessarily get
zero "successes", so p should be 1; but I would counter that you
also necessarily get zero "failures" so q should be 1. I suppose
it may be a matter of whether you regard the "r" of the binomial
distribution as referring to the "identities" of the outcomes
rather than to how many you get of a particular type. Hmmm.)

That is probably because the .Internal code for pbinom may do
a preliminary test for "x >= size". This also makes sense, for
the cumulative p<dist> for any <dist> with a finite range,
since the answer must then be 1 and a lot of computation would
be saved (likewise returning 0 when x < 0). However, it would
make even more sense to have a preceding test for "size<=0"
and return NaN in that case since, for the same reasons as
above, the result is the probability of a non-event.

(But it depends on your point of view, as above ... However,
surely the two should be consistent with each other.)

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Feb-06                                       Time: 14:34:28
------------------------------ XFMail ------------------------------

Fri, Feb 3, 2006 6:47 AM

(Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:

Once you get your coffee, you'll likely realize that you got your p's
and d's mixed up...

I think Uffe is perfectly right: The result of zero experiments will
be zero successes (and zero failures) with probability 1, so the
cumulative distribution function is a step function with one step at
zero ( == as.numeric(x>=0) ).
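A minimal Python sketch of this argument (an illustration of the math only, not R's C implementation of pbinom). The CDF is computed directly as a partial sum of the pmf, so the size = 0 case needs no special treatment and comes out as the step function as.numeric(x >= 0). The helpers binom_pmf and binom_cdf are hypothetical names.

```python
from math import comb, floor

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(q: float, n: int, p: float) -> float:
    """P(X <= q), computed as the sum of the pmf over 0..min(floor(q), n)."""
    if q < 0:
        return 0.0
    upper = min(floor(q), n)
    return sum(binom_pmf(k, n, p) for k in range(upper + 1))

# With n = 0 the pmf is a point mass at 0, so the CDF is a single step at 0.
print(binom_pmf(0, 0, 0.5))                                # 1.0
print([binom_cdf(q, 0, 0.5) for q in (-1, 0, 0.5, 3)])     # [0.0, 1.0, 1.0, 1.0]
```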

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

Fri, Feb 3, 2006 7:07 AM

On 03-Feb-06 Peter Dalgaard wrote:

You're right about the mix-up! (I must mend the pipeline.)

I'm perfectly happy with this argument so long as it leads to
dbinom(x=0,size=0,prob=p)=1 and also pbinom(q=0,size=0,prob=p)=1
(which seems to be what you are arguing too). And I think there
are no traps if p=0 or p=1.
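The only potential trap at p = 0 or p = 1 is the 0^0 that appears in p^k (1-p)^(n-k). A quick Python check, assuming the usual convention 0^0 = 1 (which Python's ** operator follows); binom_pmf is a hypothetical helper, not an R function. Because the size-0 CDF at 0 is just the single pmf term at 0, the same check covers pbinom(q=0, size=0, prob=p).

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    # 0.0 ** 0 evaluates to 1.0 in Python, so p = 0 and p = 1 need no special case.
    return comb(n, k) * p**k * (1 - p)**(n - k)

for p in (0.0, 0.5, 1.0):
    print(p, binom_pmf(0, 0, p))   # 1.0 for every p
```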

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Feb-06                                       Time: 15:07:49
------------------------------ XFMail ------------------------------

Peter Ehlers

Sat, Feb 4, 2006 2:04 PM

(Ted Harding) wrote:

I prefer a (consistent) NaN. What happens to our notion of a
Binomial RV as a sequence of Bernoulli RVs if we permit n=0?
I have never seen (nor contemplated, I confess) the definition
of a Bernoulli RV as anything other than some dichotomous-outcome
one-trial random experiment. Not n trials, where n might equal zero,
but _one_ trial. I can't see what would be gained by permitting a
zero-trial experiment. If we assign probability 1 to each outcome,
we have a problem with the sum of the probabilities.

Peter Ehlers

Sat, Feb 4, 2006 4:33 PM

P Ehlers <ehlers at math.ucalgary.ca> writes:

What's the problem ??

An n=0 binomial is the sum of an empty set of Bernoulli RV's, and the
sum over an empty set is identically 0.
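To make the empty-sum argument concrete, here is a small Python simulation that draws a binomial variate as a sum of independent Bernoulli trials. binomial_as_bernoulli_sum is a hypothetical helper used only for illustration. With n = 0 there are no trials, the generator is empty, and sum() returns 0 on every draw.

```python
import random

def binomial_as_bernoulli_sum(n: int, p: float, rng: random.Random) -> int:
    """Draw X ~ Binomial(n, p) as the sum of n independent Bernoulli(p) trials."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))

rng = random.Random(1)
draws = [binomial_as_bernoulli_sum(0, 0.5, rng) for _ in range(1000)]
print(set(draws))   # {0}: every n = 0 draw is an empty sum
print(sum([]))      # 0
```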

Consistency is what you gain. E.g. 

 binom(.,n=n1+n2,p) == binom(.,n=n1,p) * binom(.,n=n2,p)

where * denotes convolution. This will also hold for n1=0 or n2=0 if
the binomial in that case is defined as a one-point distribution at
zero. Same thing as any(logical(0)) etc., really.
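The convolution identity can be checked numerically. This Python sketch uses hypothetical helper names and stores each pmf as a list indexed by the count. The size-0 pmf is the point mass [1.0], which is the identity element for convolution, so the identity also holds when n1 = 0 or n2 = 0. The last line shows Python's analogous empty-set conventions (R's any(logical(0)) is FALSE and all(logical(0)) is TRUE).

```python
from math import comb, isclose

def binom_pmf_vector(n: int, p: float) -> list[float]:
    """[P(X=0), ..., P(X=n)] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a: list[float], b: list[float]) -> list[float]:
    """Pmf of the sum of two independent counts with pmfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def same(u: list[float], v: list[float]) -> bool:
    """Equal length and elementwise equal up to floating-point rounding."""
    return len(u) == len(v) and all(isclose(x, y, abs_tol=1e-12) for x, y in zip(u, v))

p = 0.3
for n1, n2 in [(2, 3), (0, 4), (4, 0), (0, 0)]:
    lhs = binom_pmf_vector(n1 + n2, p)
    rhs = convolve(binom_pmf_vector(n1, p), binom_pmf_vector(n2, p))
    print(n1, n2, same(lhs, rhs))   # True for every pair

print(any([]), all([]))             # False True
```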

O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

Brian Ripley

Mon, Feb 6, 2006 1:51 AM

On Sun, 5 Feb 2006, Peter Dalgaard wrote:

Consistency is a Good Thing, and I had already altered the codebase to 
consistently allow size=0 as a discrete distribution concentrated at 0.

There were other inconsistencies, e.g. whether the geometric/negative 
binomial functions allow prob=0 or prob=1.  I have no problem with prob=1 
(it is a discrete distribution concentrated on one point) and this was 
addressed for rnbinom before (PR#1218) but subsequently broken (which is 
why we like regression tests ...).  However prob=0 does not correspond to 
a proper distribution unless Inf is allowed as a value, and it was not so 
documented (nor implemented).  Indeed we had

[1] 0

[1] 0

[1] 0

and in fact dgeom gave zero for every allowed value.  So I cannot accept 
that as being right (and we even have a d-p-q-r test with prob=0).
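To illustrate the geometric case, here is a Python sketch assuming R's parametrization dgeom(x, prob) = prob * (1 - prob)^x for x = 0, 1, 2, ... (geom_pmf is a hypothetical helper, not R code). Summing the first 200 terms shows that prob = 1 gives a point mass at 0. By contrast, prob = 0 gives zero mass at every finite x, so the probabilities cannot sum to 1.

```python
def geom_pmf(x: int, prob: float) -> float:
    """P(X = x) = prob * (1 - prob)**x: number of failures before the first success."""
    return prob * (1 - prob) ** x if x >= 0 else 0.0

for prob in (1.0, 0.5, 0.0):
    total = sum(geom_pmf(x, prob) for x in range(200))
    print(prob, geom_pmf(0, prob), round(total, 12))
# prob = 1.0: point mass at 0, total mass 1.0
# prob = 0.5: total mass 1.0 up to truncation of the tail
# prob = 0.0: every term is 0, total mass 0.0, so not a proper distribution
```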

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595