Full_Name: Uffe Høgsbro Thygesen
Version: 2.2.0
OS: linux
Submission from: (NULL) (130.226.135.250)

Hello all.

pbinom(q=0,size=0,prob=0.5) returns the value NaN. I had expected the result 1. In fact any value for q seems to give an NaN.

Note that dbinom(x=0,size=0,prob=0.5) returns the value 1.

Cheers, Uffe
pbinom with size argument 0 (PR#8560)
7 messages · uht@dfu.min.dk, (Ted Harding), Peter Dalgaard +2 more
On 03-Feb-06 uht at dfu.min.dk wrote:
Full_Name: Uffe Høgsbro Thygesen Version: 2.2.0 OS: linux Submission from: (NULL) (130.226.135.250) Hello all. pbinom(q=0,size=0,prob=0.5) returns the value NaN. I had expected the result 1. In fact any value for q seems to give an NaN.
Well, "NaN" can make sense since "q=0" refers to a single sampled value, and there is no value which you can sample from "size=0"; i.e. sampling from "size=0" is a non-event. I think the probability of a non-event should be NaN, not 1! (But maybe others might argue that if you try to sample from an empty urn you necessarily get zero "successes", so p should be 1; but I would counter that you also necessarily get zero "failures" so q should be 1. I suppose it may be a matter of whether you regard the "r" of the binomial distribution as referring to the "identities" of the outcomes rather than to how many you get of a particular type. Hmmm.)
Note that dbinom(x=0,size=0,prob=0.5) returns the value 1.
That is probably because the .Internal code for pbinom may do a preliminary test for "x >= size". This also makes sense, for the cumulative p<dist> for any <dist> with a finite range, since the answer must then be 1 and a lot of computation would be saved (likewise returning 0 when x < 0). However, it would make even more sense to have a preceding test for "size<=0" and return NaN in that case since, for the same reasons as above, the result is the probability of a non-event. (But it depends on your point of view, as above ... However, surely the two should be consistent with each other.)

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Feb-06                                       Time: 14:34:28
------------------------------ XFMail ------------------------------
(Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:
On 03-Feb-06 uht at dfu.min.dk wrote:
Full_Name: Uffe Høgsbro Thygesen Version: 2.2.0 OS: linux Submission from: (NULL) (130.226.135.250) Hello all. pbinom(q=0,size=0,prob=0.5) returns the value NaN. I had expected the result 1. In fact any value for q seems to give an NaN.
Well, "NaN" can make sense since "q=0" refers to a single sampled value, and there is no value which you can sample from "size=0"; i.e. sampling from "size=0" is a non-event. I think the probability of a non-event should be NaN, not 1! (But maybe others might argue that if you try to sample from an empty urn you necessarily get zero "successes", so p should be 1; but I would counter that you also necessarily get zero "failures" so q should be 1. I suppose it may be a matter of whether you regard the "r" of the binomial distribution as referring to the "identities" of the outcomes rather than to how many you get of a particular type. Hmmm.)
Note that dbinom(x=0,size=0,prob=0.5) returns the value 1.
That is probably because the .Internal code for pbinom may do a preliminary test for "x >= size". This also makes sense, for the cumulative p<dist> for any <dist> with a finite range, since the answer must then be 1 and a lot of computation would be saved (likewise returning 0 when x < 0). However, it would make even more sense to have a preceding test for "size<=0" and return NaN in that case since, for the same reasons as above, the result is the probability of a non-event.
Once you get your coffee, you'll likely realize that you got your p's and d's mixed up... I think Uffe is perfectly right: The result of zero experiments will be zero successes (and zero failures) with probability 1, so the cumulative distribution function is a step function with one step at zero ( == as.numeric(x>=0) ).
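The step-function description can be checked directly in R. This is only a sketch: it assumes an R release in which pbinom accepts size = 0 (the 2.2.0 release reported above returned NaN), and the test points in x are arbitrary illustration values.

```r
# Test points on both sides of zero (arbitrary illustration values).
x <- c(-2, -0.5, 0, 0.5, 3)

# CDF of a point mass at zero: 0 below zero, 1 from zero onwards.
step_cdf <- as.numeric(x >= 0)

# Binomial CDF with zero trials; assumes pbinom accepts size = 0.
binom_cdf <- pbinom(x, size = 0, prob = 0.5)

# TRUE when the two agree at every test point.
agree <- isTRUE(all.equal(binom_cdf, step_cdf))
print(agree)
```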
(But it depends on your point of view, as above ... However, surely the two should be consistent with each other.) Best wishes, Ted.
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
On 03-Feb-06 Peter Dalgaard wrote:
(Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:
On 03-Feb-06 uht at dfu.min.dk wrote:
Full_Name: Uffe Høgsbro Thygesen Version: 2.2.0 OS: linux Submission from: (NULL) (130.226.135.250) Hello all. pbinom(q=0,size=0,prob=0.5) returns the value NaN. I had expected the result 1. In fact any value for q seems to give an NaN.
Well, "NaN" can make sense since "q=0" refers to a single sampled value, and there is no value which you can sample from "size=0"; i.e. sampling from "size=0" is a non-event. I think the probability of a non-event should be NaN, not 1! (But maybe others might argue that if you try to sample from an empty urn you necessarily get zero "successes", so p should be 1; but I would counter that you also necessarily get zero "failures" so q should be 1. I suppose it may be a matter of whether you regard the "r" of the binomial distribution as referring to the "identities" of the outcomes rather than to how many you get of a particular type. Hmmm.)
Note that dbinom(x=0,size=0,prob=0.5) returns the value 1.
That is probably because the .Internal code for pbinom may do a preliminary test for "x >= size". This also makes sense, for the cumulative p<dist> for any <dist> with a finite range, since the answer must then be 1 and a lot of computation would be saved (likewise returning 0 when x < 0). However, it would make even more sense to have a preceding test for "size<=0" and return NaN in that case since, for the same reasons as above, the result is the probability of a non-event.
Once you get your coffee, you'll likely realize that you got your p's and d's mixed up...
You're right about the mix-up! (I must mend the pipeline.)
I think Uffe is perfectly right: The result of zero experiments will be zero successes (and zero failures) with probability 1, so the cumulative distribution function is a step function with one step at zero ( == as.numeric(x>=0) ).
I'm perfectly happy with this argument so long as it leads to dbinom(x=0,size=0,prob=p)=1 and also pbinom(q=0,size=0,prob=p)=1 (which seems to be what you are arguing too). And I think there are no traps if p=0 or p=1.
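The boundary cases p=0 and p=1 can be probed with a few lines of R. This sketch assumes an R release in which both dbinom and pbinom accept size = 0; the vector name p_values is chosen here purely for illustration.

```r
# Probabilities to probe, including both boundary values.
p_values <- c(0, 0.5, 1)

# Density and cumulative probability at zero for zero trials.
d0 <- dbinom(0, size = 0, prob = p_values)
p0 <- pbinom(0, size = 0, prob = p_values)

# TRUE when both functions give 1 for every probed value of prob.
consistent <- isTRUE(all(d0 == 1)) && isTRUE(all(p0 == 1))
print(consistent)
```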
(But it depends on your point of view, as above ... However, surely the two should be consistent with each other.)
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Feb-06                                       Time: 15:07:49
------------------------------ XFMail ------------------------------
1 day later
(Ted Harding) wrote:
On 03-Feb-06 Peter Dalgaard wrote:
(Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:
On 03-Feb-06 uht at dfu.min.dk wrote:
Full_Name: Uffe Høgsbro Thygesen Version: 2.2.0 OS: linux Submission from: (NULL) (130.226.135.250) Hello all. pbinom(q=0,size=0,prob=0.5) returns the value NaN. I had expected the result 1. In fact any value for q seems to give an NaN.
Well, "NaN" can make sense since "q=0" refers to a single sampled value, and there is no value which you can sample from "size=0"; i.e. sampling from "size=0" is a non-event. I think the probability of a non-event should be NaN, not 1! (But maybe others might argue that if you try to sample from an empty urn you necessarily get zero "successes", so p should be 1; but I would counter that you also necessarily get zero "failures" so q should be 1. I suppose it may be a matter of whether you regard the "r" of the binomial distribution as referring to the "identities" of the outcomes rather than to how many you get of a particular type. Hmmm.)
Note that dbinom(x=0,size=0,prob=0.5) returns the value 1.
That is probably because the .Internal code for pbinom may do a preliminary test for "x >= size". This also makes sense, for the cumulative p<dist> for any <dist> with a finite range, since the answer must then be 1 and a lot of computation would be saved (likewise returning 0 when x < 0). However, it would make even more sense to have a preceding test for "size<=0" and return NaN in that case since, for the same reasons as above, the result is the probability of a non-event.
Once you get your coffee, you'll likely realize that you got your p's and d's mixed up...
You're right about the mix-up! (I must mend the pipeline.)
I think Uffe is perfectly right: The result of zero experiments will be zero successes (and zero failures) with probability 1, so the cumulative distribution function is a step function with one step at zero ( == as.numeric(x>=0) ).
I'm perfectly happy with this argument so long as it leads to dbinom(x=0,size=0,prob=p)=1 and also pbinom(q=0,size=0,prob=p)=1 (which seems to be what you are arguing too). And I think there are no traps if p=0 or p=1.
(But it depends on your point of view, as above ... However, surely the two should be consistent with each other.)
Ted.
I prefer a (consistent) NaN. What happens to our notion of a Binomial RV as a sequence of Bernoulli RVs if we permit n=0? I have never seen (nor contemplated, I confess) the definition of a Bernoulli RV as anything other than some dichotomous-outcome one-trial random experiment. Not n trials, where n might equal zero, but _one_ trial. I can't see what would be gained by permitting a zero-trial experiment. If we assign probability 1 to each outcome, we have a problem with the sum of the probabilities.

Peter Ehlers
P Ehlers <ehlers at math.ucalgary.ca> writes:
I prefer a (consistent) NaN. What happens to our notion of a Binomial RV as a sequence of Bernoulli RVs if we permit n=0? I have never seen (nor contemplated, I confess) the definition of a Bernoulli RV as anything other than some dichotomous-outcome one-trial random experiment.
What's the problem ?? An n=0 binomial is the sum of an empty set of Bernoulli RV's, and the sum over an empty set is identically 0.
Not n trials, where n might equal zero, but _one_ trial. I can't see what would be gained by permitting a zero-trial experiment. If we assign probability 1 to each outcome, we have a problem with the sum of the probabilities.
Consistency is what you gain. E.g. binom(.,n=n1+n2,p) == binom(.,n=n1,p) * binom(.,n=n2,p) where * denotes convolution. This will also hold for n1=0 or n2=0 if the binomial in that case is defined as a one-point distribution at zero. Same thing as any(logical(0)) etc., really.
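The convolution identity can be illustrated numerically. The following is a sketch, not a proof: conv_pmf is a hypothetical helper written for this example, the values of p, n1 and n2 are arbitrary, and the n1 = 0 case assumes an R release whose dbinom accepts size = 0 (as the original report says it does).

```r
# Convolve two probability mass functions given on supports 0..n1 and 0..n2.
# Hypothetical helper written for this illustration.
conv_pmf <- function(f, g) {
  out <- numeric(length(f) + length(g) - 1)
  for (i in seq_along(f)) {
    idx <- i + seq_along(g) - 1
    out[idx] <- out[idx] + f[i] * g
  }
  out
}

# Compare binom(n1) * binom(n2) with binom(n1 + n2) on the full support.
check_convolution <- function(n1, n2, p) {
  f <- dbinom(0:n1, size = n1, prob = p)
  g <- dbinom(0:n2, size = n2, prob = p)
  target <- dbinom(0:(n1 + n2), size = n1 + n2, prob = p)
  isTRUE(all.equal(conv_pmf(f, g), target))
}

p <- 0.3                                   # arbitrary success probability
ordinary_case <- check_convolution(2, 3, p)
empty_case    <- check_convolution(0, 4, p)  # point mass at zero as one factor

# The same convention for empty inputs elsewhere in R.
empty_sum <- sum(numeric(0))   # sum over an empty set
empty_any <- any(logical(0))   # "any" over an empty set
```

With the zero-trial binomial treated as a one-point distribution at zero, it acts as the identity element for convolution, which is why the n1 = 0 check reduces to comparing binom(n2) with itself.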
1 day later
On Sun, 5 Feb 2006, Peter Dalgaard wrote:
P Ehlers <ehlers at math.ucalgary.ca> writes:
I prefer a (consistent) NaN. What happens to our notion of a Binomial RV as a sequence of Bernoulli RVs if we permit n=0? I have never seen (nor contemplated, I confess) the definition of a Bernoulli RV as anything other than some dichotomous-outcome one-trial random experiment.
What's the problem ?? An n=0 binomial is the sum of an empty set of Bernoulli RV's, and the sum over an empty set is identically 0.
Not n trials, where n might equal zero, but _one_ trial. I can't see what would be gained by permitting a zero-trial experiment. If we assign probability 1 to each outcome, we have a problem with the sum of the probabilities.
Consistency is what you gain. E.g. binom(.,n=n1+n2,p) == binom(.,n=n1,p) * binom(.,n=n2,p) where * denotes convolution. This will also hold for n1=0 or n2=0 if the binomial in that case is defined as a one-point distribution at zero. Same thing as any(logical(0)) etc., really.
Consistency is a Good Thing, and I had already altered the codebase to consistently allow size=0 as a discrete distribution concentrated at 0.

There were other inconsistencies, e.g. whether the geometric/negative binomial functions allow prob=0 or prob=1. I have no problem with prob=1 (it is a discrete distribution concentrated on one point) and this was addressed for rnbinom before (PR#1218) but subsequently broken (which is why we like regression tests ...).

However prob=0 does not correspond to a proper distribution unless Inf is allowed as a value, and it was not so documented (nor implemented). Indeed we had
> dgeom(2, prob=0)
[1] 0
> dgeom(Inf, prob=0)
[1] 0
> pgeom(Inf, prob=0)
[1] 0

and in fact dgeom gave zero for every allowed value. So I cannot accept that as being right (and we even have a d-p-q-r test with prob=0).
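The difference between prob=1 and prob=0 can be seen from the textbook geometric mass function prob * (1 - prob)^x. The sketch below deliberately uses a hand-written function, geom_pmf (a name invented for this example), rather than dgeom, so it does not depend on how any particular R release treats prob=0; the truncation point 10000 is arbitrary.

```r
# Textbook geometric mass function on x = 0, 1, 2, ...
# Hand-written for illustration; this is not R's dgeom.
geom_pmf <- function(x, prob) prob * (1 - prob)^x

x <- 0:10000   # arbitrary truncation of the infinite support

# For 0 < prob <= 1 the masses sum to 1 (up to truncation and rounding).
total_half <- sum(geom_pmf(x, 0.5))

# prob = 1 gives a point mass at zero (R evaluates 0^0 as 1).
mass_one <- geom_pmf(0:3, 1)

# prob = 0 gives zero mass at every finite x, so nothing sums to 1.
total_zero <- sum(geom_pmf(x, 0))
```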
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595