pbinom() function (PR#8700)
On 3/22/2006 10:08 AM, Peter Dalgaard wrote:
Duncan Murdoch <murdoch at stats.uwo.ca> writes:
On 3/22/2006 3:52 AM, maechler at stat.math.ethz.ch wrote:
"cspark" == cspark <cspark at clemson.edu>
on Wed, 22 Mar 2006 05:52:13 +0100 (CET) writes:
cspark> Full_Name: Chanseok Park
cspark> Version: R 2.2.1
cspark> OS: RedHat EL4
cspark> Submission from: (NULL) (130.127.112.89)
cspark> pbinom(any negative value, size, prob) should be
cspark> zero, but I got the following results. I mean, if
cspark> a negative value is close enough to zero, then
cspark> pbinom() computes pbinom(0, size, prob) instead.
> pbinom(-2.220446e-22, 3, .1)
[1] 0.729
> pbinom(-2.220446e-8, 3, .1)
[1] 0.729
> pbinom(-2.220446e-7, 3, .1)
[1] 0
Yes, all the [dp]* functions for distributions that are discrete with mass on the integers do *round* their 'x' to integers.
I could well argue that the current behavior is *not* a bug,
since we do treat "x close to integer" as integer, and hence
pbinom(eps, size, prob) with eps "very close to 0" should give
pbinom(0, size, prob)
as it now does.
However, for aesthetic reasons,
I agree that we should test for "< 0" first (and return 0 then) and only
round otherwise. I'll change this for R-devel (i.e. R 2.3.0, due in
about a month).
cspark> dbinom() also behaves similarly.
Yes — similarly, but differently.
I have changed it (for R-devel) as well, to behave the same as the
other d*() functions, e.g., dpois() and dnbinom().
Martin, your description makes it sound as though dbinom(0.3, size, prob) would give the same answer as dbinom(0, size, prob), whereas it actually gives 0 with a warning, as documented in ?dbinom. The d* functions only round near-integers to integers, where it looks as though near means within 1E-7. The p* functions round near integers to integers, and truncate others to the integer below.
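Duncan's distinction between the d* and p* treatment of 'x' can be sketched like this (a toy model with hypothetical names `d_arg`/`p_arg`; R's actual tolerance test is slightly more involved, so a fixed 1e-7 is an assumption here):

```python
import math

TOL = 1e-7  # assumed tolerance, matching the 1e-7 quoted in the thread

def d_arg(x):
    """Sketch of the d*() argument handling: a near-integer is rounded
    to that integer; any other non-integer yields density 0 (R also
    emits a 'non-integer x' warning, modelled here by returning None)."""
    n = round(x)
    if abs(x - n) <= TOL:
        return n     # evaluate the density at the nearby integer
    return None      # stands for: warn and return density 0

def p_arg(x):
    """Sketch of the p*() argument handling: near-integers round up to
    the integer; everything else truncates to the integer below."""
    return math.floor(x + TOL)

print(d_arg(0.3))       # None: dbinom(0.3, ...) is 0 with a warning
print(d_arg(3 + 1e-9))  # 3: within tolerance, rounded
print(p_arg(0.3))       # 0: pbinom(0.3, ...) == pbinom(0, ...)
```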
Well, the p-functions are constant on the intervals between integers...
Not quite: they're constant on intervals (n - 1e-7, n+1 - 1e-7), for integers n. Since Martin's change, this is no longer true for n = 0. (Or did you refer to the lack of a warning? One point
could be that cumulative d.f.s extend naturally to non-integers, whereas densities don't really extend, since they are defined with respect to counting measure on the integers.)
I wasn't complaining about the behaviour here, I was just clarifying Martin's description of it, when he said that "all the [dp]* functions which are discrete with mass on the integers only, do *round* their 'x' to integers".
I suppose the reason for this behaviour is to protect against rounding error giving nonsense results; I'm not sure that's a great idea, but if we do it, should we really be handling 0 differently?
Most of these round-near-integer issues were spurred by real programming problems. It is somewhat hard to come up with a problem that leads you to generate a binomial variate with "floating point noise", but I'm quite sure that we'll be reminded if we try to change it... (One potential issue is back-calculation to counts from relative frequencies.)
Again, I wasn't suggesting we change the general +/- 1e-7 behaviour (though it should be documented, to avoid bug reports like this one), but I'm worried about having zero as a special case. This will break relations such as dbinom(x, n, 0.5) == dbinom(n - x, n, 0.5) (in the case where x is n + epsilon or -epsilon, for small enough epsilon). Is it really desirable to break the symmetry like this?

Duncan Murdoch
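The symmetry concern can be checked with a small sketch (a toy dbinom built from math.comb, not R's implementation; the `zero_special_case` flag models the proposed "x < 0 gives 0" change):

```python
from math import comb

TOL = 1e-7  # assumed rounding tolerance

def dbinom_sketch(x, n, p, zero_special_case):
    """Toy binomial density: round x to a nearby integer within TOL,
    return 0 off the support. With zero_special_case=True, any
    negative x returns 0 before rounding is attempted (the proposed
    R-devel behaviour)."""
    if zero_special_case and x < 0:
        return 0.0
    k = round(x)
    if abs(x - k) > TOL or k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

eps, n = 1e-9, 3
# Without the special case the symmetry holds even for x = -eps,
# since both -eps and n + eps round to the integer endpoints:
assert dbinom_sketch(-eps, n, 0.5, False) == dbinom_sketch(n + eps, n, 0.5, False)
# With the special case, -eps is zeroed but n + eps still rounds to n:
print(dbinom_sketch(-eps, n, 0.5, True))      # 0.0
print(dbinom_sketch(n + eps, n, 0.5, True))   # 0.125
```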