pbinom( ) function (PR#8700) - R-devel

Wed, Mar 22, 2006 12:52 AM

    cspark> Full_Name: Chanseok Park
    cspark> Version: R 2.2.1
    cspark> OS: RedHat EL4
    cspark> Submission from: (NULL) (130.127.112.89)



    cspark> pbinom(any negative value, size, prob) should be
    cspark> zero.  But I got the following results.  I mean, if
    cspark> a negative value is close to zero, then pbinom()
    cspark> calculate pbinom(0, size, prob). 

    >> pbinom( -2.220446e-22, 3,.1)
    [1] 0.729
    >> pbinom( -2.220446e-8, 3,.1)
    [1] 0.729
    >> pbinom( -2.220446e-7, 3,.1)
    [1] 0

Yes, all the [dp]* functions which are discrete with mass on the
integers only, do *round* their 'x' to integers.

I could well argue that the current behavior is *not* a bug,
since we do treat "x close to integer" as integer, and hence 
   pbinom(eps, size, prob)  with  eps "very close to 0" should give
   pbinom(0,   size, prob)
as it now does.

However, for aesthetic reasons,
I agree that we should test for "< 0" first (and return 0 in that case)
and only round otherwise.  I'll change this for R-devel (i.e. R 2.3.0,
due in about a month).
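
A minimal sketch of the two orderings discussed here, written as a model in R rather than as R's actual C source. The helper names `p_round_only` and `p_negative_first` are hypothetical, and the tolerance `fuzz = 1e-7` is an assumption (it is the value inferred from the reported output, not one stated in this message).

```r
# Hypothetical model of the argument handling, not R's actual C source.
# 'fuzz' is an assumed tolerance for "x close to integer".
p_round_only <- function(q, size, prob, fuzz = 1e-7) {
  # Snap near-integers to the integer, otherwise go to the integer below.
  pbinom(floor(q + fuzz), size, prob)
}

p_negative_first <- function(q, size, prob, fuzz = 1e-7) {
  if (q < 0) return(0)   # strictly negative: the binomial has no mass below 0
  pbinom(floor(q + fuzz), size, prob)
}

q <- -2.220446e-8
p_round_only(q, 3, 0.1)      # handled like pbinom(0, 3, 0.1)
p_negative_first(q, 3, 0.1)  # 0
```

Only integer-valued arguments reach `pbinom()` in this model, so the result should not depend on which R version runs it.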

    cspark> dbinom() also behaves similarly.

Yes, similarly, but not identically.
I have changed it (for R-devel) as well, so that it behaves the same
as the other d*() functions, e.g. dpois() and dnbinom().


Martin Maechler, ETH Zurich

Duncan Murdoch

Wed, Mar 22, 2006 4:40 AM

On 3/22/2006 3:52 AM, maechler at stat.math.ethz.ch wrote:

Martin, your description makes it sound as though dbinom(0.3, size, 
prob) would give the same answer as dbinom(0, size, prob), whereas it 
actually gives 0 with a warning, as documented in ?dbinom.  The d* 
functions only round near-integers to integers, where it looks as though 
near means within 1E-7.  The p* functions round near integers to 
integers, and truncate others to the integer below.
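
The distinction can be checked directly with the real `dbinom()` and `pbinom()`. This is a small illustration, assuming a reasonably recent R; the choice of `size = 3`, `prob = 0.1` and the offset `1e-9` is arbitrary.

```r
size <- 3
prob <- 0.1

# d*: a clearly non-integer x has density 0 (R also emits a warning here).
d_nonint <- suppressWarnings(dbinom(0.3, size, prob))

# d*: an x within the tolerance of an integer is treated as that integer.
d_near <- dbinom(1e-9, size, prob)

# p*: a non-integer q is taken down to the integer below ...
p_nonint <- pbinom(0.3, size, prob)

# ... while a q just under an integer is snapped up to it.
p_near <- pbinom(1 - 1e-9, size, prob)

c(d_nonint = d_nonint, d_near = d_near, p_nonint = p_nonint, p_near = p_near)
```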

I suppose the reason for this behaviour is to protect against rounding 
error giving nonsense results; I'm not sure that's a great idea, but if 
we do it, should we really be handling 0 differently?

Duncan Murdoch

Peter Dalgaard

Wed, Mar 22, 2006 7:08 AM

Duncan Murdoch <murdoch at stats.uwo.ca> writes:

> Martin, your description makes it sound as though dbinom(0.3, size,
> prob) would give the same answer as dbinom(0, size, prob), whereas it
> actually gives 0 with a warning, as documented in ?dbinom.  The d*
> functions only round near-integers to integers, where it looks as though
> near means within 1E-7.  The p* functions round near integers to
> integers, and truncate others to the integer below.

Well, the p-functions are constant on the intervals between
integers... (Or, did you refer to the lack of a warning? One point
could be that cumulative distribution functions extend naturally to
non-integers, whereas densities don't really extend, since they are
defined with respect to counting measure on the integers.)

Most of these round-near-integer issues were spurred by real
programming problems. It is somewhat hard to come up with a problem
that leads you to generate a binomial variate value with "floating
point noise", but I'm quite sure that we'll be reminded if we try to
change it... (One potential issue is back-calculation to counts from
relative frequencies).
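
To illustrate the back-calculation issue, here is a small sketch in R. The helper names `strict_floor` and `fuzzy_floor` and the tolerance `1e-7` are assumptions made for the example, not R internals. The idea is that a count recovered as `rel_freq * n` may differ from the intended integer in its last bits, and a plain `floor()` would then land on the wrong count.

```r
# Illustration only: helper names and 'fuzz' are assumptions, not R internals.
n <- 100
rel_freq <- 29 / n     # relative frequency as it might be stored
x <- rel_freq * n      # back-calculated count; may not be exactly 29

strict_floor <- function(q) floor(q)
fuzzy_floor  <- function(q, fuzz = 1e-7) floor(q + fuzz)

print(x, digits = 17)  # shows any floating-point noise in x
strict_floor(x)        # drops to 28 if x landed just below 29
fuzzy_floor(x)         # 29 whether or not x is exactly 29
```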

O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

Duncan Murdoch

Wed, Mar 22, 2006 8:38 AM

On 3/22/2006 10:08 AM, Peter Dalgaard wrote:

Duncan Murdoch <murdoch at stats.uwo.ca> writes:

>> Martin, your description makes it sound as though dbinom(0.3, size,
>> prob) would give the same answer as dbinom(0, size, prob), whereas it
>> actually gives 0 with a warning, as documented in ?dbinom.  The d*
>> functions only round near-integers to integers, where it looks as though
>> near means within 1E-7.  The p* functions round near integers to
>> integers, and truncate others to the integer below.
>
> Well, the p-functions are constant on the intervals between
> integers...

Not quite:  they're constant on intervals (n - 1e-7, n+1 - 1e-7), for 
integers n.  Since Martin's change, this is not true for n=0.

> (Or, did you refer to the lack of a warning? One point

I wasn't complaining about the behaviour here, I was just clarifying 
Martin's description of it, when he said that "all the [dp]* functions 
which are discrete with mass on the integers only, do *round* their 'x' 
to integers".

Again, I wasn't suggesting we change the general +/- 1E-7 behaviour 
(though it should be documented to avoid bug reports like this one), but 
I'm worried about having zero as a special case.  This will break 
relations such as

  dbinom(x, n, 0.5) == dbinom(n-x, n, 0.5)

(in the case where x is n+epsilon or -epsilon, for small enough 
epsilon).  Is it really desirable to break the symmetry like this?
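
The symmetry concern can be made concrete with a small model in R. This is a sketch under stated assumptions: the helpers `d_round_only` and `d_zero_special` are hypothetical stand-ins for "round within 1e-7" and "test for < 0 first, then round", and they are not claimed to match the actual R-devel change.

```r
# Hypothetical models of the two rules; 'fuzz' is an assumed tolerance.
d_round_only <- function(x, size, prob, fuzz = 1e-7) {
  xr <- round(x)
  if (abs(x - xr) > fuzz) return(0)  # clearly non-integer: density 0
  dbinom(xr, size, prob)             # near-integer: treat as the integer
}

d_zero_special <- function(x, size, prob, fuzz = 1e-7) {
  if (x < 0) return(0)               # zero handled as a special case
  d_round_only(x, size, prob, fuzz)
}

n <- 3
x <- -1e-9                           # "-epsilon"; n - x is then "n + epsilon"

# Rounding only: both sides are treated as the endpoints 0 and n.
c(d_round_only(x, n, 0.5), d_round_only(n - x, n, 0.5))

# Negative check first: the left side becomes 0, the right side does not.
c(d_zero_special(x, n, 0.5), d_zero_special(n - x, n, 0.5))
```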

Duncan Murdoch