proposed pbirthday fix - R-devel

Mon, Jan 23, 2006 12:43 AM


Martin Maechler

Mon, Jan 23, 2006 2:52 AM

    ken> Actually, since NaNs are also detected in na.action
    ken> operations, a simpler fix might just be to use the
    ken> na.rm = TRUE option of min

    ken> upper <- min(n^k/(c^(k - 1)), 1, na.rm = TRUE)

Well, I liked your first fix better (thank you for it!),
since it's always good practice to formulate computations so as to avoid
overflow when possible. 
All things considered, I think I'd go for

   upper <- min( exp(k * log(n) - (k-1) * log(c)), 1, na.rm = TRUE)
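For readers who want to check the arithmetic, here is a minimal base-R sketch of the three candidate bounds for the case in Ken's report (n = 1000, k = 250 coincidences, c = 365 classes). The variable names mirror the thread; nothing beyond base R is assumed. The naive form overflows to Inf/Inf = NaN, the `na.rm = TRUE` form hides that NaN and returns 1, and the log-space form stays finite before `min()` caps it at 1.

```r
# Case from the thread: n = 1000 people, k = 250 coincidences, c = 365 classes
n <- 1000
k <- 250
c <- 365   # named 'c' as in the thread; R still finds the function c() when called

# Original bound: n^k (1e750) and c^(k - 1) (~1e638) both overflow to Inf,
# Inf/Inf is NaN, and min() propagates the NaN.
upper_naive <- min(n^k / (c^(k - 1)), 1)

# na.rm = TRUE variant: min() drops the NaN and silently returns 1.
upper_narm <- min(n^k / (c^(k - 1)), 1, na.rm = TRUE)

# Log-space variant: k*log(n) - (k-1)*log(c) is about 257.9,
# so exp() gives a finite value (~1e112) and min() caps it at 1.
upper_log <- min(exp(k * log(n) - (k - 1) * log(c)), 1, na.rm = TRUE)

cat("naive:", upper_naive, "\n")
cat("na.rm:", upper_narm, "\n")
cat("log:  ", upper_log, "\n")
```

The log-space form only moves the overflow threshold: `exp()` itself overflows once the exponent exceeds about 709.78 (`log(.Machine$double.xmax)`), but the result is then `Inf` rather than `NaN`, and `min(Inf, 1)` is still 1.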

Martin 

    Ken> Recent news articles concerning an article from The
    Ken> Lancet with fabricated data indicate that in the sample
    Ken> containing some 900 or so patients, more than 200 had the
    Ken> same birthday.  I was curious and tried out the p and q
    Ken> birthday functions but pbirthday could not handle 250
    Ken> coincidences with n = 1000.  The calculation of upper
    Ken> prior to using uniroot produces NaN,

    Ken> upper<-min(n^k/(c^(k-1)),1)

    Ken> I was able to get it to work by using logs, however, as
    Ken> in the following version

    >> function(n, classes = 365, coincident = 2){
    >>     k <- coincident
    >>     c <- classes
    >>     if (coincident < 2) return(1)
    >>     if (coincident > n) return(0)
    >>     if (n > classes * (coincident - 1)) return(1)
    >>     eps <- 1e-14
    >>     if (qbirthday(1 - eps, classes, coincident) <= n)
    >>     return(1 - eps)
    >>     f <- function(p) qbirthday(p, c, k) - n
    >>     lower <- 0
    >>     upper <- min( exp( k * log(n) - (k-1) * log(c) ), 1 )
    >>     nmin <- uniroot(f, c(lower, upper), tol = eps)
    >>     nmin$root
    >> }

Martin Maechler

Mon, Jan 23, 2006 10:01 AM

    ken> Actually, since NaNs are also detected in na.action
    ken> operations, a simpler fix might just be to use the
    ken> na.rm = TRUE option of min

    ken> upper <- min(n^k/(c^(k - 1)), 1, na.rm = TRUE)

    MM> Well, I liked your first fix better (thank you for it!),
    MM> since it's always good practice to formulate computations so as to avoid
    MM> overflow when possible. 
    MM> All things considered, I think I'd go for

    MM> upper <- min( exp(k * log(n) - (k-1) * log(c)), 1, na.rm = TRUE)

    MM> Martin 

    Ken> Recent news articles concerning an article from The
    Ken> Lancet with fabricated data indicate that in the sample
    Ken> containing some 900 or so patients, more than 200 had the
    Ken> same birthday.  I was curious and tried out the p and q
    Ken> birthday functions but pbirthday could not handle 250
    Ken> coincidences with n = 1000.  The calculation of upper
    Ken> prior to using uniroot produces NaN,

    Ken> upper<-min(n^k/(c^(k-1)),1)

    Ken> I was able to get it to work by using logs, however, as
    Ken> in the following version

    >>> function(n, classes = 365, coincident = 2){
	..................

    >>> upper <- min( exp( k * log(n) - (k-1) * log(c) ), 1 )
    >>> nmin <- uniroot(f, c(lower, upper), tol = eps)
    >>> nmin$root
    >>> }

Well, now after inspection, I think "get it to work"
is a bit of an exaggeration, at least for a purist like me
(some famous fortune teller once guessed it may be because I'm ... Swiss)
who doesn't like to lose precision in probability computations
unnecessarily. One can do much better:

The version of [pq]birthday() I've just committed to R-devel *) now gives

[1]  8.596245e-08  9.252349e-41 2.395639e-112 1.758236e-285

whereas the 'na.rm = TRUE' fix would simply give

[1] 8.596245e-08 0.000000e+00 0.000000e+00 0.000000e+00
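One general way tiny probabilities get lost in this kind of computation (offered as an illustration, not a reconstruction of Ken's function or of the committed code) is that `uniroot()`'s `tol` is an absolute tolerance on its argument. With `tol = 1e-14` on the probability scale, a root near 1e-41 cannot be told apart from one near 1e-14; searching over log(p) instead makes the tolerance effectively relative. The step function `step_f` and the target `p0` below are hypothetical stand-ins for something like `qbirthday(p, c, k) - n`.

```r
# Hypothetical target probability and a monotone step function whose sign
# changes at p0 (a stand-in for qbirthday(p, c, k) - n, not the real thing).
p0 <- 1e-41
step_f <- function(p) if (p >= p0) 1 else -1

# Probability scale: tol = 1e-14 (as in the thread) is absolute, so the
# search stops once the bracket is about 1e-14 wide, far above p0.
lin <- uniroot(step_f, c(0, 1), tol = 1e-14)

# Log-probability scale: search over lp = log(p); an absolute tolerance
# on lp corresponds to a relative tolerance on p.
log_f <- function(lp) step_f(exp(lp))
lg <- uniroot(log_f, c(log(1e-300), 0), tol = 1e-10)

cat("linear-scale root:", lin$root, "\n")
cat("log-scale root:   ", exp(lg$root), "\n")
cat("target:           ", p0, "\n")
```

On the probability scale the returned root lands around 1e-14, dozens of orders of magnitude away from 1e-41, while the log-scale search recovers p0 to within a tiny relative error.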

--
Martin Maechler, ETH Zurich

*) peek at https://svn.r-project.org/R/trunk/src/library/stats/R/pbirthday.R