Skip to content

proposed pbirthday fix

3 messages · Ken Knoblauch, Martin Maechler

#
ken> Actually, since NaN's are also detected in na.action
    ken> operations, a simpler fix might just be to use the
    ken> na.rm = TRUE option of min

    ken> upper <- min(n^k/(c^(k - 1)), 1, na.rm = TRUE)

Well, I liked your first fix better -- thank you for it! --
since it's always good practice to formulate such as to avoid
overflow when possible. 
All things considered, I think I'd go for

   upper <- min( exp(k * log(n) - (k-1) * log(c)), 1, na.rm = TRUE)

Martin 

    Ken> Recent news articles concerning an article from The
    Ken> Lancet with fabricated data indicate that in the sample
    Ken> containing some 900 or so patients, more than 200 had the
    Ken> same birthday.  I was curious and tried out the p and q
    Ken> birthday functions but pbirthday could not handle 250
    Ken> coincidences with n = 1000.  The calculation of upper
    Ken> prior to using uniroot produces NaN,

    Ken> upper<-min(n^k/(c^(k-1)),1)

    Ken> I was able to get it to work by using logs, however, as
    Ken> in the following version

    >> function(n, classes = 365, coincident = 2){
    >>     k <- coincident
    >>     c <- classes
    >>     if (coincident < 2) return(1)
    >>     if (coincident > n) return(0)
    >>     if (n > classes * (coincident - 1)) return(1)
    >>     eps <- 1e-14
    >>     if (qbirthday(1 - eps, classes, coincident) <= n)
    >>     return(1 - eps)
    >>     f <- function(p) qbirthday(p, c, k) - n
    >>     lower <- 0
    >>     upper <- min( exp( k * log(n) - (k-1) * log(c) ), 1 )
    >>     nmin <- uniroot(f, c(lower, upper), tol = eps)
    >>     nmin$root
    >> }
#
ken> Actually, since NaN's are also detected in na.action
    ken> operations, a simpler fix might just be to use the
    ken> na.rm = TRUE option of min

    ken> upper <- min(n^k/(c^(k - 1)), 1, na.rm = TRUE)

    MM> Well, I liked your first fix better -- thank you for it! --
    MM> since it's always good practice to formulate such as to avoid
    MM> overflow when possible. 
    MM> All things considered, I think I'd go for

    MM> upper <- min( exp(k * log(n) - (k-1) * log(c)), 1, na.rm = TRUE)

    MM> Martin 

    Ken> Recent news articles concerning an article from The
    Ken> Lancet with fabricated data indicate that in the sample
    Ken> containing some 900 or so patients, more than 200 had the
    Ken> same birthday.  I was curious and tried out the p and q
    Ken> birthday functions but pbirthday could not handle 250
    Ken> coincidences with n = 1000.  The calculation of upper
    Ken> prior to using uniroot produces NaN,

    Ken> upper<-min(n^k/(c^(k-1)),1)

    Ken> I was able to get it to work by using logs, however, as
    Ken> in the following version

    >>> function(n, classes = 365, coincident = 2){
	..................

    >>> upper <- min( exp( k * log(n) - (k-1) * log(c) ), 1 )
    >>> nmin <- uniroot(f, c(lower, upper), tol = eps)
    >>> nmin$root
    >>> }

Well, now after inspection, I think ``get it to work''
is a bit of an exaggeration, at least for a purist like me
(some famous fortune teller once guessed it may be because I'm ... Swiss)
who doesn't like to lose precision in probability computations
unnecessarily. One can do much better:

The version of [pq]birthday() I've just committed to R-devel *) now gives
[1]  8.596245e-08  9.252349e-41 2.395639e-112 1.758236e-285

whereas the 'na.rm=TRUE' fix  would simply give

[1] 8.596245e-08 0.000000e+00 0.000000e+00 0.000000e+00

--
Martin Maechler, ETH Zurich

*) peek at https://svn.r-project.org/R/trunk/src/library/stats/R/pbirthday.R