quantiles of the hypergeometric distribution (PR#502) - R-devel

Fri, Mar 24, 2000 4:23 AM #

Hello!

I use R-version 1.0.0
To get the 0.95 quantile of the hypergeometric
distribution with the parameters m=45000,n=5000 and
k=600 I use the R-command

The value obtained is 600. However, the true value
is 552. The latter can be obtained for example by
calling the corresponding distribution function 
with the R commands

The value 552 is obtained also by the usual normal
approximation.


Overall, it seems that R produces errors when calling
qhyper(p,m,n,k) for large values of the parametes 
m,n or k.

Sincerely
L. Baringhaus

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Torsten Hothorn

Fri, Mar 24, 2000 5:40 AM #

Well, 

had a short look. The problem arises in qhyper.c at line 56:

 term = exp(lfastchoose(NR, xr) + lfastchoose(NB, xb) - lfastchoose(N, n));

The expression inside the exp() function gives -1415.411856 and exp()
returns 0. So term and sum in 

 while(sum < p && xr < xend) {
        xr++;
        NB++;
        term *= (NR / xr) * (xb / NB);
        sum += term;
        xb--;
        NR--;
    }

are 0 all the time and xend = 600 is returned. 

R-Core: How to fix this?

Torsten

*****************************************************************
*				                                *
* Torsten Hothorn, Statistician 		                *
* at the moment: Institut fuer Statistik, TU Wien	        *
* Tel: 0043 1 58801 10772                  			*
* Fax: 0043 1 58801 10798                                       *
*                                                               *
*****************************************************************


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Peter Dalgaard

Fri, Mar 24, 2000 6:58 AM #

Torsten Hothorn <hothorn@ci.tuwien.ac.at> writes:

Ick! That is of course only to be expected with recurrence relations
like this applied to large numbers. Since the term formula (as far as
I remember!) applies to all the point probabilities, I suppose that
one could start the recurrence from a "known positive" value and work
both forwards and backwards from there. Since you still want to do the
sums from the low end (although that is perhaps not all that
important?) you might need an intermediate array to hold the values.
One idea could be to start from the normal approximation of the
desired quantile. Probably a bit more than an afternoons work, but
hardly several weeks...

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Thomas Lumley

Fri, Mar 24, 2000 8:10 AM #

On 24 Mar 2000, Peter Dalgaard BSA wrote:

As phyper seems to work we could just start at the Normal approximation to
qhyper and search for x such  that phyper(x)=p. Since it's a discrete
distribution this shouldn't take that long.

	-thomas





-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Peter Dalgaard

Fri, Mar 24, 2000 8:32 AM #

Thomas Lumley <thomas@biostat.washington.edu> writes:

Or apply the same fixup:

        sum += (small_N ? term : exp(term));
        ...
        if(small_N) term *= (NR / xr) * (xb / NB);
        else    term += log((NR / xr) * (xb / NB));

which is pretty obviously not optimized for speed, but certainly
avoids the multiply-by-zero effect.

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._