Retrieve hypergeometric results in large scale

9 messages · Jeff Newmiller, jas4710, Bert Gunter, and two others

#
I'm going to use 

dhyper(x, m, n, k)

to get 95% coverage. Let me use an example to explain my problem:

Suppose I have an urn containing 90 red and 10 black balls, and I want to
remove 3 balls from it. With the following code:

m <- 90; n <- 10; k <- 3
x <- 0:3
dhyper(x, m, n, k)

I can obtain the probabilities that 0, 1, 2, or 3 red balls will be removed:
 0.000742115 0.025046382 0.247680891 0.726530612

So >95% of the time, 2 or 3 red balls will be removed, and the resulting
composition will be 87:10 or 88:9; the percentage of red balls will then move
from 90% to somewhere between 89.69% and 90.72%.
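
For concreteness, that arithmetic can be checked directly in R (a quick
sketch along the lines of the code above):

p <- dhyper(0:3, 90, 10, 3)
sum(p[3:4])                    # P(2 or 3 reds removed) = 0.9742 > 0.95
100 * (90 - c(3, 2)) / 97      # 89.69 (= 87/97) and 90.72 (= 88/97)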

If instead the urn is 50:50 and I again remove 3 balls, the probabilities
are:
0.1212 0.3788 0.3788 0.1212

To cover >95% of outcomes this time, all four cases have to be considered,
and so the resulting percentage of red balls ranges from 48.45 to 51.55.
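
The same kind of check for the 50:50 case (again just a sketch):

p <- dhyper(0:3, 50, 50, 3)
cumsum(sort(p, decreasing = TRUE))  # 0.3788 0.7576 0.8788 1.0000
# the three largest outcomes cover only 87.88%, so all four are needed
100 * (50 - 3:0) / 97               # 48.45 49.48 50.52 51.55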

So my question is: is there any convenient built-in function that helps
extract this 95% confidence-interval-like range?

#
Perhaps you should read

?dhyper

and if you have a hard time parsing that, then read

?Distributions

and then go back to

?dhyper
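
Those pages document the standard d/p/q/r family; for the hypergeometric,
with the 90:10 example above, the four functions are:

dhyper(2, m = 90, n = 10, k = 3)     # P(X = 2)
phyper(2, m = 90, n = 10, k = 3)     # P(X <= 2)
qhyper(0.95, m = 90, n = 10, k = 3)  # smallest x with P(X <= x) >= 0.95
rhyper(5, m = 90, n = 10, k = 3)     # five random draws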
---------------------------------------------------------------------------
Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
jas4710 <watashi at post.com> wrote:

#
Homework? There's a no-homework policy on this list.

-- Bert
On Mon, Oct 1, 2012 at 8:10 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

#
Thanks Jeff.
The documentation pages, if I haven't missed any crucial points, illustrate
how to get probability and cumulative probability values.

I can first retrieve the data structures and use Perl (I don't know how to
use R...) to sort the derived ratios and sum the probability values until
the cumulative probability exceeds 95%. Well, I just don't know whether such
seemingly routine procedures have already been implemented...
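
That routine is only a few lines of R; one possible sketch, using the 90:10
example from the first message (variable names are mine):

p <- dhyper(0:3, 90, 10, 3)
o <- order(p, decreasing = TRUE)              # outcomes, most probable first
keep <- o[seq_len(which(cumsum(p[o]) >= 0.95)[1])]
keep - 1                                      # reds removed: 3 2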

Thanks again!

#
Hi Bert. This is not homework. If I could do some basic programming in R, as
I can in Perl, I'd have a better chance of accomplishing this task, but the
matrix concept is not quick to grasp...

#
Thanks Jeff~~~

In fact I do not know how to combine and extract vectors in R.

ans <- sort(dhyper(x, m, n, k), decreasing = TRUE)
rbind(ans, cumsum(ans))

will show the first point that exceeds the 95% threshold. The problem is:
*information is lost*. I can no longer identify where the first few elements
came from; e.g., for 10 numbers, maybe they came from positions 4,5,6,7, or
for 100 numbers, from 45 to 68.

So should I append IDs to the data for later retrieval? rbind appears to do
the job, but not exactly...
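
One simple way to keep that information is to attach names before sorting,
since sort() carries names along with the values. A sketch (the labels are
mine):

p <- dhyper(0:3, 90, 10, 3)
names(p) <- 0:3                      # label each probability with its outcome
ans <- sort(p, decreasing = TRUE)    # names travel with the values
rbind(ans, cumsum = cumsum(ans))     # columns stay labelled by outcome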

#
If you have not already done so, stop what you are doing and work through
the Introduction to R tutorial that ships with R (or another R tutorial on
the web that you may prefer).

The tutorials are written to help you climb the R learning curve much
more efficiently than the fooling around that you appear to be doing
now.

-- Bert
On Mon, Oct 1, 2012 at 8:31 AM, jas4710 <watashi at post.com> wrote:

#
order() is usually a lot more useful than sort(), since, as you noticed,
sort() drops information about where each element in its output came
from.

Your example was incomplete so I made up one which I
think is similar.
  > n <- 10 ; p <- 0.7 ; k <- 0:n ; d <- dbinom(k, n, p)
  > plot(k, d) # density of binomial over its domain
If you want the indices of the largest density values whose
cumulative sum is less than 0.95, you can do:
  > ord <- order(d, decreasing=TRUE) # indices such that d[ord] is in decreasing order
  > big <- ord[cumsum(d[ord]) < 0.95]
  > data.frame(big, d=d[big], cumsum=cumsum(d[big]))
    big         d    cumsum
  1   8 0.2668279 0.2668279
  2   9 0.2334744 0.5003024
  3   7 0.2001209 0.7004233
  4  10 0.1210608 0.8214841
  5   6 0.1029193 0.9244035
  > points(cex=2, k[big], d[big])

If you want to include the index of the density value that puts you just
over 0.95, first find the complement of the desired set (the smallest values
whose cumulative sum stays below 0.05) and use setdiff to take its
complement.  E.g.,
  > ord <- order(d)
  > little <- ord[cumsum(d[ord]) < 0.05]
  > big <- setdiff(seq_along(d), little) # difference of two sets of numbers
  > big
  [1]  5  6  7  8  9 10
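
Applied back to the original dhyper() question, the same recipe recovers the
"2 or 3 red balls" answer from the first message (a sketch, not from the
thread):

d <- dhyper(0:3, 90, 10, 3)
ord <- order(d)
little <- ord[cumsum(d[ord]) < 0.05]  # outcomes that can be dropped
big <- setdiff(seq_along(d), little)  # indices kept: 3 4
big - 1                               # reds removed: 2 3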

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com