Skip to content
Prev 306709 / 398506 Next

Retrieve hypergeometric results in large scale

order() is usually a lot more useful than sort(), since, as you noticed,
sort() drops information about where each element in its output came
from.

Your example was incomplete so I made up one which I
think is similar.
  > n <- 10 ; p <- 0.7 ; k <- 0:n ; d <- dbinom(k, n, p)
  > plot(k, d) # density of binomial over its domain
If you want the indices of the largest density values whose
cumulative sum is less than 0.95 you
  > ord <- order(d, decreasing=TRUE) # indices such that d[ord] is in decreasing order
  > big <- ord[cumsum(d[ord]) < 0.95]
  > data.frame(big, d=d[big], cumsum=cumsum(d[big]))
    big         d    cumsum
  1   8 0.2668279 0.2668279
  2   9 0.2334744 0.5003024
  3   7 0.2001209 0.7004233
  4  10 0.1210608 0.8214841
  5   6 0.1029193 0.9244035
 > points(cex=2, k[big], d[big])

If you want to include the index of the density value that puts
you just over 0.95 first find the complement of the desired indices
and use setdiff to compute its complement.  E.g.,
  > ord <- order(d)
  > little <- ord[cumsum(d[ord]) < 0.05]
  > big <- setdiff(seq_along(d), little) # difference of two sets of numbers
  > big
  [1]  5  6  7  8  9 10

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com