I'm going to use dhyper(x, m, n, k) to get 95% coverage. Let me use an example to explain my problem.

Suppose I have an urn containing 90 red and 10 black balls, and I want to remove 3 balls from it. With the following code:

m <- 90; n <- 10; k <- 3; x <- 0:3
dhyper(x, m, n, k)

I obtain the probabilities that 0, 1, 2, or 3 red balls will be removed:

0.000742115 0.025046382 0.247680891 0.726530612

So more than 95% of the time, 2 or 3 red balls will be removed, and the resulting composition will be 88:9 or 87:10; the percentage of red balls will change from the original 90 to between 89.69 and 90.72.

If I instead start with 50:50 and again remove 3 balls, the probabilities are:

0.1212 0.3788 0.3788 0.1212

To get the resulting range of red balls that covers more than 95% of the time, all four cases have to be considered this time, so the percentage of red balls will range from 48.45 to 51.55.

So my question is: is there any convenient built-in function that helps extract this 95% confidence-interval-like range?

-- View this message in context: http://r.789695.n4.nabble.com/Retrieve-hypergeometric-results-in-large-scale-tp4644683.html Sent from the R help mailing list archive at Nabble.com.
Retrieve hypergeometric results in large scale
9 messages · Jeff Newmiller, jas4710, Bert Gunter +2 more
Perhaps you should read
?dhyper
and if you have a hard time parsing that, then read
?Distributions
and then go back to
?dhyper
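As a hedged illustration of what those help pages cover: ?dhyper also documents phyper() (cumulative probabilities) and qhyper() (quantiles). The sketch below assumes an equal-tailed interval is acceptable, which is not necessarily the same as the set of most probable outcomes described in the question. The names bounds and coverage are illustrative only.

```r
# Urn from the original question: m red balls, n black balls, k balls drawn
m <- 90; n <- 10; k <- 3

# Equal-tailed bounds on the number of red balls drawn
# (2.5% and 97.5% quantiles of the hypergeometric distribution)
bounds <- qhyper(c(0.025, 0.975), m, n, k)

# Probability actually covered by the closed interval [bounds[1], bounds[2]]
coverage <- phyper(bounds[2], m, n, k) - phyper(bounds[1] - 1, m, n, k)

print(bounds)
print(coverage)
```

Because the distribution is discrete, the covered probability will generally be somewhat above the nominal 95%.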
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Homework? There's a no homework policy on this list. -- Bert
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Thanks Jeff.

The documentation pages, if I haven't missed any crucial points, illustrate how to get probability and cumulative probability values. I can first retrieve the data structures and then use Perl (I don't know how to use R...) to sort the derived ratios and sum the probability values until the cumulative probability exceeds 95%.

I just don't know whether such a seemingly routine procedure has already been implemented...

Thanks again!
Hi Bert.

This is not homework. If I could do basic programming in R the way I can in Perl, I would have a better chance of accomplishing this task, but the matrix concept is not quick to grasp...
Thanks Jeff~~~

In fact I do not know how to combine and extract vectors in R.

ans <- sort(dhyper(x, m, n, k), decreasing=TRUE)
rbind(ans, cumsum(ans))

will show the first point that exceeds the 95% threshold. The problem is that *information is lost*: I can no longer identify where the first few elements came from. E.g., for 10 numbers they may be from 4, 5, 6, 7, or for 100 numbers, from 45 to 68.

So should I append IDs to the data for later retrieval? rbind appears to do the job, but not exactly...
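One way to keep the labels, sketched here on the assumption that naming the vector is acceptable: sort() preserves the names attribute, so each probability stays tagged with the count it came from. The names p, sorted, and tab are illustrative only.

```r
m <- 90; n <- 10; k <- 3
x <- 0:k

# Label each probability with the number of red balls it refers to
p <- setNames(dhyper(x, m, n, k), x)

# sort() keeps the names, so the origin of each value is not lost
sorted <- sort(p, decreasing = TRUE)
tab <- rbind(prob = sorted, cumulative = cumsum(sorted))
print(tab)
```

The column names of the resulting matrix are the original counts, so the first column whose cumulative value passes the threshold can be read off directly.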
If you have not already done so, stop what you are doing and work through the Introduction to R tutorial that ships with R (or another R tutorial on the web that you may prefer). The tutorials are written to help you climb the R learning curve much more efficiently than the fooling around that you appear to be doing now. -- Bert
order() is usually a lot more useful than sort(), since, as you noticed,
sort() drops information about where each element in its output came
from.
Your example was incomplete so I made up one which I
think is similar.
> n <- 10 ; p <- 0.7 ; k <- 0:n ; d <- dbinom(k, n, p)
> plot(k, d) # density of binomial over its domain
If you want the indices of the largest density values whose
cumulative sum is less than 0.95 you can use order():
> ord <- order(d, decreasing=TRUE) # indices such that d[ord] is in decreasing order
> big <- ord[cumsum(d[ord]) < 0.95]
> data.frame(big, d=d[big], cumsum=cumsum(d[big]))
  big         d    cumsum
1   8 0.2668279 0.2668279
2   9 0.2334744 0.5003024
3   7 0.2001209 0.7004233
4  10 0.1210608 0.8214841
5   6 0.1029193 0.9244035
> points(cex=2, k[big], d[big])
If you want to include the index of the density value that puts
you just over 0.95, first find the complement of the desired indices
(the smallest density values whose cumulative sum is less than 0.05)
and then use setdiff to compute its complement. E.g.,
> ord <- order(d)
> little <- ord[cumsum(d[ord]) < 0.05]
> big <- setdiff(seq_along(d), little) # difference of two sets of numbers
> big
[1] 5 6 7 8 9 10
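The same order() idea can be applied to the hypergeometric example from the original question. This is a rough sketch rather than a built-in function: it keeps the most probable outcomes until their cumulative probability first reaches 95%, then converts them to the percentage of red balls left in the urn. The names n_keep, keep, and pct_red_left are made up for illustration.

```r
m <- 90; n <- 10; k <- 3     # red balls, black balls, balls removed
x <- 0:k                     # possible numbers of red balls removed
d <- dhyper(x, m, n, k)

ord <- order(d, decreasing = TRUE)          # most probable outcomes first
n_keep <- which(cumsum(d[ord]) >= 0.95)[1]  # first position reaching 95%
keep <- sort(x[ord[seq_len(n_keep)]])       # outcomes in the coverage set

# Percentage of red balls remaining after removing 'keep' red balls
pct_red_left <- 100 * (m - keep) / (m + n - k)
print(data.frame(red_removed = keep, prob = d[keep + 1], pct_red_left))
```

Changing m and n to 50 reproduces the second case in the question, where all four outcomes are needed to reach 95%.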
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com