Hi, I have an R script that is time-consuming because it contains two nested loops of at least 5000 iterations each. I have tried the multicore package, but it doesn't seem to improve the elapsed time of the script (on a shorter test script, for example), and I can't use mclapply for technical reasons. How can I make my script use more cores and memory? I am running it on a server, and it is a shame that it uses only one core. Thanks! Moriah -- View this message in context: http://r.789695.n4.nabble.com/Using-multicores-in-R-tp4651808.html Sent from the R help mailing list archive at Nabble.com.
Using multicores in R
6 messages · Uwe Ligges, Steve Lianoglou, Spencer Graves +2 more
On 03.12.2012 11:14, moriah wrote:
Hi, I have an R script that is time-consuming because it contains two nested loops of at least 5000 iterations each. I have tried the multicore package, but it doesn't seem to improve the elapsed time of the script (on a shorter test script, for example), and I can't use mclapply for technical reasons.
Errr, but without it multicore does not have any effect ... See the package "parallel", which offers various functions for parallel computation. We cannot help much more unless you tell us the technical reasons why mclapply() does not work. Best, Uwe Ligges
I was wondering how can I make my script use more cores and memory because I am running it on a server and it is a shame that it uses only one core.
Thanks! Moriah
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
1. Have you looked at the CRAN Task View: High-Performance and Parallel Computing with R (http://cran.r-project.org/web/views/HighPerformanceComputing.html)?

2. Have you tried the "compiler" package? If I understand correctly, R is a two-stage interpreter, first translating what we know as R into byte code, which is then interpreted by a byte code interpreter. If my memory is correct, this approach can cut the compute time by a factor of 100.

3. Have you reviewed the section on "Profiling R code for speed" in the "Writing R Extensions" manual that becomes available after help.start()? The profiling tools discussed there help identify the portion of more complex code that takes the most time. The standard advice then is to experiment with writing the most time-consuming portion several different ways. I've seen many examples where writing what appears to be the same thing in R several different ways identifies one that is easily 10 and maybe 100 or 1000 times faster than the slowest alternative tried.

4. Have you tried using the "sos" package to search for other functions and packages in R that may already have good code doing some of the things you want to do? The "findFn" function in "sos" searches the "functions" subset of the "RSiteSearch" database and returns the result sorted by package. There are also "union" and "writeFindFn2xls" functions that make it easy to manipulate and evaluate the results; they are described in a vignette. It's the best literature search I know for anything statistical: if I don't find it there, it's OK to look someplace else. [Caveat: I'm the lead author of "sos", so I'm biased.]

Best Wishes, Spencer
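A minimal sketch of points 2 and 3 above, assuming a made-up loop function (sum_sq_loop is hypothetical and not part of the original script). It byte-compiles the function with compiler::cmpfun() and compares it with a vectorized rewrite using system.time(). Since R 3.4.0 the byte-code JIT is enabled by default, so on a current R the explicit cmpfun() call may change little; the gap between the loop and the vectorized form is usually the more instructive comparison. For larger scripts, Rprof() and summaryRprof() are the profiling tools that the manual section describes.

```r
library(compiler)  # ships with R

# Hypothetical stand-in for a slow computation: sum of squares via a loop.
sum_sq_loop <- function(x) {
  total <- 0
  for (v in x) total <- total + v * v
  total
}

# Byte-compiled version of the same function.
sum_sq_cmp <- cmpfun(sum_sq_loop)

# The same result written in vectorized form.
sum_sq_vec <- function(x) sum(x * x)

x <- as.numeric(1:200000)

# Time each variant; the numbers depend on the machine and the R version.
print(system.time(r_loop <- sum_sq_loop(x)))
print(system.time(r_cmp  <- sum_sq_cmp(x)))
print(system.time(r_vec  <- sum_sq_vec(x)))

# All three variants must agree before their timings are worth comparing.
stopifnot(isTRUE(all.equal(r_loop, r_cmp)), isTRUE(all.equal(r_loop, r_vec)))
```

Checking that the variants agree before comparing timings is the important habit here: a faster rewrite that returns a different answer is not an optimization.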
On 12/3/2012 6:24 AM, Steve Lianoglou wrote:
And also, in reply to Uwe Ligges's message of Monday, December 3, 2012 (quoted above):
If the work you are doing within each iteration of the loop is trivial, you will likely even see a decrease in performance if you try to parallelize it. Without more info from you regarding your problem, there's little we can do to help, tho. -Steve
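A small sketch of the point about trivial per-iteration work, assuming a made-up function (trivial is hypothetical). It runs the same cheap call through lapply() and through parallel::mclapply() and prints both timings. On many machines the forked version is no faster, or is slower, because the cost of forking and collecting results outweighs the work itself; the timings are not asserted here since they vary by system.

```r
library(parallel)  # ships with R

# mclapply() relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

trivial <- function(i) i + 1   # almost no work per call

inputs <- 1:20000
print(system.time(serial <- lapply(inputs, trivial)))
print(system.time(forked <- mclapply(inputs, trivial, mc.cores = n_cores)))

# Both approaches return the same values; only the elapsed time differs.
same_result <- isTRUE(all.equal(unlist(serial), unlist(forked)))
print(same_result)
```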
-- Spencer Graves, PE, PhD President and Chief Technology Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567 web: www.structuremonitoring.com
Moriah, Since you are doing nested loops, Rcpp may be an easy speed-up. Follow all the links here http://blog.revolutionanalytics.com/2012/11/hadleys-guide-to-high-performance-r-with-rcpp.html for details. HTH, Jim Porzak Minted.com San Francisco, CA www.linkedin.com/in/jimporzak use R! Group SF: www.meetup.com/R-Users/
Thanks for the help.
Perhaps I should elaborate a bit. I am working on a bioinformatics project in
which I am trying to run a forward selection algorithm for machine learning
classification of two biological conditions.
At each iteration I want to find the gene that, added to those I have already
selected, gives the best classification.
It looks something like this:
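library(e1071)  # naiveBayes(); assumed, since the post does not name the package
# idx (columns selected so far) and res (results) are assumed to be initialized earlier.
for (j in 1:5030) {
    tp <- 0
    for (i in 1:5030) {
        if (!(i %in% idx)) {
            # Fit on the candidate gene plus the genes selected so far.
            classifier <- naiveBayes(trn[, c(i, idx)], trn[, 20118])
            tbl <- table(predict(classifier, trn[, -20118]), trn[, 20118])
            # Training accuracy: diagonal of the 2x2 table over its total.
            success <- (tbl[[1]] + tbl[[4]]) / (tbl[[1]] + tbl[[4]] + tbl[[2]] + tbl[[3]])
            if (success > tp) {
                tp <- success
                ind <- i
                gene <- names(trn)[i]
            }
        }
    }
    idx <- c(idx, ind)
    res <- rbind(res, data.frame(Iteration = j, Success = tp * 100, Gene = gene))
}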
I am no expert when it comes to programming, so I am not sure how I can
optimize my relatively primitive code in the best way...
Thanks,
Moriah
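One way the loop above could be spread over several cores, offered as a sketch under stated assumptions and not as a drop-in replacement. The outer loop is inherently sequential, because each step depends on the genes chosen before it, but within one step every candidate gene is scored independently, so the inner loop can go through parallel::mclapply(). The names score_subset and forward_select, the nearest-centroid scoring rule, and the synthetic data are all hypothetical; score_subset stands in for the naiveBayes() and predict() step so that the example runs with base R only.

```r
library(parallel)  # ships with R

# Hypothetical stand-in for the naiveBayes() step: training accuracy of a
# nearest-centroid rule on the chosen columns. Swap in the real classifier here.
score_subset <- function(cols, x, y) {
  xs <- as.matrix(x[, cols, drop = FALSE])
  classes <- levels(y)
  # Squared distance of every row to each class centroid (one column per class).
  d <- sapply(classes, function(cl) {
    centre <- colMeans(xs[y == cl, , drop = FALSE])
    rowSums(sweep(xs, 2, centre)^2)
  })
  mean(classes[max.col(-d, ties.method = "first")] == y)
}

forward_select <- function(x, y, n_steps, n_cores = 1L) {
  selected <- integer(0)
  rows <- vector("list", n_steps)
  for (j in seq_len(n_steps)) {
    candidates <- setdiff(seq_along(x), selected)
    # Candidates are independent of each other, so score them in parallel.
    scores <- unlist(mclapply(
      candidates,
      function(i) score_subset(c(i, selected), x, y),
      mc.cores = n_cores
    ))
    best <- which.max(scores)  # first maximum, like the strict '>' test above
    selected <- c(selected, candidates[best])
    rows[[j]] <- data.frame(Iteration = j, Success = scores[best] * 100,
                            Gene = names(x)[candidates[best]])
  }
  do.call(rbind, rows)
}

# Synthetic data: six made-up "genes", one of which separates the two classes.
set.seed(1)
n <- 60
y <- factor(rep(c("A", "B"), each = n / 2))
x <- as.data.frame(matrix(rnorm(n * 6), nrow = n))
names(x) <- paste0("gene", 1:6)
x$gene3 <- x$gene3 + ifelse(y == "B", 6, 0)

# mclapply() needs forking, so use a single core on Windows.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- forward_select(x, y, n_steps = 3, n_cores = n_cores)
print(res)
```

Collecting the per-step rows in a list and binding them once at the end avoids growing res with rbind() inside the loop. Whether the parallel version pays off depends on how expensive each model fit is compared with the forking overhead, so it is worth timing both on a small n_steps first.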