Using multicores in R

6 messages · Uwe Ligges, Steve Lianoglou, Spencer Graves +2 more

#
Hi,

I have an R script which is time consuming because it has two nested loops
of at least 5000 iterations each. I have tried to use the multicore
package, but it doesn't seem to improve the elapsed time of the script (I
tested on a shorter script), and I can't use mclapply for technical
reasons.

I was wondering how I can make my script use more cores and memory, because I
am running it on a server and it is a shame that it uses only one core.

Thanks!
Moriah
 



--
View this message in context: http://r.789695.n4.nabble.com/Using-multicores-in-R-tp4651808.html
Sent from the R help mailing list archive at Nabble.com.
#
On 03.12.2012 11:14, moriah wrote:
Errr, but without mclapply() or a similar call, multicore does not have an effect ...

See package "parallel", which offers various functions for parallel 
computations. We cannot help much more if you do not tell us what the 
technical reasons are why mclapply() does not work.

Best,
Uwe Ligges
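
A minimal sketch of what the "parallel" package suggested above looks like in use. The function slow_square() is a hypothetical stand-in for one expensive, independent loop iteration, and the core count of 2 is an assumption; neither comes from the original script.

```r
library(parallel)  # ships with R since version 2.14.0

# Hypothetical stand-in for one expensive, independent iteration.
slow_square <- function(i) {
  Sys.sleep(0.01)
  i^2
}

# mclapply() forks the R process, which is not available on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial_result   <- lapply(1:20, slow_square)
parallel_result <- mclapply(1:20, slow_square, mc.cores = n_cores)

# Both calls return a list with one element per input value.
length(parallel_result)
```

The iterations only run in parallel because they do not depend on each other; a loop in which each step needs the result of the previous one cannot be split this way.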
#
       1.  Have you looked at the CRAN Task View: High-Performance and 
Parallel Computing with R 
(http://cran.r-project.org/web/views/HighPerformanceComputing.html)?


       2.  Have you tried the "compiler" package?  If I understand 
correctly, it translates R functions into byte code, which is then run 
by a byte code interpreter.  If my memory is correct, this can cut the 
compute time of loop-heavy code severalfold, though a factor of 100 
would be unusual.


       3.  Have you reviewed the section on "Profiling R code for speed" 
in the "Writing R Extensions" manual that becomes available after 
help.start()?  The profiling tools discussed there help identify the 
portion of more complex code that takes the most time.  The standard 
advice then is to experiment with writing the most time consuming 
portion several different ways.  I've seen many examples where writing 
what appears to be the same thing in R several different ways identifies 
one that is easily 10 and maybe 100 or 1000 times faster than the 
slowest alternative tried.


       4.  Have you tried using the "sos" package to search for other 
functions and packages in R that may already have good code doing some 
of the things you want to do?  The "findFn" function in "sos" searches 
the "functions" subset of the "RSiteSearch" database and returns the 
result sorted by package.  There are also a "union" and 
"writeFindFn2xls" functions to make it easy to manipulate and evaluate 
the results, described in a vignette. It's the best literature search I 
know for anything statistical: If I don't find it there, it's OK to look 
someplace else. [Caveat:  I'm the lead author of "sos", so I'm biased.]
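
Points 2 and 3 above can be tried together in a few lines. The sketch below is only an illustration: the sum-of-squares task and all function names are made up for the example. It byte-compiles a loop with cmpfun() from "compiler" and times three ways of writing the same computation with system.time(). In R versions that byte-compile functions automatically, the first two timings will be close.

```r
library(compiler)  # ships with R

# Three ways to compute the same sum of squares (hypothetical example task).
sum_sq_loop <- function(x) {
  total <- 0
  for (v in x) total <- total + v^2
  total
}
sum_sq_compiled <- cmpfun(sum_sq_loop)   # byte-compiled copy of the loop
sum_sq_vector   <- function(x) sum(x^2)  # vectorized rewrite

x <- as.numeric(1:200000)

# system.time() reports seconds; the figures depend on machine and R version.
timings <- rbind(
  loop     = system.time(r_loop     <- sum_sq_loop(x)),
  compiled = system.time(r_compiled <- sum_sq_compiled(x)),
  vector   = system.time(r_vector   <- sum_sq_vector(x))
)
print(timings[, "elapsed"])

# The variants must agree before their speed is worth comparing.
stopifnot(isTRUE(all.equal(r_loop, r_compiled)),
          isTRUE(all.equal(r_loop, r_vector)))
```

For a real script, Rprof() and summaryRprof() show which part is worth rewriting in the first place.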


       Best Wishes,
       Spencer
On 12/3/2012 6:24 AM, Steve Lianoglou wrote:
-- 
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com
#
Moriah,

Since you are doing nested loops, Rcpp may be an easy speed-up. Follow
all the links here
http://blog.revolutionanalytics.com/2012/11/hadleys-guide-to-high-performance-r-with-rcpp.html
for details.

HTH,
Jim Porzak
Minted.com
San Francisco, CA
www.linkedin.com/in/jimporzak
use R! Group SF: www.meetup.com/R-Users/
On Mon, Dec 3, 2012 at 2:14 AM, moriah <moriahcohen at gmail.com> wrote:
#
Thanks for the help,

Perhaps I should elaborate a bit. I am working on a bioinformatics project in
which I am trying to run a forward selection algorithm for machine learning
classification of two biological conditions.
At each iteration I want to find the gene that, in addition to those I have
already found, gives the best classification.

It looks something like this:

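library(e1071)       # assumed source of naiveBayes()

# Assumed starting values; the original post does not show them.
idx <- integer(0)    # columns of the genes selected so far
res <- data.frame()  # one row per iteration

for (j in 1:5030)
  {
  tp <- 0
  for (i in 1:5030)
  {
    if (!(i %in% idx))
    {
      # drop = FALSE keeps a single column as a data frame, so the gene
      # name survives and predict() can match it in the new data
      classifier <- naiveBayes(trn[, c(i, idx), drop = FALSE], trn[, 20118])
      tbl <- table(predict(classifier, trn[, -20118]), trn[, 20118])
      # share of correct predictions: diagonal of the confusion table
      success <- sum(diag(tbl)) / sum(tbl)

      if (success > tp)
      {
        tp <- success
        ind <- i
        gene <- names(trn)[i]
      }
    }

  }
  # keep the best gene of this round and record its success rate in percent
  idx <- c(idx, ind)
  res <- rbind(res, data.frame(Iteration=j, Success=tp*100, Gene=gene))
}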

I am no expert when it comes to programming, so I am not sure how I can
best optimize my relatively primitive code...

Thanks,
Moriah
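
One possible way to spread the inner loop of such a forward selection over several cores, as a sketch under stated assumptions rather than a drop-in replacement. The toy data, the score_subset() function (a nearest-centroid rule standing in for the naiveBayes() fit) and the limit of three rounds are all hypothetical; only the overall loop structure follows the script above.

```r
library(parallel)

set.seed(1)
# Toy data (assumption): 60 samples, 8 "genes", two classes.
n <- 60
y <- factor(rep(c("A", "B"), each = n / 2))
x <- as.data.frame(matrix(rnorm(n * 8), nrow = n))
names(x) <- paste0("gene", 1:8)
x$gene3 <- x$gene3 + ifelse(y == "B", 3, 0)  # strongly informative gene
x$gene5 <- x$gene5 + ifelse(y == "B", 1, 0)  # weakly informative gene

# Hypothetical scoring function: training accuracy of a nearest-centroid
# rule on the chosen columns. The real script would fit naiveBayes() here.
score_subset <- function(cols) {
  xs <- as.matrix(x[, cols, drop = FALSE])
  centre_a <- colMeans(xs[y == "A", , drop = FALSE])
  centre_b <- colMeans(xs[y == "B", , drop = FALSE])
  dist_a <- rowSums(sweep(xs, 2, centre_a)^2)
  dist_b <- rowSums(sweep(xs, 2, centre_b)^2)
  mean(ifelse(dist_a < dist_b, "A", "B") == y)
}

n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
idx <- integer(0)
res <- vector("list", 3)

for (j in 1:3) {  # three rounds only, to keep the example short
  candidates <- setdiff(seq_along(x), idx)
  # Within one round the candidates are independent of each other,
  # so each can be scored in a separate process.
  scores <- unlist(mclapply(candidates,
                            function(i) score_subset(c(i, idx)),
                            mc.cores = n_cores))
  best <- candidates[which.max(scores)]
  idx <- c(idx, best)
  res[[j]] <- data.frame(Iteration = j, Success = max(scores) * 100,
                         Gene = names(x)[best])
}
res <- do.call(rbind, res)
print(res)
```

The outer loop has to stay sequential because each round depends on the genes chosen before it. Collecting the rows in a list and binding them once at the end also avoids growing a data frame inside the loop.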




