HPC with standard R functions - R-SIG-HPC

Simone Ruzza · 2013-09-28T18:19:16Z

Dear list, apologies for the total beginner's question, but I am very new to HPC. I am confronted with a large data analysis job that requires functions from contributed packages that I did not write myself. I would like to speed up the analysis and I am considering parallel computing or a cluster. As far as I understand, it is not always possible to parallelize R code to be executed on a cluster. This depends on the computing task, i.e. whether it is

Brian G. Peterson

Sat, Sep 28, 2013 1:40 PM

On 09/28/2013 01:19 PM, Simone Ruzza wrote:

I'm not sure you've told us enough to answer you.

If your task is repetitive (such as Monte Carlo analysis), then the 
answer is most likely yes.
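
To make the repetitive case concrete, here is a minimal sketch using the 'parallel' package. The function one_replicate() is a hypothetical stand-in for whatever a single replicate of your analysis does, and the two-worker local cluster and the seed are assumptions made purely for illustration.

```r
library(parallel)

# Hypothetical stand-in for one Monte Carlo replicate: the mean of n
# standard-normal draws. Replace with the real per-replicate computation.
one_replicate <- function(i, n = 1000) {
  mean(rnorm(n))
}

# Start a small local cluster of two worker processes (an assumption;
# size this to the machine or cluster actually available).
cl <- makeCluster(2)

# Give each worker its own reproducible random-number stream.
clusterSetRNGStream(cl, iseed = 42)

# Run 200 independent replicates, spread across the workers.
estimates <- parSapply(cl, 1:200, one_replicate)

stopCluster(cl)

# Summarise the simulated sampling distribution.
summary(estimates)
```

Because the replicates do not depend on each other, nothing inside one_replicate() has to know it is running in parallel.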

If your data can be partitioned, and your model can be fit on the 
partitions, then the answer is most likely yes, you can parallelize it.
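
As a rough sketch of the partitioned-data case, assuming the built-in mtcars data as a stand-in for the real data set and 'cyl' as a hypothetical partitioning variable, the same model can be fit on each piece independently:

```r
library(parallel)

# Built-in example data; 'cyl' is used here as a hypothetical partitioning key.
pieces <- split(mtcars, mtcars$cyl)

# Fit the same linear model on one partition and return its coefficients.
fit_piece <- function(d) {
  coef(lm(mpg ~ wt, data = d))
}

# mclapply() forks worker processes; forking is not available on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
fits <- mclapply(pieces, fit_piece, mc.cores = n_cores)

# Combine the per-partition coefficients into one matrix (one row per group).
coef_table <- do.call(rbind, fits)
coef_table
```

Whether fitting on partitions is statistically appropriate depends on the model; the sketch only shows the mechanics.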

If your model can be partitioned, so that some or all of the 
sub-functions from other packages that you mention can be called in 
parallel on your large data, then the answer is most likely yes.
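
One possible shape for this third case, again only a sketch: the entries of 'tasks' below are hypothetical placeholders for the package functions you would actually call, and mtcars stands in for the large data.

```r
library(parallel)

x <- as.matrix(mtcars)

# Hypothetical independent sub-tasks, each applied to the same data.
tasks <- list(
  means = colMeans,
  sds   = function(m) apply(m, 2, sd),
  cors  = cor
)

cl <- makeCluster(2)

# Each task is sent to a worker together with a copy of the data, so
# for very large data the cost of copying can matter.
results <- parLapply(cl, tasks, function(f, m) f(m), m = x)

stopCluster(cl)

names(results)
```

If the real functions come from contributed packages, the workers would also need those packages installed and loaded, for example via clusterEvalQ().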

In terms of technology to use, at this point you'd have to tell us about 
the cluster you want to run it on, which would then help us decide 
whether you should be looking at 'parallel', now part of base R; 
'foreach', which has what I believe to be the very nice property of 
letting you write code that can use any or no parallel backend without 
changes; or something very specific like Rmpi, because the cluster you 
hope to use uses that as its parallel backend. (There are other possible 
endpoints too, but these seem to be the most popular.)
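
To illustrate the 'foreach' property mentioned above, here is a small sketch that assumes the 'foreach' package is installed. The loop body is a trivial placeholder. With the sequential backend registered, the loop runs in the current session; the commented lines show how a parallel backend might be swapped in, assuming the 'doParallel' package is also available.

```r
library(foreach)

# No parallel backend: register the sequential one explicitly so that
# %dopar% runs the loop body in the current R session.
registerDoSEQ()

squares <- foreach(i = 1:8, .combine = c) %dopar% {
  i^2
}
squares

# To run the same loop in parallel, only the registration would change,
# for example (assuming the 'doParallel' package is installed):
#   cl <- parallel::makeCluster(2)
#   doParallel::registerDoParallel(cl)
#   ... same foreach() call as above ...
#   parallel::stopCluster(cl)
```

The foreach() call itself stays the same in both cases, which is the point of the design.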

But from what I read above, you haven't given us enough detail about 
what you need to do for me at least to say anything definitive.

Regards,

Brian

Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock