HPC with standard R functions
On 09/28/2013 01:19 PM, Simone Ruzza wrote:
Apologies for the total beginner's question, but I am very new to HPC. I am confronted with a large data analysis job that requires functions from contributed packages, which I did not write myself. I would like to speed up the analysis and am considering parallel computing or a cluster. As far as I understand, it is not always possible to parallelize R code for execution on a cluster; this depends on the computing task, i.e. whether it is iterative. My question is: is it possible to speed up the execution of a function (e.g. some model-fitting function) that itself calls low-level functions? I am not looking for the solutions I have already found on the web, which show, for example, how to use the snowfall package (e.g. sfLapply) to perform an iterative task. In my case it appears that I would have to rewrite a large amount of code myself, which seems equivalent to reinventing the wheel. Apologies for the generality of my question, which is due to my ignorance of the subject. Any help would be greatly appreciated!
I'm not sure you've told us enough to answer you.

If your task is repetitive (such as a Monte Carlo analysis), then the answer is most likely yes. If your data can be partitioned, and your model can be fit on the partitions, then the answer is most likely yes, you can parallelize it. If your model can be partitioned, so that some or all of the sub-functions from other packages that you mention can be called in parallel on your large data, then the answer is most likely yes.

In terms of technology, at this point you'd have to tell us about the cluster you want to run on. That would help us decide whether you should be looking at 'parallel', now part of base R; 'foreach', which has what I believe to be the very nice property that the same code can run on any parallel backend, or none, without changes; or something very specific like Rmpi, because the cluster you hope to use uses that as its parallel backend. (There are other possible endpoints too, but these seem to be the most popular.)

But from what I read above, you haven't given us enough detail about what you need to do for me, at least, to say anything definitive.

Regards, Brian
Brian G. Peterson http://braverock.com/brian/ Ph: 773-459-4973 IM: bgpbraverock
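
As a rough illustration of the first two cases in the reply (a repetitive task and partitioned data), here is a minimal sketch using only the 'parallel' package. Everything in it is an assumption made for the example: fit_once is a hypothetical stand-in for an expensive model-fitting call from a contributed package, the data are simulated, and two local worker processes stand in for whatever the real cluster provides. The point is only that the fitting function is called unchanged on the workers, without rewriting it.

```r
library(parallel)

# Hypothetical stand-in for an expensive model-fitting call from a
# contributed package: simulate one data set and return the fitted slope.
fit_once <- function(i, n = 200) {
  x <- rnorm(n)
  y <- 2 * x + rnorm(n)
  coef(lm(y ~ x))[["x"]]
}

n_workers <- 2L                       # assumption: adjust to your machine or allocation
cl <- makeCluster(n_workers)          # local socket cluster
clusterSetRNGStream(cl, iseed = 123)  # independent random-number streams per worker

# Repetitive task (Monte Carlo style): replicates are independent,
# so they are simply distributed over the workers.
slopes <- unlist(parLapply(cl, 1:100, fit_once))

# Partitioned data: split the rows into chunks and fit the same model per chunk.
dat <- data.frame(x = rnorm(1000))
dat$y <- 2 * dat$x + rnorm(1000)
chunks <- split(dat, rep(1:4, length.out = nrow(dat)))
chunk_slopes <- unlist(parLapply(cl, chunks, function(d) {
  coef(lm(y ~ x, data = d))[["x"]]
}))

stopCluster(cl)
```

Whether this is actually faster depends on how expensive each call is relative to the cost of shipping data to the workers, and whether per-chunk fits can be combined into a valid overall result depends on the model.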