an opinion question
On Sat, Feb 4, 2012 at 5:25 PM, Hodgess, Erin <HodgessE at uhd.edu> wrote:
Hi everyone! Here is an opinion question please: when using R on a cluster, what is the best way to start please?
Hi, Erin,

The answer will depend on the kind of cluster you have, in particular the interconnection technology. Ours used MPI (the OpenMPI libraries) and the Rmpi package. On top of Rmpi come various facilitator packages: R's own new "parallel" (an adaptation of some parts of snow), the separate snow package, and convenience tools like snowFT or doParallel.

I've felt that the best thing to do when getting started is to work with Rmpi itself, because errors are more likely to be understandable. But the proponents of snowFT argue that errors are less likely if you follow their advice.

I'm accumulating lessons from the school of crashed programs here:

http://web.ku.edu/~quant/cgi-bin/mw1/index.php?title=Cluster:Main

That refers to a collection of "working examples" of these, and you can do me a favor if you check them over and give me feedback on what is clear or unclear. For me, the most difficult thing has been understanding where the work of the OS, the cluster framework, and R divide from each other. But I'm getting closer to having reasonable writeups for several.

The list of all the examples is just a Subversion source directory listing,

http://winstat.quant.ku.edu/svn/hpcexample/trunk/

I would like to insert a "roadmap" message at the top of that page, but I have to do some work with the web server before that is allowed.
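To give a rough idea of what the facilitator layer mentioned above looks like from inside R, here is a minimal sketch using the parallel package. It is not taken from the hpcexample directory. It assumes a socket (PSOCK) cluster of two workers on the local machine instead of an MPI cluster, so it can be tried without Rmpi, and the function name square is made up for illustration.

```r
library(parallel)

# Assumption: two worker processes on the local machine, connected by
# sockets (PSOCK) rather than MPI.
cl <- makeCluster(2, type = "PSOCK")

# Hypothetical task: each worker squares the value it receives.
square <- function(x) x^2

# Distribute the inputs 1..4 across the workers and collect a list of results.
res <- parLapply(cl, 1:4, square)

# Always shut the workers down when finished.
stopCluster(cl)

print(unlist(res))
```

On an MPI cluster the same pattern should apply with type = "MPI" in makeCluster, which, as far as I know, relies on the snow and Rmpi packages being installed.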
http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex50-R-serial/
    Runs one R job (a single program) on the cluster.

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex51-R-ManySerialJobs/
    Sends many separate R jobs out into the cluster.

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex53-HelloWorldRmpi/
    Basics of Rmpi usage.

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex60-HelloWorldSnow/
    Shows the same basics with the snow package.

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex61-HelloWorldSnowFT/
    If you wonder what snowFT does differently, see the README.

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex65-R-parallel/
    R 2.14 introduced the parallel package, and this tests that out.

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex66-ParallelSeedPrototype/
    Do you need separate seeds within each run of a simulation? This helps by creating a seed archive file that the repetitions can draw on.

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex80-PrevSci2007/
    This is inspired by a negative reaction I had to a published paper. I'll replicate that paper and see what's right and what's wrong. It has notes and advice for my class about redesigning an ordinary "run in one system" R simulation program into a "run across the cluster" program. It steps through successive versions.
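The seed archive idea can be sketched in a few lines. This is not the code in Ex66, only one way the idea might look, assuming the L'Ecuyer-CMRG generator and nextRNGStream() from the parallel package. The master seed, the file name seed-archive.rds, and the helper run_rep() are all made up for illustration.

```r
library(parallel)

# Assumption: L'Ecuyer-CMRG streams, which parallel can advance safely.
RNGkind("L'Ecuyer-CMRG")
set.seed(12345)  # arbitrary master seed, chosen only for this sketch

# Build one independent stream seed per repetition.
n_reps <- 5
seeds <- vector("list", n_reps)
seeds[[1]] <- .Random.seed
for (i in seq_len(n_reps)[-1]) {
  seeds[[i]] <- nextRNGStream(seeds[[i - 1]])
}

# Save the archive so every job on the cluster can read the same seeds.
seed_file <- file.path(tempdir(), "seed-archive.rds")  # hypothetical location
saveRDS(seeds, seed_file)

# Hypothetical repetition: restore stream i from the archive, then simulate.
run_rep <- function(i, file) {
  archive <- readRDS(file)
  assign(".Random.seed", archive[[i]], envir = globalenv())
  rnorm(3)
}

# Running the same repetition twice gives the same draws.
first  <- run_rep(2, seed_file)
second <- run_rep(2, seed_file)
print(identical(first, second))
```

Because each repetition looks up its own seed by index, a single repetition can be rerun later, on any node, without rerunning the ones before it.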
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas