snow/snowfall: advantages of MPI over the socket method?
Also, suppose I launch R instances of RMPISNOW via mpirun on a node (something along the lines of "mpirun -np 3 --host myhost RMPISNOW"). Using "ps augx", I see that there are 2 R instances started (not including the master process). In RMPISNOW, executing sfInit(cpus=2, parallel=TRUE, type="MPI") opens two more. The following is a snippet from ps augx:

vqnguyen 12481 97.2 0.4 315320 71108 ? R 16:24 0:49 /apps/R/2.10.0/lib64/R/bin/exec/R --slave --no-restore --file=/apps/R/2.10.0/lib64/R/library/snow/RMPInode.R --args SNOWLIB=/apps/R/2.10.0/lib64/R/library OUT=/dev/null
vqnguyen 12480 97.2 0.4 313088 68872 ? R 16:24 0:49 /apps/R/2.10.0/lib64/R/bin/exec/R --slave --no-restore --file=/apps/R/2.10.0/lib64/R/library/snow/RMPInode.R --args SNOWLIB=/apps/R/2.10.0/lib64/R/library OUT=/dev/null
vqnguyen 12467 74.6 0.1 258704 29844 ? S 16:22 2:33 /apps/R/2.10.0/lib64/R/bin/exec/R --no-save
vqnguyen 12470 74.6 0.1 258700 29844 ? S 16:22 2:33 /apps/R/2.10.0/lib64/R/bin/exec/R --no-save

The first two were spawned after sfInit() was run. Is this how things should look? I was expecting only two slaves in total. Can anyone confirm? Thanks.

Vinh
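For what it's worth, the session I'm describing looks roughly like this (a sketch only; "myhost" and the cpu count are placeholders, and the paths/behavior above are from my own setup):

```r
## From the shell, launch the MPI universe first, e.g.:
##   mpirun -np 3 --host myhost RMPISNOW
## This starts one master R process plus worker R processes.

## Then, inside the interactive R (master) session:
library(snowfall)

## Request 2 cpus from the MPI cluster.  On my system this is the
## call after which the extra RMPInode.R processes show up in ps.
sfInit(cpus = 2, parallel = TRUE, type = "MPI")

## Quick sanity check that the workers respond.
sfLapply(1:4, function(i) i^2)

sfStop()
```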
On Wed, Jan 6, 2010 at 3:22 PM, Vinh Nguyen <vqnguyen at uci.edu> wrote:
Dear R-HPC list,

I've been using snowfall via the socket method quite a bit over the last year for many of my simulation studies. The cluster I work on runs SGE, and I've been extracting information from the "qhost" command to find idle nodes on which to spawn R instances via sfInit(). Yes, I know this may turn some heads among system admins, but I've been diligent about not hogging the shared resources -- this workflow has served me well.

I've also been working with a system admin at my school to try to get OpenMPI running with snow/snowfall. At first we were going to go with the LAM/MPI implementation for use with the "sfCluster" unix program, but we decided to go with OpenMPI since it is actively developed (and can work with SGE). I think things are working. I am aware that you can run batch programs this way, but I prefer to run an interactive session (via mpirun with RMPISNOW).

Some things I noticed that differ from the socket method:

1. I declare the number of cpus and nodes before my R session is launched, via the mpirun command. This launches that many R instances on the different nodes (master + workers/slaves).
2. If there is an error in the R code (e.g., object not found), the R session terminates.
3. I use sfInit() to declare the number of cpus to use from the cluster created in step 1.

Are there any advantages of the MPI method over the socket method? The only advantages I see are being able to use sfCluster with LAM/MPI to select idle nodes in step 1, or to use mpirun/RMPISNOW with SGE to manage the shared resources. Aside from these, my experience is that the socket method is the easier and even better (see point 2) method. Please let me know otherwise. Thanks.

Vinh
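For comparison, the socket-method workflow I've been using amounts to something like the following (a sketch; the host names are placeholders standing in for whatever idle nodes qhost reports):

```r
library(snowfall)

## Socket method: workers are started remotely (e.g. via ssh) on the
## listed hosts; no MPI runtime is needed and no mpirun step precedes
## the R session.  Repeating a host name uses multiple cpus on it.
sfInit(parallel = TRUE, cpus = 4, type = "SOCK",
       socketHosts = c("node01", "node01", "node02", "node02"))

res <- sfLapply(1:100, function(i) mean(rnorm(1000)))

sfStop()
```

In my experience this is also where point 2 above bites: with sockets an error in worker code comes back to the master as an error, whereas under mpirun/RMPISNOW it has killed my whole session.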