Rmpi working with OpenMPI and PBSPro but snow fails
On Wed, 2009-03-04 at 07:49 -0600, luke at stat.uiowa.edu wrote:
On Wed, 4 Mar 2009, Huw Lynes wrote:
Hi Luke, Thanks for the quick response.
Moving onto snow in the same environment, trying to set up by using getMPICluster() returns an error in checkCluster() saying that there is something wrong with the cluster.
I don't know exactly what "Moving to snow" means, as you don't give details of how you are starting things up, so I have to guess. If you are using mpiexec then you need to run snow via the RMPISNOW shell script, which for NPROCS processes sets up a master and a cluster with NPROCS - 1 workers, and then use cl <- makeCluster() to access the already running cluster.
If I take the following trivial R script:
------------------------------------------------------------------------
library(Rmpi)
library(snow)
cl <- makeCluster()
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
stopCluster(cl)
------------------------------------------------------------------------
and run it as
------------------------------------------------------------------------
#!/bin/bash
#PBS -q SMP_queue
#PBS -l select=1:ncpus=4:mpiprocs=4
#PBS -l place=scatter:excl
module load apps/R
module load libs/R-mpi
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE
mpiexec RMPISNOW -f snowtest_solo.r
-----------------------------------------------------------------------
all the R processes just sit there spinning rather than doing anything
useful and I have to kill the job.
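As a sanity check on the process count, something like the following could be added to the job script before the mpiexec line. It is only a sketch: expected_workers is a hypothetical helper (not part of snow, Rmpi or PBS), and it assumes the node file contains one line per MPI process, so that the select line above would give 4 processes and, per Luke's description, 3 workers.

```shell
#!/bin/bash
# Hypothetical helper (not part of snow, Rmpi or PBS): given a node file,
# print the number of snow workers RMPISNOW would be expected to create.
expected_workers() {
    local nodefile="$1"
    local nprocs
    nprocs=$(wc -l < "$nodefile")   # assumption: one line per MPI process
    echo $((nprocs - 1))            # one process becomes the master
}

# Inside a PBS job, report what the scheduler actually handed out.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "MPI processes:         $(wc -l < "$PBS_NODEFILE")"
    echo "Expected snow workers: $(expected_workers "$PBS_NODEFILE")"
fi
```

If the reported process count is not what the select line asked for, the problem is upstream of snow; if it is correct, the master/worker split inside RMPISNOW is the next thing to look at.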
Following the suggestion in this mail:
https://stat.ethz.ch/pipermail/r-sig-hpc/2009-January/000069.html
results in the same problem of R spinning. I suspect that there is
something different about my OpenMPI setup that means snow is failing to
set up a master process. So you end up with all four processes as slaves
spinning on a network poll.
Huw Lynes | Advanced Research Computing
HEC Sysadmin | Cardiff University
| Redwood Building,
Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB