Rmpi working with OpenMPI and PBSPro but snow fails
mpiexec -n 3 RMPISNOW -f snowtest_solo.r works for me with OpenMPI (openmpi-1.2.4-2.fc9.x86_64) and current snow. RMPISNOW does try to identify the master in order to adjust the arguments, but that shouldn't cause confusion about who is the master -- that is based on the rank. It may be that the profile file setting you mentioned is getting in the way, as RMPISNOW uses the R_PROFILE environment variable to get the top-level code into the processes.
luke
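One quick way to see whether a profile setting could be involved is to print the profile-related variables in the job script just before the mpiexec line. What follows is only a minimal sketch and not part of snow or Rmpi: the function name report_profile_vars is made up for illustration, and it looks only at R_PROFILE and R_PROFILE_USER, the environment variables R consults for the site and user profile files.

```shell
#!/bin/sh
# Hypothetical helper, not part of snow or Rmpi: print the value of each
# environment variable that R consults when looking for profile files.
report_profile_vars() {
    for var in R_PROFILE R_PROFILE_USER; do
        # Indirect lookup of the variable named in $var (POSIX-compatible).
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            echo "$var=$val"
        else
            echo "$var is unset or empty"
        fi
    done
}

report_profile_vars
```

This does not look for a .Rprofile file in the working or home directory, so those would still need to be checked by hand.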
On Wed, 4 Mar 2009, Huw Lynes wrote:
On Wed, 2009-03-04 at 07:49 -0600, luke at stat.uiowa.edu wrote:
On Wed, 4 Mar 2009, Huw Lynes wrote:
Hi Luke, Thanks for the quick response.
Moving on to snow in the same environment, trying to set up the cluster using getMPICluster() returns an error in checkCluster() saying that there is something wrong with the cluster.
I don't know what "Moving to snow" means exactly, as you don't give details of how you are starting things up, so I have to guess. If you are using mpiexec then you need to run snow via the RMPISNOW shell script, which for NPROCS sets up a master and a cluster with NPROCS - 1 workers, and then use cl <- makeCluster() to access the already running cluster.
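Since the master/worker split depends on rank, it can help to confirm that each process started by mpiexec can see a rank at all. The sketch below is a diagnostic under stated assumptions, not the RMPISNOW script itself. It assumes the launcher exports the rank in an environment variable: OMPI_COMM_WORLD_RANK for Open MPI 1.3 and later, or, as far as I recall, OMPI_MCA_ns_nds_vpid for earlier Open MPI releases. The function name describe_role is made up.

```shell
#!/bin/sh
# Hypothetical diagnostic, not the RMPISNOW script: report whether this
# process would be the master (rank 0) or a worker.
# Assumption: the MPI launcher exports the rank in one of these variables.
describe_role() {
    rank="${OMPI_COMM_WORLD_RANK:-${OMPI_MCA_ns_nds_vpid:-}}"
    if [ -z "$rank" ]; then
        echo "no rank variable found"
    elif [ "$rank" -eq 0 ]; then
        echo "rank 0: master"
    else
        echo "rank $rank: worker"
    fi
}

describe_role
```

Saved under some name such as showrole.sh (also made up) and run as mpiexec -n 3 sh showrole.sh inside the same PBS job, one would expect a single master line and two worker lines. If every process reports that no rank variable was found, the launcher's environment would be the first thing to look at.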
If I take the following trivial R script:
------------------------------------------------------------------------
library(Rmpi)
library(snow)
cl <- makeCluster()
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
stopCluster(cl)
------------------------------------------------------------------------
and run it as
------------------------------------------------------------------------
#!/bin/bash
#PBS -q SMP_queue
#PBS -l select=1:ncpus=4:mpiprocs=4
#PBS -l place=scatter:excl
module load apps/R
module load libs/R-mpi
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE
mpiexec RMPISNOW -f snowtest_solo.r
-----------------------------------------------------------------------
all the R processes just sit there spinning rather than doing anything
useful and I have to kill the job.
The suggestion in this mail:
https://stat.ethz.ch/pipermail/r-sig-hpc/2009-January/000069.html
results in the same problem of R spinning. I suspect that there is
something different about my OpenMPI setup that means snow is failing to
set up a master process. So you end up with all four processes as slaves
spinning on a network poll.
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email: luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu