Running Rmpi/OpenMPI issues
On 26 Mar, 2014, at 11:14 am, Ross Boylan <ross at biostat.ucsf.edu> wrote:
On Sat, 2014-03-22 at 09:51 +0800, Tsai Li Ming wrote:
Hi,
I have R 3.0.3 and OpenMPI 1.6.5.
Snow: 0.3-13
Rmpi: 0.6-3
Here's my test script:
library(snow)
nbNodes <- 4
cl <- makeCluster(nbNodes, "MPI")
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
mpi.quit()
And the mpirun command:
/opt/openmpi-1.6.5-intel/bin/mpirun -np 1 -H host1,host2,host3,host4 R --no-save < ~/test_mpi.R
Maybe this will help; my script to launch Rmpi is (originally all on one line):

R_PROFILE_USER=~/KHC/sunbelt/Rmpiprofile LD_LIBRARY_PATH=/home/ross/install/lib:$LD_LIBRARY_PATH PATH=/home/ross/install/bin:/home/ross/install/lib64/R/bin:$PATH orterun -x R_PROFILE_USER -x LD_LIBRARY_PATH -x PATH -hostfile ~/KHC/sunbelt/hosts --prefix /home/ross/install R --no-save -q

Observations:

1. If mpirun is not on the regular path, you must use --prefix to tell it where to look. Otherwise MPI won't find the program and won't be able to launch remotely.

2. For the remote sessions you may need to set PATH and LD_LIBRARY_PATH so everything gets found.

3. I left out -np; when I did use it, I always set it to the actual number of processes (my hosts file looks like "host1 slots=4"). I thought -np 1 would limit you to one process; evidently it doesn't.

4. Rmpi, and possibly snow, requires a special startup script that is distributed with the package. I used a modified version, set R_PROFILE_USER to point at it, and exported that variable with -x.

Ross Boylan
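To make observation 3 concrete, a hostfile of the shape Ross describes would look something like this (the hostnames are placeholders; "slots" caps how many processes mpirun will place on each machine):

```
# ~/KHC/sunbelt/hosts
host1 slots=4
host2 slots=4
```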
Thanks Ross. I managed to get it up and running by copying the Rprofile from the Rmpi package into ~/.Rprofile and by calling:

$ mpirun -np 4 -H host1,host2,host3,host4 R --no-save < ~/test_rmpi.R

Here's my R script:

library(Rmpi)
library(boot)
mpi.remote.exec(mpi.get.processor.name())
mpi.close.Rslaves()
mpi.quit()

But I didn't try it with snow.
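For reference, the working recipe above can be sketched end to end. The paths and hostnames come from this thread; the system.file() lookup assumes the profile ships at the top level of the installed Rmpi package, so verify the resulting path on your install:

```shell
# Install Rmpi's startup profile as the user profile; it arranges for
# rank 0 to become the master and the remaining ranks to become slaves.
cp "$(Rscript -e 'cat(system.file("Rprofile", package = "Rmpi"))')" ~/.Rprofile

# Launch one R process per host (-np should agree with the host list).
mpirun -np 4 -H host1,host2,host3,host4 R --no-save < ~/test_rmpi.R
```

With the profile in place every rank runs through Rmpi's initialization, which is presumably why -np 4 no longer hangs at library(Rmpi) as described further down.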
Here's the output:

> cl <- makeCluster(nbNodes, "MPI")
Loading required package: Rmpi
	4 slaves are spawned successfully. 0 failed.
> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
[[1]]
nodename  machine
 "host1" "x86_64"

[[2]]
nodename  machine
 "host1" "x86_64"

[[3]]
nodename  machine
 "host1" "x86_64"

[[4]]
nodename  machine
 "host1" "x86_64"

> mpi.quit()
I followed the instructions from http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf , specifically to use -np 1.

1. Why is it not running on the rest of the nodes? I can see all 4 processes on host1 and no orted daemon running on the other hosts. What is the correct way to run this? I have also tested a CPI example with plain OpenMPI, and it works.

2. mpi.quit() just hangs.

=================

I have also tried a plain Rmpi example:

library(Rmpi)
rk <- mpi.comm.rank(0)
sz <- mpi.comm.size(0)
name <- mpi.get.processor.name()
cat("Hello, rank", rk, "size", sz, "on", name, "\n")
mpi.quit()

$ /opt/openmpi-1.6.5-intel/bin/mpirun -np 4 -H host1,host2,host3,host4 R --no-save < ~/test_rmpi.r

It hangs here:
library(Rmpi) # calls MPI_Init
1. Running with -np 2, it hangs at library(Rmpi), just like -np 4.
2. Running with -np 1, I get a successful run.
3. Running with -np 8, I get an error:
library(Rmpi) # calls MPI_Init
--------------------------------------------------------------------------
mpirun has exited due to process rank 4 with PID 38992 on
node numaq1.1dn exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

Thanks!
_______________________________________________
R-sig-hpc mailing list
R-sig-hpc at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc