Running Rmpi/OpenMPI issues

4 messages · Ross Boylan, Tsai Li Ming

#
Hi,

I have R 3.0.3 and OpenMPI 1.6.5.
Snow: 0.3-13
Rmpi: 0.6-3

Here's my test script:
library(snow)

nbNodes <- 4
cl <- makeCluster(nbNodes, "MPI")
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
mpi.quit()

And the mpirun command:
/opt/openmpi-1.6.5-intel/bin/mpirun -np 1 -H host1,host2,host3,host4 R --no-save < ~/test_mpi.R

Here's the output:
Loading required package: Rmpi
	4 slaves are spawned successfully. 0 failed.
[[1]]
   nodename      machine 
"host1"     "x86_64" 

[[2]]
   nodename      machine 
"host1"     "x86_64" 

[[3]]
   nodename      machine 
"host1"     "x86_64" 

[[4]]
   nodename      machine 
"host1"     "x86_64"

I followed the instructions from
http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf
specifically the advice to use -np 1.

1. Why is it not running on the other nodes? I can see all 4 processes on host1, and no orted daemon is running.

What is the correct way to run this?

I have also tested the CPI example program with OpenMPI, and it works.

2. mpi.quit() just hangs.

=================

I have also tried an Rmpi example:
library(Rmpi) 
rk <- mpi.comm.rank(0)
sz <- mpi.comm.size(0)
name <- mpi.get.processor.name()
cat("Hello, rank", rk, "size", sz, "on", name, "\n")
mpi.quit()

$ /opt/openmpi-1.6.5-intel/bin/mpirun -np 4 -H host1,host2,host3,host4 R --no-save < ~/test_rmpi.r 

Results with different -np values:
1. With -np 2, it hangs at library(Rmpi), the same as with -np 4.

2. With -np 1, the run succeeds.

3. With -np 8, I get an error:
--------------------------------------------------------------------------
mpirun has exited due to process rank 4 with PID 38992 on
node numaq1.1dn exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------


Thanks!
4 days later
#
On Sat, 2014-03-22 at 09:51 +0800, Tsai Li Ming wrote:
Maybe this will help; my script to launch Rmpi is (originally all 1
line):
R_PROFILE_USER=~/KHC/sunbelt/Rmpiprofile
LD_LIBRARY_PATH=/home/ross/install/lib:$LD_LIBRARY_PATH
PATH=/home/ross/install/bin:/home/ross/install/lib64/R/bin:$PATH orterun
-x R_PROFILE_USER -x LD_LIBRARY_PATH -x PATH -hostfile
~/KHC/sunbelt/hosts --prefix /home/ross/install R --no-save -q

Observations:
1. If mpirun is not on the regular path, one must use --prefix to tell
it where to look.  Otherwise MPI won't find the program and won't be
able to launch remotely.

2. For running within those remote sessions you may need to set PATH and
LD_LIBRARY_PATH so stuff gets found.

3. I left out -np; when I used it I always set it to the actual number
of processes (my hosts file looks like host1 slots=4).  I thought -np 1
would limit you to one process; evidently it doesn't.

4. Rmpi, and possibly snow, requires a special startup script that is
distributed with the package.  I used a modified version and set
R_PROFILE_USER and exported that variable with -x.

Ross Boylan
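
For readers following along, the observations above might be combined roughly as in the sketch below. It only writes a hostfile and prints the launch command for review; it does not start anything. The host names and install prefix are reused from the first message, while the hostfile name hosts.example, the lib subdirectory, and slots=1 per host are assumptions for illustration. mpirun is used here in place of orterun; in Open MPI the two names refer to the same launcher.

```shell
#!/usr/bin/env bash
# Sketch only: paths, host names and slot counts are placeholders.
set -eu

OMPI_PREFIX="/opt/openmpi-1.6.5-intel"   # install prefix quoted in the first message
HOSTFILE="./hosts.example"               # hypothetical hostfile name

# One line per host; slots is the number of processes that host may run.
printf '%s slots=1\n' host1 host2 host3 host4 > "$HOSTFILE"

# Make the MPI binaries and libraries findable locally
# (the lib subdirectory is an assumption about the install layout).
export PATH="$OMPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib:${LD_LIBRARY_PATH:-}"

# --prefix tells the remote daemons where Open MPI lives;
# each -x forwards one environment variable to the remote processes.
CMD=(mpirun --prefix "$OMPI_PREFIX" -hostfile "$HOSTFILE"
     -x PATH -x LD_LIBRARY_PATH -x R_PROFILE_USER
     R --no-save -q)

# Print the assembled command instead of running it.
printf '%s ' "${CMD[@]}"
echo
```

To actually launch, one would set R_PROFILE_USER to the Rmpi startup profile, then run "${CMD[@]}" with the R script on standard input.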
#
On 26 Mar, 2014, at 11:14 am, Ross Boylan <ross at biostat.ucsf.edu> wrote:

            
Thanks Ross,

I managed to get it up and running by copying the Rprofile from the Rmpi package to ~/.Rprofile and calling:
$ mpirun -np 4 -H host1,host2,host3,host4 R --no-save < ~/test_rmpi.R

Here's my R script:
library(Rmpi)
library(boot)

mpi.remote.exec(mpi.get.processor.name())
mpi.close.Rslaves()
mpi.quit()

But I didn't try with Snow.
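
The Rprofile copy step described above could be scripted along these lines. This is a sketch, not the exact procedure used in this thread: it assumes the installed Rmpi package ships its startup file as "Rprofile" in the package directory, so that system.file() can locate it, and the helper install_profile is a hypothetical name introduced here. An existing ~/.Rprofile is backed up first, not silently overwritten.

```shell
#!/usr/bin/env bash
set -eu

# install_profile SRC DEST (hypothetical helper):
# copy SRC to DEST, keeping a .bak copy of any existing DEST.
install_profile() {
    local src="$1" dest="$2"
    if [ ! -f "$src" ]; then
        echo "profile not found: ${src:-<empty>}" >&2
        return 1
    fi
    if [ -f "$dest" ]; then
        cp "$dest" "$dest.bak"
    fi
    cp "$src" "$dest"
}

# Ask R where the installed Rmpi package keeps its Rprofile.
# The result is empty if R or Rmpi is not available.
SRC="$(Rscript -e 'cat(system.file("Rprofile", package = "Rmpi"))' 2>/dev/null || true)"

if [ -n "$SRC" ]; then
    install_profile "$SRC" "$HOME/.Rprofile" && echo "copied $SRC to ~/.Rprofile"
else
    echo "Rmpi Rprofile not located; nothing copied"
fi
```

Setting R_PROFILE_USER to the located file and exporting it with -x, as in the earlier reply, would avoid touching ~/.Rprofile at all.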
#
On Wed, 2014-03-26 at 11:32 +0800, Tsai Li Ming wrote:
You don't need to load Rmpi; the startup will already have done so.

Ross