
Rmpi_0.5-4 and OpenMPI questions

6 messages · Dirk Eddelbuettel, Hao Yu, Luke Tierney

#
Many thanks to Dr Yu for updating Rmpi for R 2.6.0, and for starting to make
the changes to support Open MPI.

I have just built the updated Debian package of Rmpi (i.e. r-cran-rmpi) under
R 2.6.0 but I cannot convince myself yet whether it works or not.  Simple
tests work.  E.g. on my Debian testing box, with Rmpi installed directly
using Open MPI 1.2.3-2 (from Debian) and using 'r' from littler:

edd@ron:~> orterun -np 3 r -e 'library(Rmpi); print(mpi.comm.rank(0))'
[1] 0
[1] 1
[1] 2
edd@ron:~>

but I basically cannot get anything more complicated to work yet.  R / Rmpi
just seem to hang; in particular snow and getMPIcluster() just sit there:

> cl <- makeSOCKcluster(c("localhost", "localhost"))
> stopCluster(cl)
> library(Rmpi)
> cl <- makeMPIcluster(n=3)
Error in makeMPIcluster(n = 3) : no nodes available.
I may be overlooking something simple here; in particular, the launching of
apps appears to be different for Open MPI than it was with LAM/MPI (or maybe
I am just confused because I am also looking at LLNL's slurm for use with Open MPI?)

Has anybody gotten Open MPI and Rmpi to work on simple demos?  Similarly, is
anybody using snow with Rmpi and Open MPI yet?

Also, the Open MPI FAQ is pretty clear on its preference for using mpicc
for compiling/linking, to keep control of the compiler and linker options and
switches.  Note that e.g. on my Debian system

edd@ron:~> mpicc --showme:link
-pthread -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl

whereas Rmpi is built with just the defaults from R CMD INSTALL:

gcc-4.2 -std=gnu99 -shared  -o Rmpi.so RegQuery.o Rmpi.o conversion.o internal.o -L/usr/lib -lmpi -lpthread -fPIC   -L/usr/lib/R/lib -lR

Don't we need libopen-rte and libopen-pal as the Open MPI FAQ suggests?

Many thanks, Dirk
#
Hi Dirk,

Thanks for pointing out the additional flags needed to compile Rmpi
correctly. Those flags can be added in configure.ac once the openmpi dir is
detected. BTW, the -DMPI2 flag was missing in your Rmpi build since the
detection of openmpi was not working. It should be
####
        if test -d  ${MPI_ROOT}/lib/openmpi; then
                echo "Found openmpi dir in ${MPI_ROOT}/lib"
                MPI_DEPS="-DMPI2"
        fi
####
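
For example, the same branch could also pick up the link flags Dirk quoted
by asking mpicc directly; a sketch (MPI_LIBS here stands for whatever
variable configure.ac substitutes into PKG_LIBS):
####
        if test -d  ${MPI_ROOT}/lib/openmpi; then
                echo "Found openmpi dir in ${MPI_ROOT}/lib"
                MPI_DEPS="-DMPI2"
                # assumed variable: pass on the libs Open MPI itself recommends
                MPI_LIBS=`mpicc --showme:link`
        fi
####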

I tried to run Rmpi under snow and got the same error message. But after
checking makeMPIcluster, I found that n=3 was a wrong argument. When
makeMPIcluster finds that count is missing,
count=mpi.comm.size(0)-1 is used. If you start R alone, this returns
count=0 since there is only one member (the master). I do not know why snow
did not use count=mpi.universe.size()-1 to find the total number of nodes
available. Anyway, after using
cl=makeMPIcluster(count=3),
I was able to run the parApply function.

I tried
R -> library(Rmpi) -> library(snow) -> c1=makeMPIcluster(3)

Also
mpirun -hostfile hostfile -np 1 R --no-save
library(Rmpi) -> library(snow) -> c1=makeMPIcluster(3)
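
Written out, that sequence is roughly the following (a sketch; the worker
count and the test matrix are just illustrative):

     library(Rmpi)
     library(snow)
     cl <- makeMPIcluster(count = 3)              # spawn 3 workers explicitly
     parApply(cl, matrix(1:6, nrow = 2), 1, sum)  # row sums, done on the workers
     stopCluster(cl)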

Hao

PS: hostfile contains the info for all nodes, so in R mpi.universe.size()
returns the right number and spawning will reach the remote nodes.
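
For instance, a minimal hostfile might look like this (hostnames and slot
counts are of course illustrative):

     node01 slots=2
     node02 slots=2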

Rmpi under Debian 3.1 and openmpi 1.2.4 seems OK. I did find some missing
libs under Debian 4.0.
#
On 4 October 2007 at 01:11, Hao Yu wrote:
| Hi Dirk,
| 
| Thanks for pointing out the additional flags needed to compile Rmpi
| correctly. Those flags can be added in configure.ac once the openmpi dir is
| detected. BTW, the -DMPI2 flag was missing in your Rmpi build since the
| detection of openmpi was not working. It should be
| ####
|         if test -d  ${MPI_ROOT}/lib/openmpi; then
|                 echo "Found openmpi dir in ${MPI_ROOT}/lib"
|                 MPI_DEPS="-DMPI2"
|         fi
| ####

I don't follow. From my build log:

* Installing *source* package 'Rmpi' ...
[...]
checking for gcc option to accept ISO C89... none needed
I am here /usr
Try to find mpi.h ...
Found in /usr/include
Try to find libmpi or libmpich ...
Found libmpi in /usr/lib
Found openmpi dir in /usr/lib       <---------- found openmpi
[...]
** libs
make[1]: Entering directory `/tmp/buildd/rmpi-0.5-4/src'
gcc-4.2 -std=gnu99 -I/usr/share/R/include -I/usr/share/R/include -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -I/usr/include -DMPI2 -fPIC     -fpic  -g -O2 -c RegQuery.c -o RegQuery.o
[...]

so -DMPI2 is used.

Because I build this in a chroot / pbuilder environment, neither LAM nor
MPICH2 is installed, and Open MPI is detected.

| I tried to run Rmpi under snow and got the same error message. But after
| checking makeMPIcluster, I found that n=3 was a wrong argument. When
| makeMPIcluster finds that count is missing,

Yes, my bad. But it also hangs with argument count=3 (which I had tried, but
my mail was wrong.)

| count=mpi.comm.size(0)-1 is used. If you start R alone, this returns
| count=0 since there is only one member (the master). I do not know why snow
| did not use count=mpi.universe.size()-1 to find the total number of nodes available.

How would it know the total number of nodes?  See below re hostfile.

| Anyway after using
| cl=makeMPIcluster(count=3),
| I was able to run the parApply function.
| 
| I tried
| R -> library(Rmpi) -> library(snow) -> c1=makeMPIcluster(3)
| 
| Also
| mpirun -hostfile hostfile -np 1 R --no-save
| library(Rmpi) -> library(snow) -> c1=makeMPIcluster(3)
| 
| Hao
| 
| PS: hostfile contains the info for all nodes, so in R mpi.universe.size()
| returns the right number and spawning will reach the remote nodes.

So we depend on a correct hostfile?  As I understand Open MPI, this is
deprecated:

# This is the default hostfile for Open MPI.  Notice that it does not
# contain any hosts (not even localhost).  This file should only
# contain hosts if a system administrator wants users to always have
# the same set of default hosts, and is not using a batch scheduler
# (such as SLURM, PBS, etc.).

I am _very_ interested in running Open MPI and Rmpi under slurm (which we
added to Debian as source package slurm-llnl), so it would be nice if this
could be rewritten to not require a hostfile, as this seems to be the
direction upstream is going.

| Rmpi under Debian 3.1 and openmpi 1.2.4 seems OK. I did find some missing
| libs under Debian 4.0.

Can you be more specific?  I'd be glad to help.

Thanks!

Dirk


#
On Thu, 4 Oct 2007, Dirk Eddelbuettel wrote:

Any chance the snow workers are picking up another version of Rmpi, e.g.
a LAM one?  That might happen if you have R_SNOW_LIB set and an Rmpi
installed there.  Otherwise starting with outfile=something may help.
Let me know what you find out -- I'd like to make the snow
configuration process more bullet-proof.
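
(For example, something along the lines of

     cl <- makeMPIcluster(3, outfile = "/tmp/snow-worker.log")

to get the workers' output somewhere visible -- the file name is just an
illustration.)
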
To work better with batch scheduling environments where spawning might
be technically or politically problematic I have been trying to improve
the RMPISNOW script that can be used with LAM as

     mpirun -np 3 RMPISNOW

and then either

     cl <- makeCluster()  # no argument

or

     cl <- makeCluster(2) # mpi rank - 1 (or less I believe)

(the default type for makeCluster becomes MPI in this case).  This
seems to work reasonably well in LAM and I think I can get it to work
similarly in OpenMPI -- will try in the next day or so.  Both LAM and
OpenMPI provide environment variables so shell scripts can determine
the MPI rank, which is useful for getting --slave and output redirection
to the workers.  I haven't figured out anything analogous for
MPICH/MPICH2 yet.
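
For what it's worth, the check in such a script can look roughly like this
(LAMRANK is what LAM sets; for Open MPI 1.2 the variable is, I believe,
OMPI_MCA_ns_nds_vpid -- treat the exact names as assumptions):

     #!/bin/sh
     # sketch: choose master vs. worker behaviour from the launcher's rank
     if [ -n "${LAMRANK}" ]; then                    # LAM/MPI
         rank=${LAMRANK}
     elif [ -n "${OMPI_MCA_ns_nds_vpid}" ]; then     # Open MPI 1.2.x (assumed)
         rank=${OMPI_MCA_ns_nds_vpid}
     else
         rank=0
     fi
     if [ "${rank}" = "0" ]; then
         exec R --no-save "$@"                            # rank 0: the master
     else
         exec R --slave --no-save "$@" > /dev/null 2>&1   # workers, silenced
     fi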

Best,

luke

#
On 4 October 2007 at 06:37, Luke Tierney wrote:
| > Yes, my bad. But it also hangs with argument count=3 (which I had tried, but
| > my mail was wrong.)
| 
| Any chance the snow workers are picking up another version of Rmpi, e.g.
| a LAM one?  That might happen if you have R_SNOW_LIB set and an Rmpi
| installed there.  Otherwise starting with outfile=something may help.
| Let me know what you find out -- I'd like to make the snow
| configuration process more bullet-proof.

I generally don't have any environment variables set, so I am not sure. I'll
try to see what I can find.

| To work better with batch scheduling environments where spawning might
| be technically or politically problematic I have been trying to improve
| the RMPISNOW script that can be used with LAM as
| 
|      mpirun -np 3 RMPISNOW
| 
| and then either
| 
|      cl <- makeCluster()  # no argument
| 
| or
| 
|      cl <- makeCluster(2) # mpi rank - 1 (or less I believe)
| 
| (the default type for makeCluster becomes MPI in this case).  This
| seems to work reasonably well in LAM and I think I can get it to work
| similarly in OpenMPI -- will try in the next day or so.  Both LAM and
| OpenMPI provide environment variables so shell scripts can determine
| the MPI rank, which is useful for getting --slave and output redirection
| to the workers.  I haven't figured out anything analogous for
| MPICH/MPICH2 yet.

Yes, from a test run I also realized that I can't just ask Rmpi to work
without a hostfile -- the info must come from somewhere.

That said, it still fails with a minimal slurm example using srun, i.e.

edd@ron:~> cat /tmp/rmpi.r
#!/usr/bin/env r
library(Rmpi)
library(snow)
cl <- makeMPIcluster(count=1)
print("Hello\n")

does not make it through makeMPIcluster either and just hangs if I do:

edd@ron:~> srun -N 1 /tmp/rmpi.r

Dirk
#
On Thu, 4 Oct 2007, Hao Yu wrote:

The bit of code you are looking at, for handling calls with no count
argument, is for the case where workers have been started by mpirun
with the RMPISNOW script rather than spawning.  Using
mpi.universe.size() to guess a reasonable default choice for the
spawning case might be useful -- will look into that.

I have OpenMPI installed on Fedora 7 x86_64.  Rmpi 0.5-4 configure
fails for me -- it does not find mpi.h.  I can get Rmpi to build if I
manually set these in Makevars:

     PKG_CFLAGS   = $(ARCHCFLAGS) -I/usr/include/openmpi
     PKG_LIBS     = -L/usr/lib64/openmpi -L/lib -lmpi -lpthread -fPIC $(ARCHLIB)
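
Alternatively, one could presumably let mpicc supply both sets of flags, in
the spirit of Dirk's earlier point -- a sketch, untested:

     PKG_CFLAGS = `mpicc --showme:compile`
     PKG_LIBS   = `mpicc --showme:link`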

When I try R -> library(Rmpi) -> library(snow) -> cl <- makeMPIcluster(2),
or mpirun -np 3 R -> library(Rmpi) -> ..., I get

     Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
     Failing at addr:0x1d26ab7
     [0] func:/usr/lib64/openmpi/libopal.so.0 [0x2aaaafee3263]
     [1] func:/lib64/libc.so.6 [0x367f030630]
     [2] func:/usr/lib64/openmpi/libmpi.so.0(ompi_fortran_string_f2c+0x8c) [0x2aaaaf813bcc]
     [3] func:/usr/lib64/openmpi/libmpi.so.0(mpi_comm_spawn_f+0x75) [0x2aaaaf816405]
     ...
     *** End of error message ***
     Segmentation fault
luke@nokomis ~%

But mpirun -np 3 RMPISNOW does seem to work, more or less.  A modified
version of RMPISNOW, hopefully attached, does a better job of getting
sensible arguments to the workers and master, but the master R still
thinks it is non-interactive.  I have not figured out a work-around
for that yet -- suggestions welcome.

Best,

luke