
Rmpi and mpirun

3 messages · Ei-ji Nakama, Vincent Boucher

#
Hi,

I'm running into an issue using Rmpi with Open MPI on a beowulf cluster.
The installation of the package went without any issue. I have done the
following:
'mpirun -n 1 --hostfile hostfile.txt R --interactive'
then, 'library(Rmpi)'

when I do 'ns <- mpi.universe.size()' I get ns=12, which is what it is
supposed to be. However, 'mpi.spawn.Rslaves(nslaves=ns)' fails and I get
the "not enough slots available..." message.
It looks like when R opens, the nodes are already up and running (due to
the mpirun) so mpi.spawn fails... I've tried to launch R directly (without
mpirun) but then, I only get 1 node...

Am I missing something?

many thanks

Vincent
1 day later
#
Hello,

To debug the Open MPI process, start by reading the output of the
following command:
$ ompi_info --param btl base --level 9

As a first step, try the following command:
$ mpirun --mca btl_base_verbose 40 -np 1 R --interactive
----<write script>----

The debugging parameter can also be set in a parameter file, as below:
$ mkdir -p ~/.openmpi
$ echo "btl_base_verbose = 40" > ~/.openmpi/mca-params.conf



2018-07-12 5:31 GMT+09:00 Vincent Boucher <vincent.boucher.u at gmail.com>:
Since you have already started the MPI master process,
          `mpi.universe.size() - 1'
will be the number of slaves that can be activated.
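
As a minimal sketch of the point above (not taken from the original message), the following writes a small R script that spawns one slave fewer than the universe size, since the master started by mpirun already occupies a slot. With the universe size of 12 reported earlier, that would be 11 slaves. The file name spawn_slaves.R is hypothetical, and running it through Rscript under mpirun is an assumption about the setup.

```shell
# Write a small Rmpi script; spawn_slaves.R is a hypothetical file name.
cat > spawn_slaves.R <<'EOF'
library(Rmpi)

# The master started by mpirun already uses one slot,
# so only universe size - 1 slaves can be spawned.
ns <- mpi.universe.size() - 1
mpi.spawn.Rslaves(nslaves = ns)

# Ask every slave for its rank as a quick sanity check.
print(mpi.remote.exec(mpi.comm.rank()))

mpi.close.Rslaves()
mpi.quit()
EOF

# Assumed invocation, reusing the hostfile from the original post:
# mpirun -n 1 --hostfile hostfile.txt Rscript spawn_slaves.R
```
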
For the default hostfile, see orte_default_hostfile in the output of:
$ ompi_info --param orte all --level 9

For example:
   $ echo 'orte_default_hostfile = "~/hostfile.txt"' >> ~/.openmpi/mca-params.conf

For the host file format, refer to the following.
https://www.open-mpi.org/doc/v3.0/man7/orte_hosts.7.php
Best Regards,
#
Hi,

thanks for the suggestion. I could not find anything informative in the debug
output (or at least, nothing that I could interpret). However, it made me think
of playing around with the options a bit... I think I found the issue (i.e. it
works at the moment!). I'm not sure why, but I post it here in case someone
has a similar issue.
I didn't describe the setup, but I have a Beowulf-type cluster (a bunch of
old computers linked through Ethernet cables). The thing is that Open MPI
first tries to find InfiniBand connections (which, obviously, I don't
have). Normally this is not an issue (at least it isn't for my Fortran90
codes), but it seems to interfere with Rmpi... Excluding the openib BTL with
'-mca btl ^openib', as in:
'mpirun -n 1 -mca btl ^openib --hostfile hostfile.txt R --interactive'
solves the problem...
It is weird, since neither R nor mpirun really issues any related error...
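
A minimal sketch of making the same setting persistent, so that '-mca btl ^openib' does not have to be typed on every mpirun call. It assumes the per-user parameter file ~/.openmpi/mca-params.conf mentioned earlier in this thread, and it uses Open MPI's OMPI_MCA_<name> environment variable convention as an alternative for the current shell session only. Neither variant was tested in the original exchange.

```shell
# Option 1: current shell session only, via the OMPI_MCA_<name> convention.
export OMPI_MCA_btl='^openib'

# Option 2: per-user parameter file (path assumed from earlier in the thread).
conf="$HOME/.openmpi/mca-params.conf"
mkdir -p "$(dirname "$conf")"

# Append the setting only if that exact line is not already present.
grep -qxF 'btl = ^openib' "$conf" 2>/dev/null || echo 'btl = ^openib' >> "$conf"

# Afterwards the plain command should behave like the one with -mca btl ^openib:
# mpirun -n 1 --hostfile hostfile.txt R --interactive
```
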

anyway, many thanks !

Vincent
On Fri, Jul 13, 2018 at 1:17 AM Ei-ji Nakama <nakama at ki.rim.or.jp> wrote: