Hi,
I'm running into an issue using Rmpi with Open MPI on a Beowulf cluster. The package installed without any problem. I start R with
mpirun -n 1 --hostfile hostfile.txt R --interactive
and then run 'library(Rmpi)'. When I do 'ns <- mpi.universe.size()' I get ns=12, which is what it is supposed to be. However, 'mpi.spawn.Rslaves(nslaves=ns)' fails with the "not enough slots available..." message. It looks like the nodes are already up and running when R opens (because of mpirun), so the spawn fails. I've also tried launching R directly (without mpirun), but then I only get 1 node.
Am I missing something?
Many thanks,
Vincent
Rmpi and mpirun
3 messages · Ei-ji Nakama, Vincent Boucher
1 day later
Hello,
When you debug the Open MPI process, read the output of the following command:
$ ompi_info --param btl base --level 9
The first time, try the following command:
$ mpirun --mca btl_base_verbose 40 -np 1 R --interactive
----<write script>----
The debugging parameter can also be written to a parameter file:
$ mkdir -p ~/.openmpi
$ echo "btl_base_verbose = 40" > ~/.openmpi/mca-params.conf
2018-07-12 5:31 GMT+09:00 Vincent Boucher <vincent.boucher.u at gmail.com>:
Hi, I'm running into an issue using Rmpi with Open MPI on a beowulf cluster. The installation of the package went without any issue. I have done the following: 'mpirun -n 1 --hostfile hostfile.txt R --interactive' then, 'library(Rmpi)'. When I do 'ns <- mpi.universe.size()' I get ns=12, which is what it is supposed to be. However, 'mpi.spawn.Rslaves(nslaves=ns)' fails and I get
Since you have already started the MPI master process, 'mpi.universe.size() - 1' will be the number of slaves that can be activated.
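To illustrate that arithmetic, here is a minimal shell sketch. It sums the slots=N entries of a hostfile and subtracts one for the master that mpirun has already started. The hostnames, the function names count_slots and spawnable_workers, and the rule that a host line without slots= counts as one slot are assumptions of the sketch, not documented Open MPI behaviour. Inside R, the corresponding call would presumably be mpi.spawn.Rslaves(nslaves = mpi.universe.size() - 1).

```shell
#!/bin/sh
# Sum the slots=N entries of an Open MPI style hostfile.
# Simplification of this sketch: a host line without slots= counts as 1.
count_slots() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        NF == 0    { next }    # skip blank lines
        {
            n = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=[0-9]+$/) { split($i, kv, "="); n = kv[2] }
            total += n
        }
        END { print total + 0 }
    ' "$1"
}

# Workers that can still be spawned once the master occupies one slot.
spawnable_workers() {
    total=$(count_slots "$1")
    if [ "$total" -gt 0 ]; then echo $((total - 1)); else echo 0; fi
}

# Demo with a hypothetical three-node hostfile (hostnames are made up).
hostfile=$(mktemp)
cat > "$hostfile" <<'EOF'
# hypothetical three-node cluster
node01 slots=4
node02 slots=4
node03 slots=4
EOF
count_slots "$hostfile"         # prints 12
spawnable_workers "$hostfile"   # prints 11
rm -f "$hostfile"
```

With 12 slots in total, as in the session above, this leaves 11 for the spawned workers.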
the "not enough slots available..." message. It looks like when R opens, the nodes are already up and running (due to the mpirun) so mpi.spawn fails... I've tried to launch R directly (without mpirun) but then, I only get 1 node...
See orte_default_hostfile in the output of the following command:
$ ompi_info --param orte all --level 9
It can be set in the parameter file:
$ echo 'orte_default_hostfile = "~/hostfile.txt"' >> ~/.openmpi/mca-params.conf
For the host file format, refer to the following:
https://www.open-mpi.org/doc/v3.0/man7/orte_hosts.7.php
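Because the parameter file may already hold other settings (and a plain '>' redirect would overwrite them), a small helper that replaces or appends a single "key = value" line can be handy. The following is only a sketch: set_mca_param is a hypothetical name, the demo writes to a scratch directory rather than to ~/.openmpi, and it uses an absolute hostfile path to sidestep the question of whether "~" is expanded inside the file.

```shell
#!/bin/sh
# Set "key = value" in an MCA parameter file, replacing any earlier
# assignment of the same key and leaving all other lines untouched.
set_mca_param() {
    mca_file=$1
    mca_key=$2
    mca_value=$3
    mkdir -p "$(dirname "$mca_file")"
    touch "$mca_file"
    mca_tmp=$(mktemp)
    # Copy every line except an existing assignment of this exact key.
    awk -v k="$mca_key" '
        { line = $0; sub(/^[ \t]*/, "", line) }
        index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/ { next }
        { print }
    ' "$mca_file" > "$mca_tmp"
    printf '%s = %s\n' "$mca_key" "$mca_value" >> "$mca_tmp"
    cat "$mca_tmp" > "$mca_file"    # keeps the target file's permissions
    rm -f "$mca_tmp"
}

# Demo in a scratch directory standing in for ~/.openmpi.
conf_dir=$(mktemp -d)
conf="$conf_dir/mca-params.conf"
set_mca_param "$conf" btl_base_verbose 40
set_mca_param "$conf" orte_default_hostfile "$conf_dir/hostfile.txt"
set_mca_param "$conf" btl_base_verbose 40    # replaces, does not duplicate
cat "$conf"
rm -rf "$conf_dir"
```

To apply it for real, the first argument would be ~/.openmpi/mca-params.conf.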
Am I missing something?
many thanks
Vincent
_______________________________________________ R-sig-hpc mailing list R-sig-hpc at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
Best Regards,
Eiji NAKAMA <nakama (a) ki.rim.or.jp> "中間栄治" <nakama (a) ki.rim.or.jp>
Hi,
Thanks for the suggestion. I couldn't find anything informative in the debug output (or at least, nothing I could interpret). However, it made me think of playing around with the options a bit, and I think I found the issue (i.e. it works at the moment!). I'm not sure why it works, but I'm posting it here in case someone has a similar problem.
I didn't describe the setup earlier: I have a Beowulf-type cluster (a bunch of old computers linked with Ethernet cables). Open MPI first tries to find InfiniBand connections (which, obviously, I don't have). Normally this is not an issue (at least it isn't for my Fortran 90 codes), but it seems to screw with Rmpi. Adding -mca btl ^openib, as in
mpirun -n 1 -mca btl ^openib --hostfile hostfile.txt R --interactive
solves the problem. It is weird, since neither R nor mpirun issues any related error.
Anyway, many thanks!
Vincent
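For anyone who would rather not repeat the flag on every command line: Open MPI also reads MCA parameters from environment variables prefixed with OMPI_MCA_, so the exclusion can be set once per shell session. The sketch below assumes that mechanism is available in the installed version. rmpi_launch_cmd is a hypothetical dry-run helper that only prints the launch command, so the snippet is harmless on a machine without Open MPI.

```shell
#!/bin/sh
# Exclude the openib BTL for every Open MPI process started from this shell;
# intended to have the same effect as passing "-mca btl ^openib" to mpirun.
export OMPI_MCA_btl='^openib'

# Dry-run helper (hypothetical name): print the launch command for a given
# hostfile instead of executing it.
rmpi_launch_cmd() {
    printf 'mpirun -n 1 --hostfile %s R --interactive\n' "$1"
}

rmpi_launch_cmd hostfile.txt
```

The same setting can also be kept permanently as a "btl = ^openib" line in ~/.openmpi/mca-params.conf.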