Rmpi::mpi.spawn.Rslaves() stalls

2 messages · Cédric Lachat, Ei-ji Nakama

#
Hello,

I have the same issue.

I launch:
gdb Rscript

I pass my R script as the argument.
Then I get this output, and the program stalls:
...
Be patient, lcmm is running ...
    4 slaves are spawned successfully. 0 failed.

I kill all R processes:
killall R

On gdb, I have this backtrace:
#0  0x00007ffff733974d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff1438e58 in ?? () from /usr/lib/libopen-pal.so.13
#2  0x00007ffff142f6fb in opal_libevent2021_event_base_loop () from /usr/lib/libopen-pal.so.13
#3  0x00007ffff13f9238 in opal_progress () from /usr/lib/libopen-pal.so.13
#4  0x00007ffff1b3df65 in ompi_request_default_wait_all () from /usr/lib/libmpi.so.12
#5  0x00007ffff1b801fb in ompi_dpm_base_disconnect_waitall () from /usr/lib/libmpi.so.12
#6  0x00007fffec44544a in ?? () from /usr/lib/openmpi/lib/openmpi/mca_dpm_orte.so
#7  0x00007ffff1b52ed0 in PMPI_Comm_disconnect () from /usr/lib/libmpi.so.12
#8  0x00007ffff1dda969 in mpi_comm_disconnect (sexp_comm=<optimized out>) at Rmpi.c:1078
#9  0x00007ffff78f5d90 in ?? () from /usr/lib/R/lib/libR.so
#10 0x00007ffff792c4ff in Rf_eval () from /usr/lib/R/lib/libR.so
...

My software versions are:
Ubuntu 16.04
OpenMPI 1.10.2
R 3.2.3

However, on CentOS 7, with OpenMPI 3.1.0 and R 3.5.0, it works!

So, does the issue stem from MPI or Rmpi?

Regards,
Cédric.
#
2018-07-05 18:43 GMT+09:00 Cédric Lachat <cedric.lachat at u-bordeaux.fr>:
<snip>
In the environments I know, the problem occurs with OpenMPI 1.8 through
2.x. I work around it by going back to 1.6.5 or by using 3.0 or later.
The trouble probably comes from OpenMPI itself.
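
A rough way to check whether an installed version falls in that range is sketched below. The function name openmpi_in_reported_range is hypothetical, and the range it encodes (1.8 through 2.x) comes only from the versions mentioned in this thread, not from any official compatibility list. It assumes a plain numeric version string such as 1.10.2.

```shell
# Hypothetical helper (not part of Rmpi or Open MPI).
# Returns 0 if the given Open MPI version is in the 1.8 to 2.x range
# reported as problematic in this thread, non-zero otherwise.
# Assumes a plain numeric version string such as "1.10.2".
openmpi_in_reported_range() {
    version="$1"
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # second numeric component
    case "$major" in
        1) [ "$minor" -ge 8 ] ;;   # 1.8 and later 1.x releases
        2) return 0 ;;             # any 2.x release
        *) return 1 ;;             # other major versions, e.g. 3.x
    esac
}

# Usage with the two versions mentioned in the original report.
for v in 1.10.2 3.1.0; do
    if openmpi_in_reported_range "$v"; then
        echo "$v: inside the reported range"
    else
        echo "$v: outside the reported range"
    fi
done
```

The version string itself can be read from the output of mpirun --version on the machine in question.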
-- 
Best Regards,