Hello,

I have the same issue.

I launch:
    gdb Rscript
and give my R script as the argument.

Then I get this output, and the program stalls:
    ...
    Be patient, lcmm is running ...
    4 slaves are spawned successfully. 0 failed.

I kill all R processes:
    killall R

In gdb, I get this backtrace:
    #0  0x00007ffff733974d in poll () at ../sysdeps/unix/syscall-template.S:84
    #1  0x00007ffff1438e58 in ?? () from /usr/lib/libopen-pal.so.13
    #2  0x00007ffff142f6fb in opal_libevent2021_event_base_loop () from /usr/lib/libopen-pal.so.13
    #3  0x00007ffff13f9238 in opal_progress () from /usr/lib/libopen-pal.so.13
    #4  0x00007ffff1b3df65 in ompi_request_default_wait_all () from /usr/lib/libmpi.so.12
    #5  0x00007ffff1b801fb in ompi_dpm_base_disconnect_waitall () from /usr/lib/libmpi.so.12
    #6  0x00007fffec44544a in ?? () from /usr/lib/openmpi/lib/openmpi/mca_dpm_orte.so
    #7  0x00007ffff1b52ed0 in PMPI_Comm_disconnect () from /usr/lib/libmpi.so.12
    #8  0x00007ffff1dda969 in mpi_comm_disconnect (sexp_comm=<optimized out>) at Rmpi.c:1078
    #9  0x00007ffff78f5d90 in ?? () from /usr/lib/R/lib/libR.so
    #10 0x00007ffff792c4ff in Rf_eval () from /usr/lib/R/lib/libR.so
    ...

My program versions are:
    Ubuntu 16.04
    OpenMPI 1.10.2
    R 3.2.3

However, on CentOS 7, with OpenMPI 3.1.0 and R 3.5.0, it works!

So, does the issue stem from MPI or Rmpi?

Regards,
Cédric.
Rmpi::mpi.spawn.Rslaves() stalls
2 messages · Cédric Lachat, Ei-ji Nakama
2018-07-05 18:43 GMT+09:00 Cédric Lachat <cedric.lachat at u-bordeaux.fr>:
<snip>
> So, does the issue stem from MPI or Rmpi?
In the environments I know, problems occurred with OpenMPI 1.8 through 2.x. I either go back to 1.6.5 or use 3.0 or later. The trouble may well come from OpenMPI itself.

--
Best Regards,
Eiji NAKAMA <nakama (a) ki.rim.or.jp>
"中間栄治" <nakama (a) ki.rim.or.jp>
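
The version ranges mentioned in the reply above can be checked with a small shell helper. This is only a sketch: the function name classify_ompi_version and its three output labels are made up for illustration, and the ranges encode nothing more than the experience reported in this thread (1.8 through 2.x problematic, 3.0 or later fine). It is not an official compatibility table.

```shell
#!/bin/sh
# Hypothetical helper: classify an Open MPI version string ("major.minor[.patch]")
# against the ranges reported in this thread. The labels are illustrative only.
classify_ompi_version() {
    major=${1%%.*}        # text before the first dot
    rest=${1#*.}          # text after the first dot
    minor=${rest%%.*}     # second numeric component

    # Reject anything that is not purely numeric.
    case "$major" in ''|*[!0-9]*) echo "unknown"; return ;; esac
    case "$minor" in ''|*[!0-9]*) echo "unknown"; return ;; esac

    if [ "$major" -ge 3 ]; then
        echo "reported-ok"                 # 3.0 or later
    elif [ "$major" -eq 2 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 8 ]; }; then
        echo "reported-problematic"        # 1.8 through 2.x
    else
        echo "outside-reported-range"      # older than 1.8, e.g. 1.6.5
    fi
}
```

For the two setups described in the first message, classify_ompi_version 1.10.2 falls in the problematic range and classify_ompi_version 3.1.0 does not. The installed version string itself can be read from the output of mpirun --version.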