Rmpi: mpi.close.Rslaves() 'hangs'

2 messages · Marius Hofert, Ei-ji Nakama

Hi Ei-ji,

Thanks for your help.

You lost me a bit... Here is what I got:

1) I can confirm that I have Open MPI 2.1.1 (mpirun --version), so
that is most likely the source of the problem (the MPI version
probably changed since I last used Rmpi and moved to different
hardware).

2) As I understand it, you suggest using Rmpi's mpi.comm.free(comm)
instead of mpi.comm.disconnect(comm). I thus adapted
mpi.close.Rslaves() (which 'hangs') to always call mpi.comm.free().
More precisely, I defined mpi.close.Rslave2(), which differs from
mpi.close.Rslaves() in its last part:

    if (comm > 0) {
        ## Changed (as it 'hangs' in openmpi-2.x):
        ## if (is.loaded("mpi_comm_disconnect"))
        ##     mpi.comm.disconnect(comm)
        ## else
        mpi.comm.free(comm)
    }
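
For context, here is a sketch of what such a mpi.close.Rslave2() wrapper
could look like as a whole (the steps before the communicator handling are
paraphrased from what mpi.close.Rslaves() roughly does and may differ
between Rmpi versions, so treat this as an illustration, not the actual
package code):

```r
library(Rmpi)

## Sketch of a workaround wrapper; assumes slaves were spawned with
## mpi.spawn.Rslaves() on communicator 'comm'. The pre-shutdown steps
## are paraphrased and may differ between Rmpi versions.
mpi.close.Rslave2 <- function(comm = 1) {
    if (mpi.comm.size(comm) < 2)
        stop("It seems no slaves are running on comm ", comm)
    ## Tell the slaves to leave their command loop, as
    ## mpi.close.Rslaves() does.
    mpi.bcast.cmd(break, comm = comm)
    if (comm > 0) {
        ## Changed: skip mpi.comm.disconnect(), which 'hangs' under
        ## Open MPI 2.x, and free the communicator instead.
        mpi.comm.free(comm)
    }
}
```

This keeps the rest of the shutdown path untouched and only swaps the
disconnect for a free, mirroring the snippet above.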

If I execute the minimal working example with this new
mpi.close.Rslave2() at the end, something strange happens: *while*
the computation runs, 'htop' doesn't show the two slave processes
separately, but *after* it finishes, the two processes show up and I
need to manually 'kill -9 <PID>' them.
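
Until this is fixed properly, a stop-gap for cleaning up those orphaned
slaves without hunting PIDs by hand (a sketch; the pattern assumes the
slave processes have 'slavedaemon.R' on their command line, since they
are started from Rmpi's inst/slavedaemon.R script):

```shell
# List any R slave processes left behind by Rmpi (pattern is an
# assumption: slaves run Rmpi/inst/slavedaemon.R).
pgrep -af slavedaemon.R || echo "no stray slaves"
# Kill them forcibly, the scripted equivalent of 'kill -9 <PID>'.
pkill -9 -f slavedaemon.R || true
```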

Any ideas?

As a 'user' (not: maintainer), I don't think there's much I can do. I
feel this needs to be addressed by the maintainer of Rmpi rather than
by me hacking into mpi.close.Rslaves(). I contacted the maintainer of
Rmpi again, but still no response.

Thanks & cheers,
Marius
On Thu, Sep 28, 2017 at 6:55 AM, Ei-ji Nakama <nakama at ki.rim.or.jp> wrote:
Hi,

2017-09-28 15:48 GMT+09:00 Marius Hofert <marius.hofert at uwaterloo.ca>:
Look at the modifications made to Rmpi/inst/slavedaemon.R.
MPI_Comm_disconnect loops even on the slaves ...