
Error message from MPI

3 messages · Jim Maas, Brian G. Peterson, Paul Johnson

#
Hi All,

I'm relatively new to this, but it has worked well for me previously, and
now I'm getting errors.

I'm attempting to run an R job on a cluster that uses the LSF batch
scheduler, using the packages doMPI and foreach. In the job file I've
requested 128 slots, and MPI obtains that number successfully, but it
gives this error message:

=====================
Loading required package: Rmpi
Loading required package: Rmpi
--------------------------------------------------------------------------
[[51610,1],5]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
   Host: cn004.private.dns.zone

Another transport will be used instead, although this may result in
lower performance.
=====================

As a test, I ran the same job with fewer slots: it runs fine on 32 or 64
slots, but when I increase to 128 slots, I get this error.

I suppose I'm asking whether the R packages doMPI and foreach are
scalable, and to what level.

Thanks for any suggestions.

Jim

Dr. Jim Maas
University of East Anglia
#
On Thu, 2011-08-11 at 12:40 +0100, Jim Maas wrote:
I suspect a limitation in your cluster's configuration (for example,
what is OpenFabrics, and what does it have to do with MPI?). You may
need to speak with your cluster administrators.

I've successfully used foreach and doMPI on over 250 worker nodes.

Unfortunately, I'm not sure how much additional help I can be beyond my
one data point, as I no longer have access to that particular cluster.

Regards,

   - Brian
1 day later
#
Hello, Jim

I have seen similar messages, but the jobs still run. On my system, it
happens because our cluster has a mixture of fabrics: some nodes use
regular old Ethernet, and some use InfiniBand (judging from the openib
message, I suspect your cluster is similar). When I submit jobs on the
cluster and they are sent to Ethernet-connected nodes, the InfiniBand
transport tries to connect, can't, and then falls back to another
transport.

Would you mind posting your submission script and the R code? Or post a link?

My R-MPI collection is growing; here are instructions on how you can
see it: http://web.ku.edu/~quant/cgi-bin/mw1/index.php?title=Cluster:Main

On Thu, Aug 11, 2011 at 6:40 AM, Jim Maas <j.maas at uea.ac.uk> wrote: