Error message from MPI
3 messages · Jim Maas, Brian G. Peterson, Paul Johnson

Hi All, I'm relatively new to this; it has worked well for me previously, but now I'm getting errors. I'm attempting to run an R job on a cluster managed by the LSF scheduler, using the packages doMPI and foreach. In the job file I've requested 128 slots, and MPI gets that number quite successfully but gives this error message:

=====================
Loading required package: Rmpi
Loading required package: Rmpi
--------------------------------------------------------------------------
[[51610,1],5]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: cn004.private.dns.zone

Another transport will be used instead, although this may result in
lower performance.
=====================

As a test I've run the same job with a smaller number of slots: it runs fine on 32 or 64 slots, but when I increase to 128 slots I get this error. I guess I'm asking whether the R packages doMPI and foreach are scalable, and to what level? Thanks for any suggestions.

Jim

Dr. Jim Maas
University of East Anglia
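For readers unfamiliar with the setup being described: the job pattern in question typically looks something like the sketch below, a minimal doMPI/foreach script (the loop body and iteration count are placeholders, not Jim's actual code; it assumes the script is launched under mpirun on a working MPI installation).

```r
# Minimal doMPI/foreach sketch -- placeholder workload, assumed to be
# launched under MPI, e.g.:  mpirun -n 1 R --no-save -f this_script.R
library(doMPI)            # also loads Rmpi and foreach

cl <- startMPIcluster()   # spawn workers onto the available MPI slots
registerDoMPI(cl)         # make %dopar% use the MPI cluster

res <- foreach(i = 1:1000, .combine = c) %dopar% {
  sqrt(i)                 # placeholder work unit
}

closeCluster(cl)
mpi.quit()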
On Thu, 2011-08-11 at 12:40 +0100, Jim Maas wrote:
I guess I'm asking if the R packages doMPI and foreach are scalable, and to what level?
I suspect some limitation in the configuration of your cluster. (For example: what is OpenFabrics, and what does it have to do with MPI on your nodes?) You may need to speak with your cluster administrators. I've successfully used foreach and doMPI on over 250 worker nodes. Unfortunately, I'm not sure how much additional help I can be beyond that one data point, as I no longer have access to that particular cluster. Regards, - Brian
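[Editor's note: the openib message here is a warning, not a fatal error; Open MPI falls back to another transport (usually TCP) and the job continues. If the warning is unwanted, it can be addressed at the mpirun level. The slot count and trailing command are placeholders for the site-specific launch line.]

```sh
# Force TCP by excluding the OpenFabrics (openib) transport entirely:
mpirun --mca btl ^openib -n 128 ...

# Or keep the automatic fallback but suppress the warning:
mpirun --mca btl_base_warn_component_unused 0 -n 128 ...
```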
Brian G. Peterson http://braverock.com/brian/ Ph: 773-459-4973 IM: bgpbraverock
1 day later
Hello, Jim. I have seen similar messages, but the jobs still run. On my system it happens because our cluster has a mixture of fabrics: some nodes are regular old Ethernet, some are InfiniBand. (I suspect yours is similar, given the openib message.) When I submit jobs on the cluster and they are sent to Ethernet-connected nodes, the InfiniBand transport tries to start, can't hook up, and Open MPI then falls back to another transport. Would you mind posting your submission script and the R code, or a link? My R-MPI collection is growing; there are instructions at the link below. http://web.ku.edu/~quant/cgi-bin/mw1/index.php?title=Cluster:Main
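[Editor's note: for context, an LSF submission script for this kind of doMPI job looks roughly like the following sketch; the job name, output files, and script name are hypothetical, and queue/module lines vary by site.]

```sh
#!/bin/bash
#BSUB -J doMPI_test        # job name (placeholder)
#BSUB -n 128               # number of MPI slots requested
#BSUB -o output.%J         # stdout file, %J = job ID
#BSUB -e errors.%J         # stderr file

# Launch one master R process; doMPI's startMPIcluster() then spawns
# workers onto the remaining slots allocated by LSF.
mpirun -n 1 R --no-save -f my_doMPI_script.R
```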
On Thu, Aug 11, 2011 at 6:40 AM, Jim Maas <j.maas at uea.ac.uk> wrote:
Hi All, I'm relatively new to this but it has worked well for me previously and now I'm getting errors. [...] I guess I'm asking if the R packages doMPI and foreach are scalable, and to what level?
Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas