
Rmpi on CentOS (64bit)

10 messages · Brian Ripley, Dirk Eddelbuettel, Marc Schwartz +1 more

#
I got Rmpi to compile with little difficulty, but had a tricky time
setting LD_LIBRARY_PATH to use the OpenMPI libs.  I now get a
different error when I try to load Rmpi:

> require(Rmpi)
Loading required package: Rmpi
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
libibverbs: Fatal: couldn't read uverbs ABI version.
CMA: unable to open /dev/infiniband/rdma_cm
--------------------------------------------------------------------------
WARNING: Failed to open "OpenIB-cma" [DAT_INTERNAL_ERROR:].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------

I don't have this problem with Fedora 11 and I'd have thought there
would be little difference with CentOS (apart from the latter being 64
bit).

Is there something else that needs to be specified?


TIA
#
There are many different versions of OpenMPI about.  It looks like you 
have one that is set up for specialized hardware.  Either this is the 
wrong version or a configuration error, and you will need to talk to 
your 'local System Administrator'.

Incidentally, you should not have to set LD_LIBRARY_PATH, but I 
frequently have had to add configuration files in /etc/ld.so.conf.d, 
including for openmpi on Fedora 12. On Fedora 10 (but not 12) MPI was 
under the /etc/alternatives mechanism, and had other problems.  I 
currently have

gannet% cat /etc/ld.so.conf.d/openmpi-x86_64.conf
/usr/lib64/openmpi/lib

on F12.
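The /etc/ld.so.conf.d mechanism described above can be scripted roughly as follows. This is a minimal sketch, not a tested recipe: `register_mpi_libdir` is a hypothetical helper name, the file name openmpi-x86_64.conf is copied from the F12 example, and the library directory is an assumption that must be adjusted to wherever libmpi.so actually lives on the CentOS machine.

```shell
#!/bin/sh
# Sketch: register an Open MPI library directory with the dynamic loader
# via /etc/ld.so.conf.d, instead of exporting LD_LIBRARY_PATH.
# register_mpi_libdir is a hypothetical helper name.
set -eu

# register_mpi_libdir LIBDIR [CONF_DIR]
#   LIBDIR    directory holding libmpi.so* (/usr/lib64/openmpi/lib on F12)
#   CONF_DIR  defaults to /etc/ld.so.conf.d; override only for testing
register_mpi_libdir() {
    libdir=$1
    conf_dir=${2:-/etc/ld.so.conf.d}
    conf_file="$conf_dir/openmpi-x86_64.conf"

    # An ld.so.conf.d file is simply one library directory per line.
    printf '%s\n' "$libdir" > "$conf_file"

    # Only the real system directory needs the loader cache rebuilt (as root).
    if [ "$conf_dir" = /etc/ld.so.conf.d ]; then
        ldconfig
        # Report whether libmpi is now known to the loader.
        if ldconfig -p | grep -q libmpi; then
            echo "libmpi is now in the loader cache"
        else
            echo "libmpi not found; check $libdir" >&2
        fi
    fi
}

# Run only when given arguments, e.g.
#   sh register-mpi.sh /usr/lib64/openmpi/lib
if [ "$#" -gt 0 ]; then
    register_mpi_libdir "$@"
fi
```

Run it as root; once the cache is rebuilt, R should find the Open MPI libraries without LD_LIBRARY_PATH being set.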
On Wed, 3 Mar 2010, Patrick Connolly wrote:

#
On Wed, 03-Mar-2010 at 08:42AM +0000, Prof Brian Ripley wrote:

Our local System Administrator knows less about it (MPI, at least)
than I do.  Perhaps this is 'specialized hardware' in that it's a dual
quad-core processor machine, but I'd have thought that's not
particularly special nowadays.

I notice that my Fedora installation has no dat.conf file.  Perhaps
it pertains to something special on the CentOS machine.  I can't
check the CentOS machine right now, but at one time, I did find the
rpm that is associated with the dat.conf file.  Fedora seems not to
need that one.
Setting LD_LIBRARY_PATH might not be elegant, but at least it got over
that problem.  Is there any possibility that doing it so inelegantly
has a bearing on the issues I have now?
Fedora 11 would appear to be like F12.  I'll check later if CentOS is
more like the way F10 was.
Is that to say I could make a similar file to avoid setting
LD_LIBRARY_PATH?

Thanks for the help.

#
On Thu, 4 Mar 2010, Patrick Connolly wrote:

No, 'specialized hardware' means high-speed interconnects, used in
high-performance clusters.
Fedora does need dat.conf if you have that sort of hardware (and we do
on one of the clusters we use).
Setting LD_LIBRARY_PATH is unlikely to be the cause, unless you got the
wrong libmpi.
Indeed, a file in /etc/ld.so.conf.d is the recommended mechanism.

#
On Mar 3, 2010, at 1:24 PM, Prof Brian Ripley wrote:

<snip>

Patrick, just as an FYI, I did not see which variant of CentOS you are using, but:

CentOS 4, which is based upon RHEL 4, is in turn based upon Fedora Core 3 (2004).

CentOS 5, which is based upon RHEL 5, is in turn based upon Fedora Core 6 (2006).

So to reinforce, there is a substantial and intentional lag between RHEL/CentOS and Fedora. Recall that RHEL and CentOS are targeted for stable server use, whereas Fedora is a bleeding edge distro.

HTH,

Marc Schwartz
#
On 3 March 2010 at 19:24, Prof Brian Ripley wrote:
| On Thu, 4 Mar 2010, Patrick Connolly wrote:
|
| > On Wed, 03-Mar-2010 at 08:42AM +0000, Prof Brian Ripley wrote:
| >
| >> There are many different versions of OpenMPI about.  It looks like
| >> you have one that is set up for specialized hardware.  Either this
| >> is the wrong version or a configuration error, and you will need to
| >> talk to your 'local System Administrator'.
| >
| > He knows less about it (MPI, at least) than I do.  Perhaps this is
| > 'specialized hardware' in that it's a dual quad-core processor machine
| > -- but I'd have thought that's not particularly special nowadays.
| 
| No, it is high-speed interconnects, used in high-performance clusters.

A similar issue once arose on Debian, as we built Open MPI with the IB libs
even though most people don't have suitable InfiniBand hardware. In our case
that led to a noisy warning; upstream later suppressed the warning under
certain conditions.

You could try to suppress the probe for IB, as we did (in the older 1.2.*
series of OpenMPI), via

   # Disable the use of InfiniBand
   #   btl = ^openib
   btl = ^openib

in /etc/openmpi/openmpi-mca-params.conf.
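Open MPI also reads any MCA parameter from an environment variable named `OMPI_MCA_<param>`, so the same setting can be tried for a single session without locating or editing the system file. A minimal sketch; the wrapper script is hypothetical and R is only an example command.

```shell
#!/bin/sh
# Sketch: apply "btl = ^openib" for one session only. Open MPI reads any
# MCA parameter from an environment variable named OMPI_MCA_<param>;
# the leading ^ means "use every BTL component except these".
OMPI_MCA_btl='^openib'
export OMPI_MCA_btl

# Hypothetical wrapper: run the given command (e.g. R) with the setting.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

For example, `sh no-ib.sh R --vanilla` would start R with InfiniBand excluded, leaving the system configuration untouched.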

Hth, Dirk
#
On Wed, 03-Mar-2010 at 01:46PM -0600, Marc Schwartz wrote:
|> Patrick, just as an FYI, I did not see which variant of CentOS you
|> are using, but:

Apologies, I didn't mention that it's 5.4.

|> CentOS 4, which is based upon RHEL 4, is in turn based upon Fedora
|> Core 3 (2004).

|> CentOS 5, which is based upon RHEL 5, is in turn based upon Fedora
|> Core 6 (2006).

|> So to reinforce, there is a substantial and intentional lag between
|> RHEL/CentOS and Fedora. Recall that RHEL and CentOS are targeted
|> for stable server use, whereas Fedora is a bleeding edge distro.

Yes.  For that reason, I wished to get Rmpi working on CentOS.  I use
Fedora 11 at home and I'm a bit put off by the 300-500 MB of updates
most weeks.  It's nice using the new stuff, but those updates
periodically screw up what had been working well.  I wouldn't want
that on a production machine.  It looks as though I'll have to use
Fedora there anyway.  My Linux skills aren't up to sorting out this
CentOS lot, and I should at least get the work started: there's likely
not much difference between F11 and F12 for this task.

This could well be a case where Debian would be the easiest way to go,
but I couldn't convince the IT people to go down such a new track.
Ours is very much an rpm site.  In any case, my only Debian-type
experience is with Mepis (where I got Rmpi working in 20 minutes, but
I don't think that makes me a Debian pro).

Thanks for the hints.
#
On Mar 4, 2010, at 1:24 AM, Patrick Connolly wrote:

No problem.
BTW, by happenstance this morning I found the following blog posting, which may or may not be of help to you for Rmpi on Fedora:

  http://www.r-bloggers.com/r-tips-installing-rmpi-on-fedora-linux/

Notwithstanding the IT people issues, you could perhaps consider Ubuntu LTS, which provides a hybrid-ish approach of having a fairly up-to-date desktop Linux distro with longer-term post-release support. Moving to a Debian-based distro of course also avails you of the significant work that folks like Dirk have put in place to make most CRAN packages easily available via apt.

There was a similar hybrid attempt for Fedora a few years ago, called Fedora Legacy, but it was effectively DOA. There were folks who wanted and argued for longer post release support, to avoid the frequent release update cycle. Not surprisingly, with the exception of a core group, the majority of those who wanted it were not willing to provide the substantial voluntary resources to actually make it successful. Not to mention, it was anathema to Fedora's raison d'etre and there were heated discussions on the Fedora lists.

F13 is scheduled for release mid-May, which means that F11 goes EOL mid-June. So you will need to think about moving any F11-based systems to F12 or F13 in the not-too-distant future. That's the challenge of Fedora's life cycle, with twice-per-year major releases, so one has to make an informed decision as to the willingness to be on a fast track. It is one of the reasons that I moved to OSX a year ago, after 8 years on RH/Fedora, along with growing frustration over the poor nature of Linux hardware support from the GPU vendors (especially nVidia) at the time. As they say, the only good thing about banging your head against the wall is that it feels so good when you stop.

To contrast, RHEL has a 7 year life cycle, which of course carries over to CentOS, again reflecting server versus desktop requirements. You can see more information here:

  http://www.redhat.com/security/updates/errata/

So each RHEL/CentOS major release spans roughly 14 Fedora major releases (7 years at two releases per year).

There are also various rumors and speculations about RHEL 6, its release date and which version of Fedora it will be based upon, with some suggesting F12 or F13:

  http://lwn.net/Articles/364405/

  http://jason.roysdon.net/2010/01/29/red-hat-enterprise-linux-6-speculation/


HTH,

Marc
#
On Wed, 03-Mar-2010 at 01:57PM -0600, Dirk Eddelbuettel wrote:
[...]

|> You could try to suppress the probe for IB which we did (in the older 1.2.*
|> series of OpenMPI) via
|> 
|>    # Disable the use of InfiniBand
|>    #   btl = ^openib
|>    btl = ^openib
|> 
|> in  /etc/openmpi/openmpi-mca-params.conf

CentOS keeps that file in a completely different place.  I tried that
suggestion there, but to no avail.
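When the system-wide file is hard to find, Open MPI also reads a per-user file, `$HOME/.openmpi/mca-params.conf`, with the same `name = value` syntax. The sketch below appends a line there only if it is not already present; `add_user_mca_param` is a hypothetical helper name. Note that this only changes where the setting lives: if the dat.conf warning comes from a component other than openib, `^openib` alone may still not silence it.

```shell
#!/bin/sh
# Sketch: put an MCA setting in the per-user file Open MPI also reads,
# $HOME/.openmpi/mca-params.conf, so the location of the system-wide
# openmpi-mca-params.conf does not matter.
# add_user_mca_param is a hypothetical helper name.
set -eu

add_user_mca_param() {
    line=$1                              # e.g. "btl = ^openib"
    dir="$HOME/.openmpi"
    file="$dir/mca-params.conf"
    mkdir -p "$dir"
    touch "$file"
    # Append the line only if exactly that line is not already present.
    if ! grep -qxF "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# e.g.  sh add-mca.sh 'btl = ^openib'
if [ "$#" -gt 0 ]; then
    add_user_mca_param "$1"
fi
```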


I looked into "... the availability of the interfaces in the dat.conf
file" which the error message mentioned. 


$ locate dat.conf
/etc/ofed/dat.conf
/etc/ofed/compat-dapl/dat.conf
/usr/share/man/man5/dat.conf.5.gz

(No such files exist on Fedora 11, and there seem to be no
ill effects.)

Are we to assume that it's the second of those that is related to the message?


$ cat /etc/ofed/compat-dapl/dat.conf
OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""
OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""
OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 2" ""
OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" ""
OpenIB-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 2" ""
OpenIB-ipath0-1 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ipath0 1" ""
OpenIB-ipath0-2 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ipath0 2" ""
OpenIB-ehca0-2 u1.2 nonthreadsafe default libdaplscm.so.2 dapl.1.2 "ehca0 1" ""
OpenIB-iwarp u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""


$ cat /etc/ofed/dat.conf
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""
ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""
ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1"  ""
ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
$
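One way to answer which registry the warning refers to: the provider name is the first field of each dat.conf line, so a file that defines "OpenIB-cma" is the one in play. In the listings above, only /etc/ofed/compat-dapl/dat.conf has such a line. A minimal sketch; `find_dat_provider` is a hypothetical helper, and the file list is taken from the locate output plus the /etc/dat.conf path named in the Open MPI FAQ.

```shell
#!/bin/sh
# Sketch: report which uDAPL registry file(s) define a given provider name,
# e.g. "OpenIB-cma" from the warning. The provider name is the first
# whitespace-separated field of each dat.conf line; '#' lines are comments.
find_dat_provider() {
    name=$1
    shift
    for conf in "$@"; do
        [ -r "$conf" ] || continue
        if awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$conf"; then
            echo "$conf"
        fi
    done
}

# e.g.  sh find-dat-provider.sh OpenIB-cma
if [ "$#" -gt 0 ]; then
    find_dat_provider "$1" /etc/dat.conf /etc/ofed/dat.conf /etc/ofed/compat-dapl/dat.conf
fi
```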


Does that give anyone any clues as to what could be going on with the
message (which went like this)?

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
libibverbs: Fatal: couldn't read uverbs ABI version.
CMA: unable to open /dev/infiniband/rdma_cm
--------------------------------------------------------------------------
WARNING: Failed to open "OpenIB-cma" [DAT_INTERNAL_ERROR:].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
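The "CMA: unable to open /dev/infiniband/rdma_cm" line suggests the machine has no InfiniBand hardware (or that the RDMA kernel modules are not loaded). A quick check, as a sketch assuming a Linux kernel that lists RDMA adapters under /sys/class/infiniband; `have_ib_devices` is a hypothetical helper whose optional ROOT argument exists only for testing.

```shell
#!/bin/sh
# Sketch: does this machine have any InfiniBand/RDMA adapter at all?
# Assumes a Linux kernel that lists RDMA adapters (mlx4_0, mthca0, ...)
# under /sys/class/infiniband. have_ib_devices is a hypothetical helper;
# its optional ROOT argument exists only for testing.
have_ib_devices() {
    root=${1:-}
    for dev in "$root"/sys/class/infiniband/*; do
        # With no match the glob stays literal, so -e fails.
        [ -e "$dev" ] && return 0
    done
    return 1
}

if have_ib_devices; then
    echo "InfiniBand adapter(s) present; ask why rdma_cm is missing"
else
    echo "No InfiniBand adapters; openib/uDAPL transports cannot work here"
fi

# List the BTL (transport) components this Open MPI build includes.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep 'MCA btl' || true
fi
```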



Could there be a modification to Dirk's suggestion that might deal
with it?  (I'm making a last-ditch attempt to avoid using Fedora.)
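One possible modification, offered as an assumption rather than a verified fix: the dat.conf warning appears to come from Open MPI's uDAPL transport (the `udapl` BTL component of the 1.2 to 1.4 series), which `^openib` does not exclude. On a single multi-core machine with no InfiniBand, naming the wanted transports outright may avoid loading either. The wrapper is hypothetical; the component names are assumed to match an Open MPI build of that era.

```shell
#!/bin/sh
# Sketch for a single multi-core machine with no InfiniBand: name the
# transports to use instead of excluding unwanted ones.
#   self = process to itself, sm = shared memory, tcp = TCP/IP
# Component names assume an Open MPI 1.2-1.4 era build.
OMPI_MCA_btl='self,sm,tcp'
export OMPI_MCA_btl

# Exclusion equivalent (assumes udapl is the component printing the warning):
#   OMPI_MCA_btl='^openib,udapl'

# Hypothetical wrapper: run the given command (e.g. R) with the setting.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

An include list has the advantage of not depending on the names of every unwanted component; the same line (`btl = self,sm,tcp`) could also go in an mca-params.conf file.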

It's hard to find anything much about CentOS and MPI -- or at least
what people did to get it working.  I found tales of people having
difficulties with Fedora that I didn't have.  I'm not much wiser than
I began.


best
#
On Wed, 03-Mar-2010 at 07:53PM +1300, Patrick Connolly wrote:
|> I got Rmpi to compile with little difficulty, but had a tricky time
|> setting the LD_LIBRARY_PATH to use the OpenMPI libs.  I now get a
|> different error when I try to load Rmpi
|> 
|> 
|> > require(Rmpi)                  
|> Loading required package: Rmpi
|> librdmacm: couldn't read ABI version.
|> librdmacm: assuming: 4
|> libibverbs: Fatal: couldn't read uverbs ABI version.
|> CMA: unable to open /dev/infiniband/rdma_cm
|> --------------------------------------------------------------------------
|> WARNING: Failed to open "OpenIB-cma" [DAT_INTERNAL_ERROR:].
|> This may be a real error or it may be an invalid entry in the uDAPL
|> Registry which is contained in the dat.conf file. Contact your local
|> System Administrator to confirm the availability of the interfaces in
|> the dat.conf file.
|> --------------------------------------------------------------------------


I looked further into this uDAPL thing and found an FAQ:
http://www.open-mpi.org/faq/?category=udapl
This one was particularly interesting:

3. Where is the static uDAPL Registry found?

Solaris: /etc/dat/dat.conf

Linux: /etc/dat.conf 


On my CentOS installation, I have two, neither of which is where it's
supposed to be for "Linux" in the opinion of open-mpi.org.

$ locate dat.conf
/etc/ofed/dat.conf
/etc/ofed/compat-dapl/dat.conf
/usr/share/man/man5/dat.conf.5.gz


Would there be a way at the time of compiling Rmpi to specify a
configure arg to use one or other of the ones I have?  Or would it be
simpler to make a link in /etc/ to one or other of them?  Is there any
danger that such a link could bother anything else?
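Since the warning names "OpenIB-cma", which is defined in /etc/ofed/compat-dapl/dat.conf, the library appears to be finding a registry already, so a link in /etc may not change the outcome. If one is wanted anyway, the main danger is overwriting an existing file; the sketch below refuses to do that. `link_dat_conf` is a hypothetical helper, and which target to choose is left to the caller.

```shell
#!/bin/sh
# Sketch: create /etc/dat.conf as a symlink to an existing registry
# without clobbering anything already there.
# link_dat_conf is a hypothetical helper name.
link_dat_conf() {
    target=$1
    link=${2:-/etc/dat.conf}
    if [ ! -r "$target" ]; then
        echo "no such registry: $target" >&2
        return 1
    fi
    # -e is false for a dangling symlink, so test -L as well.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "$link already exists; leaving it alone" >&2
        return 1
    fi
    ln -s "$target" "$link"
}

# e.g.  sh link-dat.sh /etc/ofed/compat-dapl/dat.conf   (as root)
if [ "$#" -gt 0 ]; then
    link_dat_conf "$@"
fi
```

Removing the link later (`rm /etc/dat.conf`) restores the previous state, which keeps the change easy to undo if anything else starts reading it.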

best