It would appear there is a fix for the issue I was seeing. After some discussion around the underlying cause of the
problem on the OpenMPI devel list, Jeff Squyres wrote:
Now, all this being said, IIRC (and I very well may not!), the real underlying issue here is that R is dlopening libmpi.so, which, in turn, is dlopening its own DSOs. Given the global linker scoping issues, OMPI's DSOs are unable to find the symbols they need to resolve in the process (because libmpi.so's was opened in a private scope). This probably is unfortunately larger than us (Open MPI) -- it's really a POSIX issue. What would be ideal is if different linker namespaces could be something more fine-grained than "global" or "private" within a process. E.g., if the private namespace of libmpi.so in the process could selectively make its symbol namespace available to the DSOs that it dlopens. Right now, the only option libmpi.so has is to be opened with a public scope, which somewhat defeats the point of private scoping.
Tying in with Jeff's suggestions above, there does seem to be
a workaround for this, at least for the Rmpi package
on NetBSD.
Furthermore, the fix does not require any alterations to OpenMPI.
Apparently, PAM on NetBSD has had a similar issue with symbol
visibility when shared library loads are chained.
Mark Davies has now determined a way to force the Rmpi package
to load libmpi.so ahead of loading the Rmpi shared library itself,
so that the symbols that appeared to be missing are then available
to any later loads of the OpenMPI component libraries.
On the version of Rmpi that I have been using, 0.5-8, the "fix"
can be effected by the following one-line patch:
--- Rmpi/R/zzz.R 2009-02-04 05:27:08.000000000 +1300
+++ Rmpi.local/R/zzz.R 2010-05-17 14:25:27.000000000 +1200
@@ -7,6 +7,7 @@
# cat(vertxt)
# Check if lam-mpi is running
+ dyn.load("/usr/pkg/lib/libmpi.so", local=FALSE)
library.dynam("Rmpi", pkg, lib)
if (!TRUE)
stop("Fail to load Rmpi dynamic library.")
Note that this currently hard-codes the path to libmpi.so,
which on our system is in the standard NetBSD PkgSrc location.
There are probably "nicer" ways to achieve the same end,
with greater flexibility, using R internals.
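
As a rough illustration of one such "nicer" way, the sketch below
looks for libmpi in a list of candidate directories rather than
relying on a single hard-coded path. The helper names find_libmpi
and preload_libmpi, the MPI_LIB_DIR environment variable and the
extra directories are all made up for this example; none of them
is part of Rmpi or OpenMPI, and I have not tried this inside Rmpi.

```r
# Hypothetical helpers, not part of Rmpi: locate libmpi without
# hard-coding a single path.
find_libmpi <- function(dirs = c(Sys.getenv("MPI_LIB_DIR", unset = NA),
                                 "/usr/pkg/lib", "/usr/local/lib", "/usr/lib"),
                        name = paste0("libmpi", .Platform$dynlib.ext)) {
    # Drop an unset or empty MPI_LIB_DIR (an assumed variable name)
    dirs <- dirs[!is.na(dirs) & nzchar(dirs)]
    candidates <- file.path(dirs, name)
    found <- candidates[file.exists(candidates)]
    # Return the first match, or NA if nothing was found
    if (length(found) > 0) found[[1]] else NA_character_
}

preload_libmpi <- function(path = find_libmpi()) {
    if (is.na(path)) return(invisible(FALSE))
    # local = FALSE asks for the library's symbols to be made
    # globally visible, as in the one-line patch
    dyn.load(path, local = FALSE)
    invisible(TRUE)
}
```

In zzz.R, a call to preload_libmpi() would then take the place of
the hard-coded dyn.load() line, ahead of library.dynam("Rmpi", pkg, lib).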
Having said that, this "fix" does not seem to be needed on
platforms that have a global scope for shared library symbols,
so attempts to make it generic may be pointless.
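
If the preload were kept but limited to platforms where it is known
to help, a guard along the following lines might do. This is only a
sketch: needs_preload is a hypothetical name, and the list of
affected systems holds just NetBSD because that is the only platform
I have seen the problem on.

```r
# Hypothetical guard, not part of Rmpi: report whether the libmpi
# preload should be attempted on the current platform.
needs_preload <- function(sysname = Sys.info()[["sysname"]],
                          affected = c("NetBSD")) {
    # isTRUE() copes with platforms where Sys.info() returns NULL
    isTRUE(sysname %in% affected)
}

# Possible use in zzz.R, ahead of library.dynam():
#   if (needs_preload())
#       dyn.load("/usr/pkg/lib/libmpi.so", local = FALSE)
```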
Kevin
Kevin M. Buckley                  Room:  CO327
School of Engineering and         Phone: +64 4 463 5971
 Computer Science
Victoria University of Wellington
New Zealand