Hi all,
I encountered an interesting edge case involving Rhtslib linkage today,
when I tried to use csaw on my institute's cluster. Package installation
proceeded without a hitch, but running "library(csaw)" failed to load
"csaw.so" due to a failure in finding "libhts.so.0".
My problem stems from the fact that the home drive is mounted at
different locations on the headnode (where I install stuff) and on the
cluster nodes (where the submitted jobs actually get run):
~ on headnode: /cstHome/home/jmlab/
~ on cluster nodes: /home/jmlab/
This normally doesn't cause any issues because the headnode provides a
softlink "/home", which points to "/cstHome/home". For end users or
programs, this makes it seem as if the locations are the same. Indeed,
my R installation thinks R_HOME is "/home/jmlab/software/R/devel", which
is a valid path on both the headnode and cluster nodes. I usually have
no problems running the same R code on either node.
However, Rhtslib::pkgconfig() calls system.file(), which calls
.libPaths(), which performs path normalization to obtain the full file
path without softlinks. This means that, upon installation on the
headnode, "csaw.so" is linked to
"/cstHome/home/jmlab/...etc.../library/Rhtslib/lib/libhts.so". This is
fine when running jobs on the headnode, but fails with "file/directory
not found" for "libhts.so.0" on the cluster node because the
"/cstHome/..." path does not exist there.
So, the crux of the problem is that system.file() does not respect the
soft links that are masking the "true" mount locations of the drives. In
contrast, running R.home() gives me the expected
"/home/jmlab/software/R/devel", with the softlink intact. Changing zzz.R
to use:
    system.file(..., lib.loc=R.home("library"))
... in the definition of Rhtslib::pkgconfig() fixes the problem.
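The softlink expansion at the heart of this can be reproduced in isolation. What follows is only a minimal sketch, assuming a Unix-like system where file.symlink() is permitted; the two temporary directories are made up and merely stand in for "/cstHome/home" and "/home".

```r
# Assumes a Unix-like system where creating softlinks is allowed.
base     <- normalizePath(tempdir())
real_dir <- tempfile("cstHome_demo_", tmpdir = base)  # stands in for /cstHome/home
link_dir <- tempfile("home_demo_", tmpdir = base)     # stands in for /home
dir.create(real_dir)
stopifnot(file.symlink(real_dir, link_dir))

# The softlinked path is perfectly usable as-is ...
file.exists(link_dir)

# ... but normalizePath() replaces it with the location it points to.
resolved <- normalizePath(link_dir)
resolved
```

Any path that has passed through this kind of normalization before being written into the linker flags will refer to the real mount point, not to the softlink.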
Is there a better solution than what I've done? I don't know whether
preservation of softlinks in full file paths is a desirable thing to do
in general, though I would have thought that if it's good enough for
R.home(), it's probably also good enough for system.file().
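The effect of an explicit lib.loc can be tried out on a throwaway library. This is a sketch under assumptions, not the Rhtslib code: it needs a Unix-like system, and "demoPkg" is a made-up package consisting of nothing but a DESCRIPTION file, so that the example does not depend on Rhtslib being installed.

```r
# Build a throwaway library holding a minimal, made-up package.
base     <- normalizePath(tempdir())
real_lib <- tempfile("real_lib_", tmpdir = base)  # the "true" location
link_lib <- tempfile("link_lib_", tmpdir = base)  # softlink pointing at it
pkg_dir  <- file.path(real_lib, "demoPkg")
dir.create(pkg_dir, recursive = TRUE)
writeLines(c("Package: demoPkg", "Version: 0.0.1"),
           file.path(pkg_dir, "DESCRIPTION"))
stopifnot(file.symlink(real_lib, link_lib))

# An explicit lib.loc is used as given, so the softlink survives.
via_link <- system.file(package = "demoPkg", lib.loc = link_lib)

# Normalizing the library path first expands the softlink instead.
via_real <- system.file(package = "demoPkg",
                        lib.loc = normalizePath(link_lib))

via_link
via_real
```

Note that pointing lib.loc at R.home("library") only helps if the package is actually installed in that library, which is an assumption about the local setup.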
Cheers,
Aaron
P.S. I should point out that the other obvious solution is to install
csaw from the cluster nodes, such that the path that gets used by
Rhtslib::pkgconfig() is "/home/jmlab/...". However, at least on this
system, the cluster nodes don't have write access to "/home/jmlab/...".
Storing the R installation on the lustre file system (which can be
written) would result in intolerably slow loading times.
[Bioc-devel] interesting edge case with Rhtslib linkage
2 messages · Aaron Lun, Martin Morgan
On 04/18/2017 10:48 AM, Aaron Lun wrote:
Is there a better solution than what I've done? I don't know whether
preservation of softlinks in full file paths is a desirable thing to do
in general, though I would have thought that if it's good enough for
R.home(), it's probably also good enough for system.file().
Two different ideas are (a) to provide an environment variable
RHTSLIB_RPATH that can override system.file() (I think it's actually
.libPaths() that is using normalizePath() and expanding symlinks)
    pkgconfig <-
        function(opt = c("PKG_LIBS", "PKG_CPPFLAGS"))
    {
        path <- Sys.getenv(
            "RHTSLIB_RPATH",
            system.file("lib", package="Rhtslib", mustWork=TRUE)
        )
        if (nzchar(.Platform$r_arch)) {
            ...
and (b) to use static rather than dynamic linking, as we do on macOS
    ...
    result <- switch(match.arg(opt), PKG_CPPFLAGS={
        sprintf('-I"%s"', system.file("include", package="Rhtslib"))
    }, PKG_LIBS={
        switch(Sys.info()['sysname'], Linux={
            sprintf('%s/libhts.a -lz -pthread', patharch)
        }, Darwin={
            ...
On balance I think it would be as easy to use static linking, but I'm
open to other ideas.
Martin
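Both ideas can be sketched in a few lines. This is a hedged illustration, not the Rhtslib implementation: rpath_dir() and linux_pkg_libs() are hypothetical helpers, the paths are placeholders, and the dynamic-linking flags are an assumption about how a run-time search path is usually passed to the GNU linker. Only the RHTSLIB_RPATH name and the libhts.a form come from the message above.

```r
# Hypothetical helper for idea (a): pick the directory that ends up in the
# linker flags. 'default' stands in for
# system.file("lib", package="Rhtslib", mustWork=TRUE).
rpath_dir <- function(default) {
    override <- Sys.getenv("RHTSLIB_RPATH", unset = NA)
    # Treat an empty value the same as an unset one.
    if (is.na(override) || !nzchar(override)) default else override
}

# Hypothetical helper for idea (b): contrast the two linking styles on Linux.
# The dynamic flags are an assumption (typical -L / -rpath usage); the static
# form mirrors the libhts.a line quoted above.
linux_pkg_libs <- function(patharch, static = FALSE) {
    if (static) {
        sprintf("%s/libhts.a -lz -pthread", patharch)
    } else {
        sprintf("-L%s -Wl,-rpath,%s -lhts -lz -pthread", patharch, patharch)
    }
}

default_dir <- "/real/mount/lib"   # placeholder for the normalized path

Sys.unsetenv("RHTSLIB_RPATH")
fallback <- rpath_dir(default_dir)           # no override: default is used

Sys.setenv(RHTSLIB_RPATH = "/softlink/lib")  # placeholder override
dynamic_flags <- linux_pkg_libs(rpath_dir(default_dir))
static_flags  <- linux_pkg_libs(rpath_dir(default_dir), static = TRUE)
Sys.unsetenv("RHTSLIB_RPATH")

dynamic_flags
static_flags
```

With the static form, no run-time search path for libhts is recorded in the dependent package's shared object, so the mount point only has to be valid where the package is built.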