Hello R-devel,
Currently on Unix-like systems, Sys.which incorporates the absolute
path to the `which` executable, obtained at the configure stage:
## hopefully configure found [/usr]/bin/which
which <- "@WHICH@"
if (!nzchar(which)) {
warning("'which' was not found on this platform")
This poses a problem for the Spack package manager and software
distribution. In Spack, like in Nix, Guix, and GoboLinux, packages live
under their own path prefixes, which look like the following:
Unfortunately, Spack packages are expected to get relocated, changing
the path prefix and invalidating stored paths, including the path to
`which`: <https://github.com/spack/spack/issues/41953>.
Harmen Stoppels, who is not subscribed to R-devel but interested in
making R work in Spack, currently creates a symlink to `which`
<https://github.com/r-devel/r-svn/pull/151> as part of a patch to R.
What would be the minimally disruptive way to avoid this dependency or
at least make it easier to fix post-factum, during relocation? What
would be the pitfall if Sys.which() were to find `which` on the $PATH
by itself, without remembering the full path?
Ivan,
I suspect that the `which' case is just the tip of the iceberg - generally, R expects all tools it detects at configure time to be callable, just to list a few from a running session:
PAGER /usr/bin/less
R_BROWSER /usr/bin/open
R_BZIPCMD /usr/bin/bzip2
R_GZIPCMD /usr/bin/gzip
R_PDFVIEWER /usr/bin/open
R_QPDF /Library/Frameworks/R.framework/Resources/bin/qpdf
R_TEXI2DVICMD /usr/local/bin/texi2dvi
R_UNZIPCMD /usr/bin/unzip
R_ZIPCMD /usr/bin/zip
SED /usr/bin/sed
TAR /usr/bin/tar
I would claim it is not an unreasonable expectation that the user doesn't delete tools after R was built. Obviously it is tedious, but Spack may need to patch all absolute paths if it wants to relocate things (it's not easy as it includes all linker paths as well FWIW) - they must be doing something like that already as even the R start-up script uses absolute paths:
$ grep ^R_ /Library/Frameworks/R.framework/Resources/bin/R
R_HOME_DIR=/Library/Frameworks/R.framework/Resources
R_HOME="${R_HOME_DIR}"
R_SHARE_DIR=/Library/Frameworks/R.framework/Resources/share
R_INCLUDE_DIR=/Library/Frameworks/R.framework/Resources/include
R_DOC_DIR=/Library/Frameworks/R.framework/Resources/doc
R_binary="${R_HOME}/bin/exec${R_ARCH}/R"
That said, WHICH is a mess - it may make sense to switch to the command -v built-in which is part of POSIX (where available - which is almost everywhere today) which would not require an external tool, but as noted I don't think this is the only problem Spack has... (and that's just core R - even a bigger can of worms with R packages :P).
Cheers,
Simon
On Jan 11, 2024, at 8:43 AM, Ivan Krylov via R-devel <r-devel at r-project.org> wrote:
Hello R-devel,
Currently on Unix-like systems, Sys.which incorporates the absolute
path to the `which` executable, obtained at the configure stage:
## hopefully configure found [/usr]/bin/which
which <- "@WHICH@"
if (!nzchar(which)) {
warning("'which' was not found on this platform")
This poses a problem for the Spack package manager and software
distribution. In Spack, like in Nix, Guix, and GoboLinux, packages live
under their own path prefixes, which look like the following:
Unfortunately, Spack packages are expected to get relocated, changing
the path prefix and invalidating stored paths, including the path to
`which`: <https://github.com/spack/spack/issues/41953>.
Harmen Stoppels, who is not subscribed to R-devel but interested in
making R work in Spack, currently creates a symlink to `which`
<https://github.com/r-devel/r-svn/pull/151> as part of a patch to R.
What would be the minimally disruptive way to avoid this dependency or
at least make it easier to fix post-factum, during relocation? What
would be the pitfall if Sys.which() were to find `which` on the $PATH
by itself, without remembering the full path?
--
Best regards,
Ivan
For context: I don't think Nix and Guix have to relocate anything, cause I think they require absolute paths like /nix/store where all binaries go. Spack on the other hand can install packages w/o sudo to a location of choice, e.g. ~/spack/opt/spack. That's why we have to patch binaries.
However, Spack's relocation stuff is not about creating truly relocatable binaries where everything is referenced by relative paths. We still use absolute paths almost everywhere, it's just that they have to be rewired when the location things are built is different from where they are installed.
I'm sure there are people who would like to have an actually fully relocatable R, but that's not my goal.
I would claim it is not an unreasonable expectation that the user doesn't delete tools after R was built. Obviously it is tedious, but Spack may need to patch all absolute paths if it wants to relocate things (it's not easy as it includes all linker paths as well FWIW) - they must be doing something like that already as even the R start-up script uses absolute paths
Basically Spack does (a) special handling of the dynamic section of ELF files for Linux / FreeBSD, and load commands of mach-o files for macOS, (b) find & replace of prefixes in text files / scripts and (c) somewhat fancy but still basic replacement of C-strings containing prefixes in binaries.
This works reliably because the package prefixes contain hashes that make false positives unlikely.
It's just that it does not work when the absolute path to be relocated is captured inside serialized bytecode in a zlib-compressed database base.rdb :)
I believe `which` is the only occurrence of this.
That said, WHICH is a mess - it may make sense to switch to the command -v built-in which is part of POSIX (where available - which is almost everywhere today) which would not require an external tool
That sounds like a decent solution to me, probably `command -v` is more commonly available than `which`.
I don't think this is the only problem Spack has... (and that's just core R - even a bigger can of worms with R packages :P).
We can deal with most issues, just not with compressed byte code.
Harmen,
thanks for the additional details, it wasn't exactly clear what this is about. Ivan's post didn't mention that the issue here is the caching, not the path replacement which you are apparently already doing, now it makes more sense.
I still think it is dangerous as you have no way of knowing who else is caching values at installation time since there is no reason to assume that the system will change after installation - technically, it breaks the contract with the application. We are trying to get packages to not hard-code or cache paths, but that typically only applies to the package library location, not to system tools.
Cheers,
Simon
On Jan 11, 2024, at 10:36 AM, Harmen Stoppels <me at harmenstoppels.nl> wrote:
For context: I don't think Nix and Guix have to relocate anything, cause I think they require absolute paths like /nix/store where all binaries go. Spack on the other hand can install packages w/o sudo to a location of choice, e.g. ~/spack/opt/spack. That's why we have to patch binaries.
However, Spack's relocation stuff is not about creating truly relocatable binaries where everything is referenced by relative paths. We still use absolute paths almost everywhere, it's just that they have to be rewired when the location things are built is different from where they are installed.
I'm sure there are people who would like to have an actually fully relocatable R, but that's not my goal.
I would claim it is not an unreasonable expectation that the user doesn't delete tools after R was built. Obviously it is tedious, but Spack may need to patch all absolute paths if it wants to relocate things (it's not easy as it includes all linker paths as well FWIW) - they must be doing something like that already as even the R start-up script uses absolute paths
Basically Spack does (a) special handling of the dynamic section of ELF files for Linux / FreeBSD, and load commands of mach-o files for macOS, (b) find & replace of prefixes in text files / scripts and (c) somewhat fancy but still basic replacement of C-strings containing prefixes in binaries.
This works reliably because the package prefixes contain hashes that make false positives unlikely.
It's just that it does not work when the absolute path to be relocated is captured inside serialized bytecode in a zlib-compressed database base.rdb :)
I believe `which` is the only occurrence of this.
That said, WHICH is a mess - it may make sense to switch to the command -v built-in which is part of POSIX (where available - which is almost everywhere today) which would not require an external tool
That sounds like a decent solution to me, probably `command -v` is more commonly available than `which`.
I don't think this is the only problem Spack has... (and that's just core R - even a bigger can of worms with R packages :P).
We can deal with most issues, just not with compressed byte code.
Simon Urbanek <simon.urbanek at R-project.org> wrote:
That said, WHICH is a mess - it may make sense to switch to the
command -v built-in which is part of POSIX (where available - which
is almost everywhere today) which would not require an external tool
This is a bit tricky to implement. I've prepared the patch at the end
of this e-mail, tested it on GNU/Linux and tried to test on OpenBSD [*]
(I cannot test on a Mac), but then I realised one crucial detail:
unlike `which`, `command -v` returns names of shell builtins if
something is both an executable and a builtin. So for things like `[`,
Sys.which would behave differently if changed to use command -v:
$ sh -c 'which ['
/usr/bin/[
$ sh -c 'command -v ['
[
R checks the returned string with file.exists(), so the new
Sys.which('[') returns an empty string instead of /usr/bin/[. That's
probably undesirable, isn't it?
Index: configure
===================================================================
--- configure (revision 85802)
+++ configure (working copy)
@@ -949,7 +949,6 @@
PDFTEX
TEX
PAGER
-WHICH
SED
INSTALL_DATA
INSTALL_SCRIPT
@@ -5390,66 +5389,6 @@
done
test -n "$SED" || SED="/bin/sed"
-
-## 'which' is not POSIX, and might be a shell builtin or alias
-## (but should not be in 'sh')
-for ac_prog in which
-do
- # Extract the first word of "$ac_prog", so it can be a program name with args.
-set dummy $ac_prog; ac_word=$2
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-printf %s "checking for $ac_word... " >&6; }
-if test ${ac_cv_path_WHICH+y}
-then :
- printf %s "(cached) " >&6
-else $as_nop
- case $WHICH in
- [\\/]* | ?:[\\/]*)
- ac_cv_path_WHICH="$WHICH" # Let the user override the test with a path.
- ;;
- *)
- as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
- IFS=$as_save_IFS
- case $as_dir in #(((
- '') as_dir=./ ;;
- */) ;;
- *) as_dir=$as_dir/ ;;
- esac
- for ac_exec_ext in '' $ac_executable_extensions; do
- if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
- ac_cv_path_WHICH="$as_dir$ac_word$ac_exec_ext"
- printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
- break 2
- fi
-done
- done
-IFS=$as_save_IFS
-
- ;;
-esac
-fi
-WHICH=$ac_cv_path_WHICH
-if test -n "$WHICH"; then
- { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $WHICH" >&5
-printf "%s\n" "$WHICH" >&6; }
-else
- { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-fi
-
-
- test -n "$WHICH" && break
-done
-test -n "$WHICH" || WHICH="which"
-
-if test "${WHICH}" = which ; then
- ## needed to build and run R
- ## ends up hard-coded in the utils package
- as_fn_error $? "which is required but missing" "$LINENO" 5
-fi
-
## Make
: ${MAKE=make}
Index: configure.ac
===================================================================
--- configure.ac (revision 85802)
+++ configure.ac (working copy)
@@ -680,15 +680,6 @@
## we would like a POSIX sed, and need one on Solaris
AC_PATH_PROGS(SED, sed, /bin/sed, [/usr/xpg4/bin:$PATH])
-## 'which' is not POSIX, and might be a shell builtin or alias
-## (but should not be in 'sh')
-AC_PATH_PROGS(WHICH, which, which)
-if test "${WHICH}" = which ; then
- ## needed to build and run R
- ## ends up hard-coded in the utils package
- AC_MSG_ERROR([[which is required but missing]])
-fi
-
## Make
: ${MAKE=make}
AC_SUBST(MAKE)
Index: src/library/base/Makefile.in
===================================================================
--- src/library/base/Makefile.in (revision 85802)
+++ src/library/base/Makefile.in (working copy)
@@ -28,7 +28,7 @@
all: Makefile DESCRIPTION
@$(ECHO) "building package '$(pkg)'"
@$(MKINSTALLDIRS) $(top_builddir)/library/$(pkg)
- @WHICH="@WHICH@" $(MAKE) mkRbase mkdesc2 mkdemos2
+ @$(MAKE) mkRbase mkdesc2 mkdemos2
@$(INSTALL_DATA) $(srcdir)/inst/CITATION $(top_builddir)/library/$(pkg)
include $(top_srcdir)/share/make/basepkg.mk
@@ -45,12 +45,12 @@
mkR: mkRbase
Rsimple:
- @WHICH="@WHICH@" $(MAKE) mkRbase mkRsimple
+ @$(MAKE) mkRbase mkRsimple
## Remove files to allow this to be done repeatedly
Rlazy:
- at rm -f $(top_builddir)/library/$(pkg)/R/$(pkg)*
- @WHICH="@WHICH@" $(MAKE) mkRbase
+ @$(MAKE) mkRbase
@cat $(srcdir)/makebasedb.R | \
R_DEFAULT_PACKAGES=NULL LC_ALL=C $(R_EXE) > /dev/null
@$(INSTALL_DATA) $(srcdir)/baseloader.R \
@@ -57,4 +57,4 @@
$(top_builddir)/library/$(pkg)/R/$(pkg)
Rlazycomp:
- @WHICH="@WHICH@" $(MAKE) mkRbase mklazycomp
+ @$(MAKE) mkRbase mklazycomp
Index: src/library/base/R/unix/system.unix.R
===================================================================
--- src/library/base/R/unix/system.unix.R (revision 85802)
+++ src/library/base/R/unix/system.unix.R (working copy)
@@ -114,23 +114,14 @@
Sys.which <- function(names)
{
res <- character(length(names)); names(res) <- names
- ## hopefully configure found [/usr]/bin/which
- which <- "@WHICH@"
- if (!nzchar(which)) {
- warning("'which' was not found on this platform")
- return(res)
- }
for(i in seq_along(names)) {
if(is.na(names[i])) {res[i] <- NA; next}
## Quoting was added in 3.0.0
- ans <- suppressWarnings(system(paste(which, shQuote(names[i])),
- intern = TRUE, ignore.stderr = TRUE))
- ## Solaris' which gives 'no foo in ...' message on stdout,
- ## GNU which does it on stderr
- if(grepl("solaris", R.version$os)) {
- tmp <- strsplit(ans[1], " ", fixed = TRUE)[[1]]
- if(identical(tmp[1:3], c("no", i, "in"))) ans <- ""
- }
+ ans <- tryCatch(
+ suppressWarnings(system(paste("command -v", shQuote(names[i])),
+ intern = TRUE, ignore.stderr = TRUE)),
+ error = \(e) ""
+ )
res[i] <- if(length(ans)) ans[1] else ""
## final check that this is a real path and not an error message
if(!file.exists(res[i])) res[i] <- ""
Best regards,
Ivan
[*] example(Sys.which()) works, but there are multiple problems
elsewhere, e.g. tan(1+1000i) returns NaN+1i and Matrix doesn't install
without gmake
On Friday, January 12th, 2024 at 16:11, Ivan Krylov <ikrylov at disroot.org> wrote:
unlike `which`, `command -v` returns names of shell builtins if
something is both an executable and a builtin. So for things like `[`,
Sys.which would behave differently if changed to use command -v
Then can we revisit my simple fix, which refers to `which` through a
symlink instead of a hard-coded absolute in an R-source file:
From 3f2b1b6c94460fd4d3e9f03c9f17a25db2d2b473 Mon Sep 17 00:00:00 2001
From: Harmen Stoppels <me at harmenstoppels.nl>
Date: Wed, 10 Jan 2024 12:40:40 +0100
Subject: [PATCH] base: use a symlink for which instead of hard-coded string
---
share/make/basepkg.mk | 8 ++++----
src/library/base/R/unix/system.unix.R | 6 +++---
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/share/make/basepkg.mk b/share/make/basepkg.mk
index c0a69c8a0af..4cf63878709 100644
--- a/share/make/basepkg.mk
+++ b/share/make/basepkg.mk
@@ -72,16 +72,16 @@ mkRbase:
else \
cat $(RSRC) > "$${f}"; \
fi; \
- f2=$${TMPDIR:-/tmp}/R2$$$$; \
- sed -e "s:@WHICH@:${WHICH}:" "$${f}" > "$${f2}"; \
- rm -f "$${f}"; \
- $(SHELL) $(top_srcdir)/tools/move-if-change "$${f2}" all.R)
+ $(SHELL) $(top_srcdir)/tools/move-if-change "$${f}" all.R)
@if ! test -f $(top_builddir)/library/$(pkg)/R/$(pkg); then \
$(INSTALL_DATA) all.R $(top_builddir)/library/$(pkg)/R/$(pkg); \
else if test all.R -nt $(top_builddir)/library/$(pkg)/R/$(pkg); then \
$(INSTALL_DATA) all.R $(top_builddir)/library/$(pkg)/R/$(pkg); \
fi \
fi
+ @if ! test -f $(top_builddir)/library/$(pkg)/R/which; then \
+ cd $(top_builddir)/library/$(pkg)/R/ && $(LN_S) $(WHICH) which; \
+ fi
mkdesc:
@if test -f DESCRIPTION; then \
diff --git a/src/library/base/R/unix/system.unix.R b/src/library/base/R/unix/system.unix.R
index 3bb7d0cb27c..78271c8c12c 100644
--- a/src/library/base/R/unix/system.unix.R
+++ b/src/library/base/R/unix/system.unix.R
@@ -114,9 +114,9 @@ system2 <- function(command, args = character(),
Sys.which <- function(names)
{
res <- character(length(names)); names(res) <- names
- ## hopefully configure found [/usr]/bin/which
- which <- "@WHICH@"
- if (!nzchar(which)) {
+ which <- file.path(R.home(), "library", "base", "R", "which")
+ ## which should be a symlink to the system's which
+ if (!file.exists(which)) {
warning("'which' was not found on this platform")
return(res)
}