setting .libPaths() with parallel::clusterCall

3 messages · luke-tierney at uiowa.edu, Mark van der Loo

#
Dear all,

It is not possible to set library paths on worker nodes with
parallel::clusterCall (or snow::clusterCall) and I wonder if this is
intended behavior.

Example.

library(parallel)
libdir <- "./tmplib"
if (!dir.exists(libdir)) dir.create(libdir)

cl <- makeCluster(2)
clusterCall(cl, .libPaths, c(libdir, .libPaths()) )

The output is as expected: each worker node returns its library paths
with libdir prepended. However, running

clusterEvalQ(cl, .libPaths())

shows that the library paths have not been set.

If this is indeed a bug, I'm happy to file it at bugzilla. Tested on R
4.0.3 and r-devel.

Best,
Mark
ps: a workaround is documented here:
https://www.markvanderloo.eu/yaRb/2020/12/17/how-to-set-library-path-on-a-parallel-r-cluster/
R Under development (unstable) (2020-12-21 r79668)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /home/mark/projects/Rdev/R-devel/lib/libRblas.so
LAPACK: /home/mark/projects/Rdev/R-devel/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

loaded via a namespace (and not attached):
[1] compiler_4.1.0
#
On Tue, 22 Dec 2020, Mark van der Loo wrote:

Use this:

     clusterCall(cl, ".libPaths", c(libdir, .libPaths()) )

This will find the function .libPaths on the workers.
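
A minimal sketch of how one might check this, assuming local workers on
the same machine. The temporary directory and the names worker_paths and
has_libdir are illustrative and not part of the original report:

```r
library(parallel)

# Assumption: a throwaway library directory under tempdir(), so the path
# is absolute and visible to workers running on the same machine.
libdir <- file.path(tempdir(), "tmplib")
dir.create(libdir, showWarnings = FALSE)

cl <- makeCluster(2)

# Pass the function by name: each worker looks up its own .libPaths.
invisible(clusterCall(cl, ".libPaths", c(libdir, .libPaths())))

# Query the workers in a separate call to see whether the change persisted.
worker_paths <- clusterEvalQ(cl, .libPaths())
has_libdir <- vapply(
  worker_paths,
  function(p) normalizePath(libdir, "/") %in% p,
  logical(1)
)
print(has_libdir)

stopCluster(cl)
```

The comparison goes through normalizePath() because .libPaths() stores
normalized paths, which may differ textually from libdir on some
platforms.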

Your clusterCall sends across a serialized copy of your process'
.libPaths and calls that. Usually that is equivalent to calling the
function found by the name you used on the workers, but not when the
function has an enclosing environment that the function modifies by
assignment.
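
The effect can be reproduced without a cluster. The following is only a
sketch: make_counter is a hypothetical stand-in for a function that, like
.libPaths, keeps its state in an enclosing environment and updates it by
assignment.

```r
# A closure that modifies a variable in its enclosing environment.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter <- make_counter()

# Round-trip through serialization, roughly what happens when a function
# value is shipped to a worker. The enclosing environment is copied along.
copy <- unserialize(serialize(counter, NULL))

copy()
copy_value <- copy()          # the copy has been called twice
original_value <- counter()   # the original has been called once

# The copy updated its own environment, not the one 'counter' uses.
print(c(copy = copy_value, original = original_value))
print(identical(environment(copy), environment(counter)))
```

Calls to the copy leave the state of the original untouched, in the same
way that the serialized .libPaths updates its own copied environment
rather than the one used by the worker's .libPaths.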

Alternate implementations of .libPaths that are more
serialization-friendly are possible in principle but probably not
practical given limitations of the base package.

The distinction between providing a function value or a character
string as the function argument to clusterCall and others could
probably use a paragraph in the help file; happy to consider a patch
if anyone wants to take a crack at it.

Best,

luke

#
Dear Luke,

Thank you, this makes perfect sense.

I find it quite hard to express this issue in a way that is both compact
and understandable. In any case, below is a proposal for an update of
the documentation.

Thank you again for all your work,
Mark



Index: src/library/parallel/man/clusterApply.Rd
===================================================================
--- src/library/parallel/man/clusterApply.Rd (revision 79673)
+++ src/library/parallel/man/clusterApply.Rd (working copy)
@@ -136,6 +136,15 @@
   more efficient than \code{parApply} but do less post-processing of the
   result.

+  Functions with a \code{fun} or \code{FUN} parameter send a serialized
+  copy of the argument from the main process to each worker node.
+  When the argument passed to \code{fun} or \code{FUN} is a function
+  this is equivalent to calling the same function on the worker node,
+  except when the function has an enclosing environment it modifies.
+  A notable example is \code{\link{.libPaths}}. To ensure that the
+  function local to each worker is called so it modifies its local
+  enclosing environment, pass the name of the function as a string.
+
   A chunk size of \code{0} with static scheduling uses the default (one
   chunk per node).  With dynamic scheduling, chunk size of \code{0} has the
   same effect as \code{1} (one invocation of \code{FUN}/\code{fun} per
On Tue, Dec 22, 2020 at 2:37 PM <luke-tierney at uiowa.edu> wrote: