mclapply returns NULLs on MacOS when running GAM

On Tue, Apr 28, 2020 at 9:00 PM Shian Su <su.s at wehi.edu.au> wrote:
If you use parLapply(cl, ...) and give the end-users control over
the cluster 'cl' object (e.g. via an argument), then they have the
option to choose from the different types of clusters that cl <-
parallel::makeCluster(...) can create, notably PSOCK, FORK and MPI
clusters, but the framework supports others.
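
A minimal sketch of that pattern, assuming a hypothetical developer-side
function par_square() (not from any package) that takes the cluster as
an argument, so the caller decides which kind of cluster is used:

```r
library(parallel)

# Hypothetical developer-side function: it only says *what* to
# parallelize and leaves the cluster to the caller.
par_square <- function(xs, cl) {
  parLapply(cl, xs, function(x) x^2)
}

# End-user side: pick the cluster type here.  PSOCK works on all
# platforms; type = "FORK" is available on Unix-alikes only.
cl <- makeCluster(2, type = "PSOCK")
res <- par_square(1:4, cl)
stopCluster(cl)
```

The worker count of 2 is an arbitrary choice for illustration.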

The 'foreach' framework takes this separation of *what* to parallelize
(which you decide as a developer) and *how* to parallelize (which the
end-user decides) further by so-called foreach adaptors, aka parallel
backends.  With foreach, users have plenty of doNnn packages to pick
from: doMC, doParallel, doMPI, doSNOW, doRedis, and doFuture.  Several
of these parallel backends build on top of the core functions provided
by the 'parallel' package.  So, with foreach your users can use forked
parallel processing if they want, or something else (selected at
the top of their script).
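
As a rough illustration of that split, assuming the doParallel backend
is installed and using a hypothetical developer-side function
par_sqrt(), the developer code never mentions a backend and the
end-user registers one at the top of the script:

```r
library(foreach)
library(doParallel)

# Hypothetical developer-side function: uses %dopar% and does not
# care which backend is registered.
par_sqrt <- function(xs) {
  foreach(x = xs, .combine = c) %dopar% sqrt(x)
}

# End-user side: register a backend, here a two-worker cluster.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

res <- par_sqrt(c(1, 4, 9))

# Clean up and fall back to the sequential backend.
parallel::stopCluster(cl)
registerDoSEQ()
```

Swapping the backend should only require changing the registration
lines; par_sqrt() itself stays untouched.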

(Disclaimer: I'm the author) The 'future' framework tries to take this
developer-end-user separation one step further and with a lower level
API - future(), value(), resolved() - for which different parallel
backends have been implemented, e.g. multicore, multisession
("PSOCK"), cluster (any parallel::makeCluster() cluster), callr,
batchtools (HPC job schedulers), etc.  All these have been tested to
conform to the Future API specs, so we know our parallel code works
regardless of which of these backends the user picks.  Now, based on
these basic future low-level functions, other higher level APIs have
been implemented.  For instance, the future.apply package provides
futurized versions of all base R apply functions, e.g. future_lapply(),
future_vapply(), future_Map(), etc.  You can basically take your
lapply(...) code and replace it with future_lapply(...) and things
will just work.  So, try replacing your current mclapply() with
future_lapply().  If you or the user uses the 'multicore' backend - set
by plan(multicore) at the top of the script - you'll get basically what
mclapply() provides.  If plan(multisession) is used, then you basically
get what parLapply() does.  The difference is that you don't have to
worry about globals and packages.  If you like the foreach-style of
map-reduce, you can use futures via the doFuture backend.  If you like
the purrr-style of map-reduce, you can use the 'furrr' package.  So,
and I'm obviously biased, if you pick the future framework, you'll
leave yourself and end-users with more options going forward.
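
A small sketch of the above, assuming the 'future' and 'future.apply'
packages are installed; slow_square() is a made-up helper, and the
choice of two workers is arbitrary:

```r
library(future)
library(future.apply)

# Made-up helper defined in the main session.  It is a global that
# the framework identifies and exports to the workers automatically.
slow_square <- function(x) x^2

# End-user side: choose the backend at the top of the script.
# plan(multicore) would use forked processes on Unix-alikes instead.
plan(multisession, workers = 2)

# Developer side: a drop-in replacement for lapply().
y_par <- future_lapply(1:4, slow_square)

# The low-level Future API: create a future, then collect its value.
f <- future(6 * 7)
v <- value(f)

# Switch back to sequential processing and compare with plain lapply().
plan(sequential)
y_seq <- lapply(1:4, slow_square)
```

Only the plan() line changes between backends; the future_lapply()
call is the same in each case.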

Clear as mud?

/Henrik

PS. Simon, I think your explicit comment on mcparallel() & friends is
very helpful for many people and developers. It clearly tells
developers to never use mclapply() as the only path through their
code. I'm quite sure not everyone has been or is aware of this. Now
it's clear. Thank you.