[Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?
On 11/04/2013 11:34 AM, Michael Lawrence wrote:
The dynamic nature of R limits the extent of these checks. But as Ryan has noted, a simple sanity check goes a long way. If what he has done could be extended to the rest of the search path (people always forget to attach packages), I think we've hit the 80% with 20%. Got a 404 on that URL btw.
I added three issues to BiocParallel on github.

1. bpexport
2. a function to check for non-local use. I think this should use codetools (to avoid adding additional dependencies) but I'm a little flexible. Contributions welcome on github, especially as a pull request with code formatted consistently, a man page, and especially unit tests to provide a clear understanding of circumstances covered or not. Michel Lang's Recall and the implementation in foreach also sound relevant here.
3. integration of (2) into bplapply etc.

Please feel free to address these further on github.

Martin
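A rough sketch of what the check in item (2) might look like, using only codetools and base R. The helper name `findNonLocal` is hypothetical and not part of BiocParallel; it assumes that "available on the worker" can be approximated by "visible from attached packages and base, ignoring the global environment".

```r
library(codetools)

# Hypothetical helper (not a BiocParallel function): list the symbols used by
# 'fun' that cannot be found in any attached package or in base.
findNonLocal <- function(fun) {
  globals <- findGlobals(fun, merge = TRUE)
  # Start the lookup one step past the global environment, i.e. at the
  # attached packages, so user-defined globals count as missing.
  pkgEnv <- parent.env(globalenv())
  found <- vapply(globals, exists, logical(1), envir = pkgEnv, inherits = TRUE)
  unname(globals[!found])
}

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}

flagged <- findNonLocal(fib)                      # reports the self-reference
clean   <- findNonLocal(function(x) sqrt(x) + 1)  # nothing to report
```

Because the lookup only sees attached packages, a symbol from a package the user forgot to attach is flagged as well, which is the "rest of the search path" case mentioned above.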
Michael

On Mon, Nov 4, 2013 at 11:05 AM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
Hey guys,

Here is code that I have written which resolves library names into a full list of symbols: https://github.com/duncantl/CodeDepends/blob/forCRAN_0.3.5/R/librarySymbols.R

Note this does not require that the packages actually be loaded at the time of the check, and does not load them (or rather, it loads them but does not attach them, so no searchpath muddying occurs). You do need a list of packages to check though (it adds the base ones automatically). It handles dependencies and could easily be extended to handle Suggests as well, I think.

When CodeDepends gets pushed to CRAN (not my call and not high on my priority list to push for currently) it will actually do exactly what you want. (The forCRAN_0.3.5 branch already does and I believe it is documented, so you could use devtools to install it now.)

As a side note, I'm not sure that existence of a symbol is sufficient (it certainly is necessary). What about situations where the symbol exists but is stale compared to the value in the parent? Are we sure that can never happen?

~G

On Mon, Nov 4, 2013 at 7:29 AM, Michel Lang <michellang at gmail.com> wrote:
You might want to consider using Recall() for recursion, which should solve
this. Determining the required variables using heuristics such as codetools
will probably lead to some confusion when using functions which include calls
to, e.g., with():
f = function() {
  with(iris, Sepal.Length + Sepal.Width)
}
codetools::findGlobals(f)
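To make the with() point concrete, here is a small sketch comparing what codetools reports for the function above with what it reports for an equivalent function that names the data frame explicitly. The second function `g` is an illustration added here, not something from the thread.

```r
library(codetools)

f <- function() {
  with(iris, Sepal.Length + Sepal.Width)
}

# Same computation, but the columns are reached through `$`
g <- function() {
  iris$Sepal.Length + iris$Sepal.Width
}

# merge = FALSE separates function names from variable names
vars_f <- findGlobals(f, merge = FALSE)$variables
vars_g <- findGlobals(g, merge = FALSE)$variables

# In f, the column names are reported as if they were global variables,
# because codetools cannot know that with() evaluates them inside 'iris'.
print(vars_f)
print(vars_g)
```

A checker built on findGlobals() would therefore complain about `Sepal.Length` and `Sepal.Width` in `f` even though the function runs fine.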
I would suggest writing up some documentation on what the function's
environment contains and how to define variables accordingly, or on why it
can generally be considered a good idea to pass everything essential as an
argument. Nevertheless, a "bpExport" function would be a good addition for some rare corner cases, in my opinion.

Michel

2013/11/3 Henrik Bengtsson <hb at biostat.ucsf.edu>
Hi,
in BiocParallel, are there suggested (or planned) best standards for
making *locally* assigned variables (e.g. functions) available to the
applied function when it runs in a separate R process (which will be
the most common use case)? I understand that local variables
should be avoided and that it's preferred to put as much as possible in
packages, but that's not always possible or very convenient.
EXAMPLE:
library('BiocParallel')
library('BatchJobs')
# Here I pick a recursive function to make the problem a bit harder, i.e.
# the function needs to call itself ("itself" = see below)
fib <- function(n=0) {
if (n < 0) stop("Invalid 'n': ", n)
if (n == 0 || n == 1) return(1)
fib(n-2) + fib(n-1)
}
# Executing in the current R session
cluster.functions <- makeClusterFunctionsInteractive()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
# Executing in a separate R process, where fib() is not defined
# (not specific to BiocParallel)
cluster.functions <- makeClusterFunctionsLocal()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
Errors occurred during execution. First error message:
Error in FUN(...): could not find function "fib"
[...]
# The following illustrates that the solution is not always straightforward.
# (not specific to BiocParallel; must have been discussed previously)
values <- bplapply(0:9, FUN=function(n, fib) {
fib(n)
}, fib=fib)
Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
Errors occurred during execution. First error message:
Error in fib(n): could not find function "fib"
[...]
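One way to see why passing fib as an argument is not enough, without starting a second R process: the sketch below imitates a worker by giving a copy of fib an environment that can only see base R. This is an assumption about what the worker looks like (a global environment with no `fib` in it); the names `shipped_fib` and `FUN` are made up for the illustration.

```r
fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}

# Imitate the worker: the shipped copy cannot see the master's global env.
shipped_fib <- fib
environment(shipped_fib) <- new.env(parent = baseenv())

FUN <- function(n, fib) fib(n)

# The base cases never recurse, so they succeed ...
ok <- FUN(1, fib = shipped_fib)

# ... but a recursive call looks 'fib' up in the function's own environment,
# not among FUN's arguments, and fails there.
res <- tryCatch(FUN(3, fib = shipped_fib), error = function(e) e)
print(conditionMessage(res))
```

The argument makes `fib` visible inside FUN, but the recursive calls are resolved from inside fib itself, where the name is still undefined.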
# Workaround; make fib() aware of itself
# (this is something the user needs to do, and would be very
# hard for BiocParallel et al. to automate. BTW, should all
# recursive functions be implemented this way?).
fib <- function(n=0) {
if (n < 0) stop("Invalid 'n': ", n)
if (n == 0 || n == 1) return(1)
fib <- sys.function() # Make function aware of itself
fib(n-2) + fib(n-1)
}
values <- bplapply(0:9, FUN=function(n, fib) {
fib(n)
}, fib=fib)
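Michel Lang's Recall() suggestion from elsewhere in this thread gives an alternative to the sys.function() workaround. The sketch below again imitates a worker by re-homing the function in an environment that only sees base R (an assumption made for the illustration, as is the name `shipped_fib`).

```r
# Recall() calls the function it is evaluated in, so no name lookup is needed.
fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  Recall(n - 2) + Recall(n - 1)
}

# Imitate shipping the function to a process where 'fib' is not defined.
shipped_fib <- fib
environment(shipped_fib) <- new.env(parent = baseenv())

values <- vapply(0:9, shipped_fib, numeric(1))
print(values)
```

Here the function no longer depends on its own name being defined anywhere, so it should be safe to pass directly as FUN.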
WISHLIST:
Considering the above recursive issue solved, a slightly more explicit
and standardized solution is then:
values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
fib(n)
}, BPGLOBALS=list(fib=fib))
Could the above be generalized into something as neat as:
bpExport("fib")
values <- bplapply(0:9, FUN=function(n) {
BiocParallel::bpImport("fib")
fib(n)
})
or ideally just (analogously to parallel::clusterExport()):
bpExport("fib")
values <- bplapply(0:9, FUN=fib)
/Henrik
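As a sketch of how the wishlist might be approximated in plain R today: since R serializes a closure together with its enclosing environment (unless that environment is the global environment or a package/namespace), one option is to copy the needed objects into a private environment and make that the environment of FUN. The helper `bindGlobals` is hypothetical and not a BiocParallel function. Re-homing exported functions in the private environment is a design choice of this sketch so that recursive functions find themselves, and it discards their original enclosing environment.

```r
# Hypothetical helper: give FUN a private environment holding the named objects.
bindGlobals <- function(FUN, names, envir = parent.frame()) {
  env <- new.env(parent = globalenv())
  for (name in names) {
    obj <- get(name, envir = envir)
    # Re-home functions so their own free variables resolve in 'env' too
    if (is.function(obj)) environment(obj) <- env
    assign(name, obj, envir = env)
  }
  environment(FUN) <- env
  FUN
}

# 'fib' is defined locally here and never exists in the global environment
worker_fun <- local({
  fib <- function(n = 0) {
    if (n < 0) stop("Invalid 'n': ", n)
    if (n == 0 || n == 1) return(1)
    fib(n - 2) + fib(n - 1)
  }
  bindGlobals(function(n) fib(n), "fib")
})

# Round-trip through serialization, as would happen when sending to a worker
shipped <- unserialize(serialize(worker_fun, NULL))
values <- vapply(0:9, shipped, numeric(1))
print(values)
```

With something like this, the call would read `bplapply(0:9, FUN=bindGlobals(function(n) fib(n), "fib"))`; whether a real bpExport() should work this way or via an explicit export step on the workers is left open.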
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793