Would it be possible to make mclapply aware of the Vector class?
Currently, the following line causes them to be coerced into a regular
list which could be rather expensive for instance in the case of
GRangesLists:
  if (!is.vector(X) || is.object(X))
      X <- as.list(X)
I guess something like
  if ((!is.vector(X) && !is(X, "Vector")) || is.object(X))
      X <- as.list(X)
would do the trick.
Or am I missing something obvious here?
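One caveat on the proposed condition, shown as a minimal R sketch: every S4 instance satisfies is.object(), so the `|| is.object(X)` branch is still TRUE for a Vector and as.list() would still be called. The class ToyVector below is a hypothetical stand-in so the snippet runs without IRanges, and moving the class test in front of both branches is just one possible reordering, not the approach the thread settled on.

```r
library(methods)

# Hypothetical stand-in for an S4 Vector subclass, so IRanges is not needed
setClass("ToyVector", slots = c(data = "integer"))
x <- new("ToyVector", data = 1:3)

is.vector(x)   # FALSE
is.object(x)   # TRUE: every S4 instance counts as an "object"

# The condition as proposed: the is.object() branch still fires for x
proposed <- function(X) (!is.vector(X) && !is(X, "ToyVector")) || is.object(X)

# One possible reordering: the class test guards both branches
reordered <- function(X) !is(X, "ToyVector") && (!is.vector(X) || is.object(X))

proposed(x)                   # TRUE:  as.list(x) would still be called
reordered(x)                  # FALSE: x is left alone
reordered(1:3)                # FALSE: plain vectors need no coercion
reordered(data.frame(a = 1))  # TRUE:  other objects are still coerced
```

In real code the class name would be "Vector" rather than the toy class.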
Cheers,
Florian
[Bioc-devel] mclapply and Vector objects
9 messages · Hahne, Florian, Michael Lawrence, Vincent Carey +2 more
-----Original Message-----
From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-bounces at r-project.org] On Behalf Of Michael Lawrence
Sent: Monday, September 24, 2012 1:43 PM
To: Vincent Carey
Cc: Michael Lawrence; bioc-devel at r-project.org
Subject: Re: [Bioc-devel] mclapply and Vector objects
Good points, Vince. Yes, it would be great to use the more general
parLapply and expect the client to somehow pass a cluster object that
controls the parallelization strategy. The mclapply function is a little
bit special in that it is able to inherit the enclosing environment from
the parent process. This is not generally feasible with other parallel
backends. So it seems OK to have an mclapply wrapper that makes this
assumption clear, in a way that is easier to read (in my opinion) than a
parameterized apply function accepting a closure, cluster object, or
whatever.
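For comparison, the parameterized alternative (Vince's suggestion of an applier argument that defaults to lapply, with backend options fixed in a closure) might look roughly like this. summarize_all and mc_applier are hypothetical names; only base R and the parallel package are assumed.

```r
library(parallel)

# Hypothetical higher-level function; the caller picks the iteration strategy
summarize_all <- function(xs, applier = lapply) {
  unlist(applier(xs, sum))
}

# Closure factory: backend options are captured here, not in summarize_all()
mc_applier <- function(...) {
  opts <- list(...)  # evaluate the options once, when the applier is built
  function(X, FUN) do.call(mclapply, c(list(X, FUN), opts))
}

xs <- list(1:3, 4:6)
serial <- summarize_all(xs)
forked <- summarize_all(xs, applier = mc_applier(mc.cores = 1L))
```

A cluster-based applier could be written the same way, for example function(X, FUN) parLapply(cl, X, FUN) for a cluster cl created with makeCluster(), without changing summarize_all().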
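We actually don't have to do too much copying of the formals to the
wrapper, because we can simply help mclapply find our as.list generic:
.mclapplyDefault <- parallel::mclapply
# Give the copy this package's namespace as its enclosing environment, so
# free variables in its body (such as as.list) are looked up here first
environment(.mclapplyDefault) <- topenv()
# Register the copy as the mclapply method used for List objects
setMethod("mclapply", "List", .mclapplyDefault)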
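A rough illustration of the mechanics behind this trick: R looks up a function's free variables, such as the as.list call inside mclapply, starting from the function's enclosing environment. Reassigning that environment therefore changes which as.list is found, while the formals and body stay exactly the same. In this sketch toy_apply and ns are hypothetical stand-ins for mclapply and for the package namespace that topenv() returns.

```r
# Hypothetical stand-in for mclapply: as.list is a free variable in its body
toy_apply <- function(X) as.list(X)

# Hypothetical stand-in for a package namespace (what topenv() returns there)
ns <- new.env()
# An as.list visible only inside ns; a package would define an S4 generic here
assign("as.list", function(x, ...) "package as.list", envir = ns)

before <- toy_apply(1:2)   # as.list resolved from base: list(1L, 2L)

# Same formals and body; only the enclosing environment changes
environment(toy_apply) <- ns
after <- toy_apply(1:2)    # as.list now resolved from ns: "package as.list"
```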
Michael, I'm close to presenting something that I think will address all concerns, and am reviewing this small flurry of emails to make sure. Would you please give me a quick explanation of what the above 'trick' accomplishes? I'm still knocking around R's OO frameworks. Also...
Tricks aside, mclapply has some issues that would be nice to address. The main one for me is error handling. I have developed some code that wraps the user function in try() and compiles any errors into a special "batchCondition" object that provides access to the individual exceptions and gives a nice summary of what went wrong. Ideally, that would go into parallel, but we could put it in IRanges as a stopgap.
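A minimal sketch of this error-collecting idea, assuming nothing about the actual batchCondition code: each call to FUN is wrapped (here with tryCatch() instead of try(), so the original condition objects are kept), and if any job fails, one condition carrying a summary and the individual errors is signalled. The names batch_lapply, batchError, jobErrors and jobIds are hypothetical; mc.cores defaults to 1 so the sketch also runs where forking is unavailable.

```r
library(parallel)

# Hypothetical wrapper: keep every job's error instead of a bare message
batch_lapply <- function(X, FUN, ..., mc.cores = 1L) {
  wrapped <- function(x, ...) tryCatch(FUN(x, ...), error = function(e) e)
  res <- mclapply(X, wrapped, ..., mc.cores = mc.cores)
  failed <- vapply(res, inherits, logical(1), what = "error")
  if (any(failed)) {
    cond <- structure(
      list(message = sprintf("%d of %d jobs failed", sum(failed), length(res)),
           call = sys.call(),
           jobErrors = res[failed],   # the individual condition objects
           jobIds = which(failed)),   # positions of the failing elements
      class = c("batchError", "error", "condition"))
    stop(cond)
  }
  res
}

ok <- batch_lapply(1:3, sqrt)
err <- tryCatch(
  batch_lapply(1:5, function(i) if (i %% 2 == 1) stop("odd: ", i) else i),
  batchError = function(e) e)
conditionMessage(err)   # "3 of 5 jobs failed"
err$jobIds              # 1 3 5
```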
mclapply currently returns, as its value, a list holding the individual error messages if any process fails, and only warns. Are you suggesting that, in such cases, mclapply should `stop` with a condition (batchCondition) rather than warn, attaching the individual conditions (try-errors) returned by each process as attributes of that condition?
If so, I like it; it sounds great. It would be a change in mclapply's behaviour, though probably an improvement that no one would object to.
What we have now looks like this:
> x <- try(mclapply(1:5, stop, simpleError("Hey!"), mc.silent = TRUE))
Warning message:
In mclapply(1:5, stop, simpleError("Hey!"), mc.silent = TRUE) :
  all scheduled cores encountered errors in user code
> class(x)
[1] "list"
> class(x[[1]])
[1] "character"
> x[[1]]
[1] "Error in lapply(X=S, FUN=FUN,...) : 1 Error: I warned You\n\n"
Instead, we are saying that mclapply should raise a batchError that carries each individual core's simpleError as an attribute, say 'jobCondition'.
What could be simpler?
It would then look like this:
> x <- try(mclapply(1:5, stop, simpleError("Hey!"), mc.silent = TRUE))
Error:
In mclapply(1:5, stop, simpleError("Hey!"), mc.silent = TRUE) :
  all scheduled cores encountered errors in user code
> class(x)
[1] "batchError" "simpleError" "error" "condition"
> x
[1] Error:
In mclapply(1:5, stop, simpleError("Hey!"), mc.silent = TRUE) :
  all scheduled cores encountered errors in user code
attr(,"class")
[1] "batchError" "simpleError" "error" "condition"
attr(,"condition")
<batchError all scheduled cores encountered errors in user code>
attr(,"jobCondition")
        [,1] [,2] [,3] [,4] [,5]
jobid   1    2    3    4    5
message Hey! Hey! Hey! Hey! Hey!
call    NULL NULL NULL NULL NULL
This would be sweet and simple.
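The same proposal can be sketched as a post-processing step, assuming (as ?mclapply documents) that jobs failing in forked workers come back as "try-error" objects whose "condition" attribute holds the original error. raise_batch_error and the jobid attribute are hypothetical; the class vector and the jobCondition attribute follow the mock-up. The usage lines build the try-errors directly with try(), so no forking is needed.

```r
# Hypothetical post-processing of an mclapply() result list
raise_batch_error <- function(res) {
  failed <- vapply(res, inherits, logical(1), what = "try-error")
  if (!any(failed))
    return(res)
  msg <- if (all(failed)) {
    "all scheduled cores encountered errors in user code"
  } else {
    sprintf("%d of %d jobs encountered errors", sum(failed), length(res))
  }
  cond <- simpleError(msg)
  class(cond) <- c("batchError", class(cond))
  # try() keeps the original error object in the "condition" attribute
  attr(cond, "jobCondition") <- lapply(res[failed], attr, which = "condition")
  attr(cond, "jobid") <- which(failed)
  stop(cond)
}

# Simulated result list: jobs 1 and 3 failed, job 2 returned a value
res <- list(try(stop("Hey!"), silent = TRUE), 2, try(stop("Hey!"), silent = TRUE))
e <- tryCatch(raise_batch_error(res), batchError = function(e) e)
class(e)                                        # "batchError" "simpleError" "error" "condition"
attr(e, "jobid")                                # 1 3
conditionMessage(attr(e, "jobCondition")[[1]])  # "Hey!"
```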
~Malcolm
Michael
On Mon, Sep 24, 2012 at 7:12 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
Only caveat is: do we want to commit to mc* in the interface, or remain agnostic and allow iterator selection to be dropped in? I looked at the commented-out mcseqapply and it seems unfortunate to manually propagate all the mc.* options. So what am I suggesting? I myself had to wonder. Interactively, I generally use mclapply(1:N, function(ind) ...) to get multicore processing for general objects, and when I want a higher-level function that allows users to choose for or against multicore iteration, I define an applier parameter that defaults to lapply. If you have to set options, it is probably OK to do that through a closure, if you don't want all those potentially unstable parameters cluttering your argument list. So my proposal is: whatever we choose, plan for alternative approaches to multicore execution, and keep the code base slim by allowing the alternatives to be chosen through parameter settings as opposed to distinct interfaces.
On Mon, Sep 24, 2012 at 9:36 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:
I should amend this: it would be a method for the List class. Many of the Vector classes are "atomic", and coercing them to a list is either not supported or may yield an undesired result. For example, coercing an IRanges to a list yields a list of integer vectors holding the sequence from start to end. We don't have an lapply,Vector method for this reason. I actually already made a commented-out mcseqapply. I think I abandoned the mc* stuff back before the parallel package existed, just to avoid adding a dependency on multicore. With parallel in base R, it's reasonable to add these methods. If no one else complains, I'll move ahead.
Michael
On Mon, Sep 24, 2012 at 6:23 AM, Michael Lawrence <michafla at gene.com> wrote:
It definitely makes sense to have a generic for mclapply that dispatches on Vector, and perhaps also for some of the other apply functions in the parallel package.
Michael
On Mon, Sep 24, 2012 at 3:58 AM, Hahne, Florian <florian.hahne at novartis.com> wrote:
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel