On 10/25/12 7:36 PM, "Vincent Carey" <stvjc at channing.harvard.edu> wrote:
>if R-core (who afaics maintain parallel) are unwilling to adopt/maintain
>these suggestions, why not write a biocParallel and/or cookParallel
>package that does? it seems to me that any interested party can pose the
>issue to r-devel. if no answer is given, we can all learn from the
>experimental alternate package.
>
>On Thu, Oct 25, 2012 at 12:44 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>
>>
>>
>> On 10/24/12 5:08 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
>>
>> >Hi,
>> >
>> >With Florian's use case, there seems to be a strong/immediate need for
>> >dispatching on the cluster-like object passed as the 1st argument to
>> >parLapply() and all the other functions in the parallel package that
>> >belong to the "snow family" (14 functions in total, all documented in
>> >?parallel::parLapply). So we've just added those 14 generics to
>> >BiocGenerics 0.5.1. We're postponing the "multicore family" (i.e.
>> >mclapply(), mcmapply(), and pvec()) for now.
>> >
>> >Note that the 14 new generics dispatch at least on their 1st argument
>> >('cl'), but also on their 2nd argument when this argument is 'x', 'X'
>> >or 'seq' (expected to be a vector-like or matrix-like object). This
>> >opens the door to defining methods that take advantage of the
>> >implementation of particular vector-like or matrix-like objects.
>> >
>> >Also note that, even if some of the 14 functions in the "snow family"
>> >are simple convenience wrappers to other functions in the family, we've
>> >made all of them generics. For example clusterEvalQ() is a simple
>> >wrapper to clusterCall():
>> >
>> > > clusterEvalQ
>> > function (cl = NULL, expr)
>> > clusterCall(cl, eval, substitute(expr), env = .GlobalEnv)
>> > <environment: namespace:parallel>
>> >
>> >And it seems (at least intuitively) that implementing a "clusterCall"
>> >method for my cluster-like objects should be enough to have
>> >clusterEvalQ() work out-of-the-box on those objects. But, sadly enough,
>> >this is not the case:
>> >
>> > setClass("FakeCluster", representation(nnodes="integer"))
>> >
>> > setMethod("clusterCall", "FakeCluster",
>> > function (cl=NULL, fun, ...) fun(...)
>> > )
>> >
>> >Then:
>> >
>> > > mycluster <- new("FakeCluster", nnodes=10L)
>> > > clusterCall(mycluster, print, 1:6)
>> > [1] 1 2 3 4 5 6
>> > > clusterEvalQ(mycluster, print(1:6))
>> > Error in checkCluster(cl) : not a valid cluster
>> >
>> >This is because the "clusterEvalQ" default method is calling
>> >parallel::clusterCall() (which is *not* the generic), instead of
>> >calling BiocGenerics::clusterCall() (which *is* the generic).
>> >
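A small illustration of the lookup behaviour described above, using a plain environment as a stand-in for a package namespace. This is only a sketch: `fake_ns`, `inner` and `outer` are made-up names, not anything from parallel or BiocGenerics. A function resolves free names starting from the environment it was defined in, so a same-named function defined elsewhere is never consulted:

```r
# 'fake_ns' plays the role of the parallel namespace (hypothetical name).
fake_ns <- new.env(parent = baseenv())
local({
    # Stands in for the non-generic parallel::clusterCall.
    inner <- function(x) "inner from fake_ns"
    # Stands in for parallel::clusterEvalQ, which calls clusterCall.
    outer <- function(x) inner(x)
}, envir = fake_ns)

# A same-named function defined outside, playing the role of the generic.
inner <- function(x) "inner from the calling environment"

# outer() was defined in fake_ns, so it finds fake_ns's inner().
result <- fake_ns$outer(1)
print(result)
```

The outside `inner` is ignored for the same reason the BiocGenerics generic is ignored by the default "clusterEvalQ" method.
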
>> >This would be avoided if clusterCall() was a generic defined in
>> >the parallel package itself (or in a package that parallel depends
>> >on). And this would of course be a better solution than having those
>> >generics in BiocGenerics. Is someone willing to bring that case to
>> >R-devel?
>> >
>> >In the mean time I need to define a "clusterEvalQ" method:
>> >
>> > setMethod("clusterEvalQ", "FakeCluster",
>> > function (cl=NULL, expr)
>> > clusterCall(cl, eval, substitute(expr), env=.GlobalEnv)
>> > )
>> >
>> >And then:
>> >
>> > > clusterEvalQ(mycluster, print(1:6))
>> > [1] 1 2 3 4 5 6
>> >
>> >Finally note that this method I defined for my objects could be made the
>> >default "clusterEvalQ" method (i.e. the clusterEvalQ,ANY method) and we
>> >could put it in BiocGenerics. Or, since there is apparently nothing to
>> >gain by making clusterEvalQ() a generic in the first place, we
>> >could redefine clusterEvalQ() as an ordinary function in BiocGenerics.
>> >This function would be implemented *exactly* like
>> >parallel::clusterEvalQ() (and it would mask it), except that now
>> >it would call BiocGenerics::clusterCall() internally.
>> >
>> >What should we do?
>>
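For what it is worth, the second option (an ordinary clusterEvalQ() that calls the generic) can be tried out without BiocGenerics. In the sketch below the clusterCall generic is defined locally as a stand-in for the one in BiocGenerics; that local definition is an assumption made only so the example is self-contained. The FakeCluster class and its method are the ones from Hervé's message:

```r
library(methods)

# Local stand-in for the BiocGenerics generic (assumption for this sketch).
setGeneric("clusterCall",
    function(cl = NULL, fun, ...) standardGeneric("clusterCall"))

# Class and method as in the message above.
setClass("FakeCluster", representation(nnodes = "integer"))
setMethod("clusterCall", "FakeCluster",
    function(cl = NULL, fun, ...) fun(...))

# Ordinary (non-generic) function with the same body as
# parallel::clusterEvalQ; defined here, it resolves clusterCall to the
# generic above, so only a clusterCall method is needed.
clusterEvalQ <- function(cl = NULL, expr)
    clusterCall(cl, eval, substitute(expr), env = .GlobalEnv)

mycluster <- new("FakeCluster", nnodes = 10L)
res <- clusterEvalQ(mycluster, 1 + 2)
print(res)
```

With this arrangement no "clusterEvalQ" method has to be written for FakeCluster.
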
>> We have the identical problem already when we try to use parallel
>> mcmapply on a BioC List (e.g. GRangesList).
>>
>> Witness:
>>
>> The casual user (ehrm, myself at least) expects that since I can
>> 'lapply' on a BioC GRangesList (or any other List) that I should be
>> able to mclapply on it.
>>
>> Sadly the casual user is wrong, and gets an error.
>>
>> Why?
>>
>> Because parallel::mclapply(X... calls as.list on X.
>>
>> Which yields 'Error in as.list.default : no method for coercing this S4
>> class to a vector'
>>
>> But, you say, IRanges defines as.list for Lists, as can be demonstrated
>> by calling as.list(myGRL) on a GRangesList.
>>
>> Here I yield the floor to someone who can explain why this is so, for I
>> have not studied enough how namespaces/packages/symboltables/whatever
>> work in R.
>>
>> Anyone?
>>
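One possible explanation, offered as a sketch rather than a diagnosis of IRanges itself: defining an S4 method on as.list creates an S4 generic where the method is defined, while base::as.list stays an S3 generic that knows nothing about S4 methods. Code inside another namespace that has not imported the S4 generic keeps calling base::as.list. The class `MyList` below is made up for the illustration:

```r
library(methods)

# 'MyList' is a made-up S4 class standing in for an IRanges List.
setClass("MyList", representation(items = "list"))

# Defining an S4 method turns as.list into an S4 generic, but only in the
# environment where setMethod() is evaluated (here, the global environment).
setMethod("as.list", "MyList", function(x, ...) x@items)

obj <- new("MyList", items = list(1, 2))

# Resolves to the new S4 generic, so the method is used.
via_generic <- as.list(obj)

# base::as.list is what code inside another namespace sees by default;
# it falls through to as.list.default and fails on the S4 object.
via_base <- tryCatch(base::as.list(obj), error = function(e) e)

print(via_generic)
print(inherits(via_base, "error"))
```

If this is indeed what happens with mclapply, it would also explain why a copy of mclapply evaluated at top level behaves differently.
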
>> Regardless, one BAD workaround I found is to snarf (tm) the source
>> for mclapply and evaluate it in the global environment, after prefixing
>> all parallel internal functions with 'parallel:::'.
>>
>> After doing this, the modified mclapply works as one might expect.
>>
>> So, there is at least an issue regarding how method dispatch works
>> across namespaces. Again I yield the floor, but expect that it can be
>> fixed.
>>
>> BUT, FURTHERMORE, MCLAPPLY SHOULD NOT COERCE X TO LIST ANYWAY
>>
>> Why? Because calling `as.list` incurs the overhead of (needlessly!?!)
>> coercing this nice tight GRangesList into a base::list.
>>
>> There is NO REASON for it to be coercing X to a list at all. By my
>> lights, mclapply only needs `length` and `seq_along` defined on X, which
>> ARE ALREADY available to a GRangesList from Vector. Indeed, comment
>> out the X<-as.list(X) coercion in mclapply and, lo, it still works on a
>> GRangesList as hoped, and on a 1000-element GRangesList takes ~18x less
>> user time to mclapply(myGRL,length) (and it is even shorter just to use
>> elementLengths, but that is not the point).
>>
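To make the "only needs length and indexing" point concrete, here is a minimal single-process sketch. `apply_by_index` is a hypothetical helper, not part of parallel; it walks positions 1..length(X) and extracts each element with `[[`, so X itself is never coerced to a base list:

```r
# Hypothetical helper: iterate by position instead of coercing X to a list.
# Requires only that length(X) and X[[i]] are defined for X.
apply_by_index <- function(X, FUN, ...) {
    idx <- seq_len(length(X))
    lapply(idx, function(i) FUN(X[[i]], ...))
}

# Works on a plain list ...
lens <- apply_by_index(list(a = 1:3, b = 1:5), length)
# ... and on an atomic vector, with no as.list() call on X.
roots <- apply_by_index(c(1, 4, 9), sqrt)

print(lens)
print(roots)
```

On a platform with fork support, the inner lapply over `idx` could presumably be swapped for parallel::mclapply, since only the cheap integer index vector would then be coerced; I have not timed that variant.
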
>> In this case the solution appears to be to FIX the upstream package so
>> that method dispatch works correctly (I expect that length and seq_along
>> are only visible to my snarfed mclapply and would suffer from a similar
>> error without addressing the package issue).
>>
>> Indeed, similarly, in my proposed changes to parallel::pvec, I found a
>> simple change that made it work with Vector as well as vector, since
>> Vector implements `[` and `length`.
>>
>> I still think the solution to getting an SGE (et al.) parallel back-end
>> is to seek to improve the upstream package to make it 'pluggable' for
>> different parallel backends.
>>
>> I don't think I'm the right person to represent this to R-devel as
>> obviously I am not schooled (yet!?!?) in the workings of
>> S3/S4/signatures/methods/etc.
>>
>> Herve, I have a hunch that your 'In the mean time' solution is a
>> workaround that has the potential to invite further confusion.
>>
>> Anyone, as a perhaps related issue, and as an opportunity to educate me,
>> can you explain why untrace does NOT completely work on `lapply` (with
>> BiocGenerics loaded)? Viz:
>>
>> trace(lapply)
>> untrace(lapply)
>> IRanges(1,2)
>> IRanges of length 1
>> trace: lapply(dots, methods:::.class1)
>> ....
>>
>>
>> --Malcolm
>>
>>
>>
>>
>>
>>
>> >
>> >H.
>> >
>> >
>> >On 10/24/2012 09:07 AM, Cook, Malcolm wrote:
>> >> On 10/24/12 12:44 AM, "Michael Lawrence" <lawrence.michael at gene.com> wrote:
>> >>
>> >>> I agree that it would be fruitful to have parLapply in BiocGenerics. It
>> >>> looks to be a flexible abstraction and its presence in the parallel
>> >>> package makes it ubiquitous. If it hasn't been done already, mclapply
>> >>> (and mcmapply) would be good candidates, as well. The fork-based
>> >>> parallelism is substantively different in terms of the API from the
>> >>> more general parallelism of parLapply.
>> >>>
>> >>> Someone was working on some more robust and convenient wrappers
>> >>> around mclapply. Did that ever see the light of day?
>> >>
>> >>
>> >> If you are referring to
>> >>
>> >> http://thread.gmane.org/gmane.science.biology.informatics.conductor/43660
>> >>
>> >> in which I had offered some small changes to parallel::pvec
>> >>
>> >> https://gist.github.com/3757873/
>> >>
>> >> and after which Martin had provided me with a route I have not (yet?)
>> >> followed toward submitting a patch to R for consideration by R-devel /
>> >> Simon Urbanek in
>> >>
>> >> http://grokbase.com/t/r/bioc-devel/129rbmxp5b/applying-over-granges-and-other-vectors-of-ranges#201209248dcn0tpwt7k7g9zsjr4dha6f1c
>> >>
>> >>
>> >>
>> >>
>> >>>>> On Tue, Oct 23, 2012 at 12:13 PM, Steve Lianoglou <
>> >>>>> mailinglist.honeypot at gmail.com> wrote:
>> >>>>>
>> >>>>>> In response to a question from yesterday, I pointed someone to the
>> >>>>>> ShortRead `srapply` function and I wondered to myself why it had to
>> >>>>>> necessarily be "buried" in the ShortRead package (aside from it
>> >>>>>> having a `sr` prefix).
>> >>>>>>
>> >>>>>
>> >>>> I don't know that srapply necessarily 'got it right'...
>> >>
>> >>
>> >> One thing I like about srapply is its support for a reduce argument.
>> >>
>> >>>>>> I had thought it might be a good idea to move that (or something
>> >>>>>> like that) to BiocGenerics (unless implementations aren't allowed
>> >>>>>> there) but also realized that it would add more dependencies where
>> >>>>>> someone might not necessarily need them.
>> >>
>> >>
>> >>>>>>
>> >>>>>> But, almost surely, a large majority of the people will be happy to
>> >>>>>> do some form of ||-ization, so in my mind it's not such an onerous
>> >>>>>> thing to add -- on the other hand, this large majority is probably
>> >>>>>> enriched for people who are doing NGS analysis, in which case,
>> >>>>>> keeping it in ShortRead can make some sense.
>> >>
>> >> I remain confused about the need for putting any of this into
>> >> BiocGenerics at all. It seems to me that properly construed
>> >> parallelization primitives ought to 'just work' with any object which
>> >> supports indexing and length.
>> >>
>> >> I would appreciate hearing arguments to the contrary.
>> >>
>> >> Florian, in a similar vein, could we not seek to change
>> >> parallel::makeCluster to be extensible to, say, support an SGE cluster?
>> >> This seems like the 'right thing to do'. ???
>> >>
>> >>
>> >> Regardless, I think we have raised some considerations that might
>> >> inform improvements to parallel, including points about error handling,
>> >> reducing results, block-level parallelization over List/Vector (in
>> >> addition to vector), etc.
>> >>
>> >> I think perhaps having a google doc that we can collectively edit to
>> >> contain the requirements we are trying to achieve might move us
>> >> forward effectively. Would this help? Or perhaps a page under
>> >> http://wiki.fhcrc.org/bioc/DeveloperPage/#discussions ???
>> >>
>> >>
>> >>>>>> Taking one step back, I recall some chatter last week (or two) about
>> >>>>>> some better ||-ization "primitives" -- something about a pvec
>> >>>>>> doo-dad, and there being ideas to wrap different types of ||-ization
>> >>>>>> behind an easy to use interface (I think this was the convo), and
>> >>>>>> then I took a further step back and often wonder why we just don't
>> >>>>>> bite the bullet and take advantage of the `foreach` infrastructure
>> >>>>>> that is already out there -- in which case, I could imagine a "doSGE"
>> >>>>>> package that might handle the particulars of what Florian is
>> >>>>>> referring to. You could then configure it externally via some
>> >>>>>> `registerDoSGE(some.config.object)` and just have the package code
>> >>>>>> happily run it through `foreach(...) %dopar%` and be done w/ it.
>> >>>>>>
>> >>>>>>
>> >>>>> IMHO it is relevant. I have not looked for other abstractions, and
>> >>>>> this one seems to work. Florian's objectives might be a good test
>> >>>>> case for adequacy.
>> >>>>>
>> >>>>
>> >>>> The registerDoDah does seem to be a useful abstraction.
>> >>
>> >> Is this not more-or-less the intention of parallel::setDefaultCluster?
>> >>
>> >> --Malcolm
>> >>
>> >>
>> >>
>> >>>>
>> >>>> I think there's a lot of work to do for some sort of coordinated
>> >>>> parallelization that putting parLapply into BiocGenerics might
>> >>>> encourage; not good things will happen when everyone in a call stack
>> >>>> tries to parallelize independently. But I'm in favor of parLapply in
>> >>>> BiocGenerics at least for the moment.
>> >>>>
>> >>>> Martin
>> >>>>
>> >>>>
>> >>>>
>> >>>>>
>> >>>>>> ... at least, I thought this is what was being talked about here
>> >>>>>> (and popped up a week or two ago) -- sorry if I completely missed
>> >>>>>> the mark ...
>> >>>>>>
>> >>>>>> -steve
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Oct 23, 2012 at 10:38 AM, Hahne, Florian
>> >>>>>> <florian.hahne at novartis.com> wrote:
>> >>>>>>
>> >>>>>>> Hi Martin,
>> >>>>>>> I could define the generics in my own package, but that would mean
>> >>>>>>> that those will only be available there, or in the global
>> >>>>>>> environment assuming that I also export them, or in all additional
>> >>>>>>> packages that explicitly import them from my name space. Now there
>> >>>>>>> already are a whole bunch of packages around that all allow for
>> >>>>>>> parallelization via a cluster object. Obviously those all import
>> >>>>>>> the parLapply function from the parallel package. That means that
>> >>>>>>> I can't simply supply my own modified cluster object, because the
>> >>>>>>> code that calls parLapply will not know about the generic in my
>> >>>>>>> package, even if it is attached. Ideally parLapply would be a
>> >>>>>>> generic function already in the parallel package. Not sure who
>> >>>>>>> needs to be convinced in order for this to happen, but my gut
>> >>>>>>> feeling was that it could be easier to have the generic in
>> >>>>>>> BiocGenerics.
>> >>>>>>> Maybe I am missing something obvious here, but imo there is no way
>> >>>>>>> to overwrite parLapply globally for my own class unless the generic
>> >>>>>>> is imported by everyone who wants to make use of the special
>> >>>>>>> method.
>> >>>>>>> Florian
>> >>>>>>> --
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 10/23/12 2:20 PM, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:
>> >>>>>>>
>> >>>>>>> On 10/17/2012 05:45 AM, Hahne, Florian wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi all,
>> >>>>>>>>> I was wondering whether it would be possible to have proper
>> >>>>>>>>> generics for some of the functions in the parallel package, e.g.
>> >>>>>>>>> parLapply and clusterCall. The reason I am asking is because I
>> >>>>>>>>> want to build an S4 class that essentially looks like an S3
>> >>>>>>>>> cluster object but knows how to deal with the SGE. That way I can
>> >>>>>>>>> abstract away all the overhead regarding job submission, job
>> >>>>>>>>> status and reducing the results in the parLapply method of that
>> >>>>>>>>> class, and would be able to supply this new cluster object to all
>> >>>>>>>>> of my existing functions that can be processed in parallel using
>> >>>>>>>>> a cluster object as input. I have played around with the
>> >>>>>>>>> BatchJobs package as an abstraction layer to SGE and that works
>> >>>>>>>>> nicely. As a test case I have created the necessary generics
>> >>>>>>>>> myself in order to supply my own SGEcluster object to a function
>> >>>>>>>>> that normally deals with the "regular" parallel package S3
>> >>>>>>>>> cluster objects and everything just worked out of the box, but
>> >>>>>>>>> obviously this fails once I am in a name space and my generic is
>> >>>>>>>>> not found anymore. Of course what we would really want is some
>> >>>>>>>>> proper abstraction of parallelization in R, but for now this
>> >>>>>>>>> seems to be at least a cheap compromise. Any thoughts on this?
>> >>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Hi Florian -- we talked about this locally, but I guess we didn't
>> >>>>>>>> actually send any email!
>> >>>>>>>>
>> >>>>>>>> Is there an obstacle to promoting these to generics in your own
>> >>>>>>>> package? The usual motivation for inclusion in BiocGenerics has
>> >>>>>>>> been to avoid conflicts between packages, but I'm not sure whether
>> >>>>>>>> this is the case (yet)? This would also add a dependency fairly
>> >>>>>>>> deep in the hierarchy.
>> >>>>>>>>
>> >>>>>>>> What do you think?
>> >>>>>>>>
>> >>>>>>>> Martin
>> >>>>>>>>
>> >>>>>>>> Florian
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>> >>>>>>>> 1100 Fairview Ave. N.
>> >>>>>>>> PO Box 19024 Seattle, WA 98109
>> >>>>>>>>
>> >>>>>>>> Location: Arnold Building M1 B861
>> >>>>>>>> Phone: (206) 667-2793
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> _______________________________________________
>> >>>>>>> Bioc-devel at r-project.org mailing list
>> >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Steve Lianoglou
>> >>>>>> Graduate Student: Computational Systems Biology
>> >>>>>> | Memorial Sloan-Kettering Cancer Center
>> >>>>>> | Weill Medical College of Cornell University
>> >>>>>> Contact Info:
>> >>>>>>
>> >>>>>> http://cbio.mskcc.org/~lianos/contact
>> >>>>>>
>> >
>> >--
>> >Hervé Pagès
>> >
>> >Program in Computational Biology
>> >Division of Public Health Sciences
>> >Fred Hutchinson Cancer Research Center
>> >1100 Fairview Ave. N, M1-B514
>> >P.O. Box 19024
>> >Seattle, WA 98109-1024
>> >
>> >E-mail: hpages at fhcrc.org
>> >Phone: (206) 667-5791
>> >Fax: (206) 667-1319