On 11/15/12 9:38 PM, "Michael Lawrence" <lawrence.michael at gene.com> wrote:
>On Thu, Nov 15, 2012 at 11:00 AM, Martin Morgan <mtmorgan at fhcrc.org>
>wrote:
>
>> On 11/15/2012 10:53 AM, Henrik Bengtsson wrote:
>>
>>> Is there any write up/discussion/plans on the various types of
>>> parallel computations out there:
>>>
>>> (1) one machine / multi-core/multi-threaded
>>> (2) multiple machines / multiple processes
>>> (3) batch / queue processing (on large compute clusters with many
>>> users).
>>> (4) ...
>>>
>>> Are we/you mainly focusing on (1) and (2)?
>>>
>>
>> open for discussion; 1 & 2 are a good starting point for current scope.
>> r-pbd.org is relevant for 3.
>>
>>
>We have all three of those configurations here, so I've been looking into
>ways to facilitate each of them. One interesting package is BatchJobs. It
>handles simple clusters via ssh, as well as large managed clusters via,
>e.g., LSF.
>
>
>
>> Not sure how best to facilitate this conversation / prioritization on
>> GitHub? If possible we should move the conversation there.
>>
>> Martin
>>
>>
>>
>>> /Henrik
>>>
>>> On Thu, Nov 15, 2012 at 6:21 AM, Kasper Daniel Hansen
>>> <kasperdanielhansen at gmail.com> wrote:
>>>
>>>> I'll second Ryan's patch (at least in principle). When I parallelize
>>>> across multiple cores, I have always found mc.preschedule to be an
>>>> important option to expose (that, and the number of cores, is all I
>>>> use routinely).
>>>>
>>>> Kasper
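
For anyone following along, here is a minimal sketch of what mc.preschedule changes. As documented for mclapply, with mc.preschedule = TRUE the input is split up front into as many parts as there are cores, while with FALSE one job is forked per element and at most mc.cores run at a time. The toy task and the fallback to one core on Windows are my own assumptions for illustration, not part of the patch.

```r
library(parallel)

# Assumption: fall back to a single core on Windows, where mclapply
# cannot fork and rejects mc.cores > 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

tasks <- 1:6
square <- function(i) i^2  # toy task, purely illustrative

# Prescheduled: tasks are divided among the cores before any work starts.
res_pre <- mclapply(tasks, square, mc.cores = cores, mc.preschedule = TRUE)

# Not prescheduled: one forked job per task, at most `cores` at a time.
res_dyn <- mclapply(tasks, square, mc.cores = cores, mc.preschedule = FALSE)
```

Both calls return the same values in the same order; only the scheduling differs. The mclapply documentation suggests FALSE for jobs whose run times vary a lot and TRUE for many short computations.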
>>>>
>>>> On Wed, Nov 14, 2012 at 7:14 PM, Ryan C. Thompson
>>>> <rct at thompsonclan.org> wrote:
>>>>
>>>>> I just submitted a pull request. I'll add tests shortly if I can
>>>>> figure out how to write them.
>>>>>
>>>>>
>>>>> On Wed 14 Nov 2012 03:50:36 PM PST, Martin Morgan wrote:
>>>>>
>>>>>>
>>>>>> On 11/14/2012 03:43 PM, Ryan C. Thompson wrote:
>>>>>>
>>>>>>>
>>>>>>> Here are two alternative implementations of pvec. pvec2 is just a
>>>>>>> simple rewrite of pvec to use mclapply. pvec3 then extends pvec2 to
>>>>>>> accept a specified chunk size or a specified number of chunks. If the
>>>>>>> number of chunks exceeds the number of cores, then multiple chunks
>>>>>>> will get run sequentially on each core. pvec3 also exposes the
>>>>>>> "mc.preschedule" argument of mclapply, since that is relevant when
>>>>>>> there are more chunks than cores. Lastly, I provide a "pvectorize"
>>>>>>> function which can be called on a regular vectorized function to make
>>>>>>> it into a pvec'd version of itself. Usage is like:
>>>>>>> sqrt.parallel <- pvectorize(sqrt); sqrt.parallel(1:1000).
>>>>>>>
>>>>>>> library(parallel)  # mclapply, splitIndices, mc.reset.stream
>>>>>>>
>>>>>>> pvec2 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE,
>>>>>>>                   mc.cores = getOption("mc.cores", 2L),
>>>>>>>                   mc.cleanup = TRUE)
>>>>>>> {
>>>>>>>     cores <- as.integer(mc.cores)
>>>>>>>     if (cores < 1L) stop("'mc.cores' must be >= 1")
>>>>>>>     if (cores == 1L) return(FUN(v, ...))
>>>>>>>
>>>>>>>     if (mc.set.seed) mc.reset.stream()
>>>>>>>
>>>>>>>     # One chunk of indices per core.
>>>>>>>     n <- length(v)
>>>>>>>     si <- splitIndices(n, cores)
>>>>>>>     res <- do.call(c,
>>>>>>>                    mclapply(si, function(i) FUN(v[i], ...),
>>>>>>>                             mc.set.seed=mc.set.seed,
>>>>>>>                             mc.silent=mc.silent,
>>>>>>>                             mc.cores=mc.cores,
>>>>>>>                             mc.cleanup=mc.cleanup))
>>>>>>>     if (length(res) != n)
>>>>>>>         warning("some results may be missing, folded or caused an error")
>>>>>>>     res
>>>>>>> }
>>>>>>>
>>>>>>> pvec3 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE,
>>>>>>>                   mc.cores = getOption("mc.cores", 2L),
>>>>>>>                   mc.cleanup = TRUE,
>>>>>>>                   mc.preschedule=FALSE, num.chunks, chunk.size)
>>>>>>> {
>>>>>>>     cores <- as.integer(mc.cores)
>>>>>>>     if (cores < 1L) stop("'mc.cores' must be >= 1")
>>>>>>>     if (cores == 1L) return(FUN(v, ...))
>>>>>>>
>>>>>>>     if (mc.set.seed) mc.reset.stream()
>>>>>>>
>>>>>>>     n <- length(v)
>>>>>>>     # num.chunks takes precedence; otherwise derive it from chunk.size,
>>>>>>>     # and default to one chunk per core.
>>>>>>>     if (missing(num.chunks)) {
>>>>>>>         if (missing(chunk.size)) {
>>>>>>>             num.chunks <- cores
>>>>>>>         } else {
>>>>>>>             num.chunks <- ceiling(n/chunk.size)
>>>>>>>         }
>>>>>>>     }
>>>>>>>     si <- splitIndices(n, num.chunks)
>>>>>>>     res <- do.call(c,
>>>>>>>                    mclapply(si, function(i) FUN(v[i], ...),
>>>>>>>                             mc.set.seed=mc.set.seed,
>>>>>>>                             mc.silent=mc.silent,
>>>>>>>                             mc.cores=mc.cores,
>>>>>>>                             mc.cleanup=mc.cleanup,
>>>>>>>                             mc.preschedule=mc.preschedule))
>>>>>>>     if (length(res) != n)
>>>>>>>         warning("some results may be missing, folded or caused an error")
>>>>>>>     res
>>>>>>> }
>>>>>>>
>>>>>>> # Wrap a vectorized function so that calls to it go through pvec3.
>>>>>>> pvectorize <- function(FUN) {
>>>>>>>     function(...) pvec3(FUN=FUN, ...)
>>>>>>> }
>>>>>>>
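
To make the pvectorize usage above concrete, here is a small self-contained sketch. pvec_sketch and pvectorize_sketch are hypothetical stand-ins I wrote for illustration; they keep only the chunking logic of pvec3 (no seed handling, no length check), and the single-core fallback on Windows is my own assumption.

```r
library(parallel)

# Hypothetical, trimmed-down stand-in for pvec3: split the indices into
# num.chunks pieces, apply FUN to each piece, and concatenate the results.
pvec_sketch <- function(v, FUN, ..., mc.cores = 2L, num.chunks = mc.cores) {
    # Assumption: mclapply cannot fork on Windows, so use one core there.
    if (.Platform$OS.type == "windows") mc.cores <- 1L
    si <- splitIndices(length(v), num.chunks)
    do.call(c, mclapply(si, function(i) FUN(v[i], ...), mc.cores = mc.cores))
}

# Same closure pattern as pvectorize, pointed at the sketch above.
pvectorize_sketch <- function(FUN) {
    function(...) pvec_sketch(FUN = FUN, ...)
}

sqrt.parallel <- pvectorize_sketch(sqrt)
res <- sqrt.parallel(1:1000, num.chunks = 8L)
```

One thing to keep in mind with this pattern: the wrapper's "..." carries arguments for both the chunking function and FUN, so an argument of FUN that happens to share a name with one of the chunking arguments would be captured by the wrapper instead.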
>>>>>>
>>>>>>
>>>>>> would be great to have these as 'pull' requests in github; pvec3 as a
>>>>>> replacement for pvec, if it's implementing the same concept but
>>>>>> better.
>>>>>>
>>>>>> Unit tests would be good (yes being a little hypocritical).
>>>>>> inst/unitTests, using RUnit, examples in
>>>>>>
>>>>>>
>>>>>> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/IRanges/inst/unitTests
>>>>>>
>>>>>>
>>>>>> with username / password readonly
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> On Wed 14 Nov 2012 02:23:30 PM PST, Michael Lawrence wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Nov 14, 2012 at 12:23 PM, Martin Morgan
>>>>>>>> <mtmorgan at fhcrc.org> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> Interested developers -- I added the start of a BiocParallel
>>>>>>>>> package to
>>>>>>>>> the Bioconductor subversion repository and build system.
>>>>>>>>>
>>>>>>>>> The package is mirrored on github to allow for social coding; I
>>>>>>>>> encourage
>>>>>>>>> people to contribute via that mechanism.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/Bioconductor/BiocParallel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The purpose is to help focus our efforts at developing appropriate
>>>>>>>>> parallel paradigms. Currently the package Imports: parallel and
>>>>>>>>> implements pvec and mclapply in a way that allows for operation on
>>>>>>>>> any vector or list supporting length(), [, and [[ (the latter for
>>>>>>>>> mclapply). pvec in particular seems to be appropriate for
>>>>>>>>> GRanges-like objects, where we don't necessarily want to extract
>>>>>>>>> many thousands of S4 instances of individual ranges with [[.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Makes sense. Besides, [[ does not even work on GRanges. One
>>>>>>>> limitation of pvec is that it does not support a chunk size; it just
>>>>>>>> uses length(x) / ncores. It would be nice to be able to restrict
>>>>>>>> that, which would then require multiple jobs per core. Unless I'm
>>>>>>>> missing something.
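
As a rough illustration of the chunk-size point, the sketch below turns a requested chunk size into a number of chunks and lets splitIndices do the splitting. chunk_indices is a hypothetical helper name of my own, not something in parallel or BiocParallel; it mirrors the ceiling(n/chunk.size) step used in pvec3.

```r
library(parallel)

# Hypothetical helper: split the indices 1..n into roughly equal chunks,
# using as many chunks as are needed for the requested chunk size.
chunk_indices <- function(n, chunk.size) {
    num.chunks <- ceiling(n / chunk.size)
    splitIndices(n, num.chunks)
}

# 10 elements with a chunk size of 3 need ceiling(10 / 3) = 4 chunks,
# i.e. more chunks than a 2-core machine has cores.
idx <- chunk_indices(10, 3)
chunk.lengths <- lengths(idx)
```

With more chunks than cores, each core ends up processing several chunks in turn, which is where mc.preschedule starts to matter.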
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hopefully the ideas in the package can be folded back in to
>>>>>>>>> parallel as they mature.
>>>>>>>>>
>>>>>>>>> Martin
>>>>>>>>> --
>>>>>>>>> Dr. Martin Morgan, PhD
>>>>>>>>> Fred Hutchinson Cancer Research Center
>>>>>>>>> 1100 Fairview Ave. N.
>>>>>>>>> PO Box 19024 Seattle, WA 98109
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>> --
>> Dr. Martin Morgan, PhD
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>
>
>_______________________________________________
>Bioc-devel at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/bioc-devel