On 12/4/12 5:46 AM, "Henrik Bengtsson" <hb at biostat.ucsf.edu> wrote:
>Picking up this thread for lack of other places (= where should
>BiocParallel be discussed?)
>
>I saw Martin's updates on BiocParallel - great. Florian's SGE
>scheduler was also mentioned; is that one built on top of BatchJobs?
>If so, I'd be interested in looking into that and generalizing it to
>work with any BatchJobs scheduler.
>
>I believe there is going to be a new release of BatchJobs rather soon,
>so it's probably worth waiting until that is available.
>
>The main use case I'm interested in is to launch batch jobs on a
>PBS/Torque cluster, and then use multicore processing on each compute
>node. It would be nice to be able to do this using the BiocParallel
>model, but maybe it is too optimistic to get everything to work under
>the same model. Also, as Vince hinted, fault tolerance etc. needs to
>be addressed, and addressed differently, in the different setups.
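>
>For concreteness, a minimal sketch of that hybrid setup (assuming a
>.BatchJobs.R that registers makeClusterFunctionsTorque(); the registry
>id, chunking, and resource list names are illustrative and depend on
>your Torque template):
>
>  library(BatchJobs)
>
>  ## Each Torque job handles one chunk; within a job, mclapply()
>  ## fans out over the cores reserved on that node.
>  process.chunk <- function(chunk) {
>    parallel::mclapply(chunk, function(x) sqrt(x), mc.cores = 8)
>  }
>
>  chunks <- split(1:1000, rep(1:10, each = 100))  # 10 jobs x 100 tasks
>  reg <- makeRegistry(id = "hybrid")
>  batchMap(reg, process.chunk, chunks)
>  submitJobs(reg, resources = list(nodes = 1, ppn = 8,
>                                   walltime = "01:00:00"))
>  ## later: waitForJobs(reg); res <- loadResults(reg)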
>
>/Henrik
>
>On Tue, Nov 20, 2012 at 6:59 AM, Ramon Diaz-Uriarte <rdiaz02 at gmail.com>
>wrote:
>>
>>
>>
>> On Sat, 17 Nov 2012 13:05:29 -0800, "Ryan C. Thompson"
>> <rct at thompsonclan.org> wrote:
>>
>>> On 11/17/2012 02:39 AM, Ramon Diaz-Uriarte wrote:
>>> > In addition to Steve's comment, is it really a good thing that
>>> > "all code stays the same"? I mean, multiple machines vs. multiple
>>> > cores are often _very_ different things: for instance, shared vs.
>>> > distributed memory, communication overhead differences, whether or
>>> > not you can assume packages and objects to be automagically present
>>> > in the slaves/child process, etc. So, given they are different
>>> > situations, I think it sometimes makes sense to want to write
>>> > different code for each situation (I often do); not to mention
>>> > Steve's hybrid cases ;-).
>>> >
>>> >
>>> > Since BiocParallel seems to be a major undertaking, maybe it would
>>> > be appropriate to provide a flexible approach, instead of
>>> > hard-wiring the foreach approach.
>>> Of course there are cases where the same code simply can't work for
>>> both multicore and multi-machine situations, but those generally
>>> don't fall into the category of things that can be done using
>>> lapply. lapply and all of its parallelized buddies like mclapply,
>>> parLapply, and foreach are designed for data-parallel operations
>>> with no interdependence between results, and these kinds of
>>> operations generally parallelize as well across machines as across
>>> cores, unless your network is not fast enough (in which case you
>>> would choose not to use multi-machine parallelism). If you want a
>>> parallel algorithm for something like the disjoin method of GRanges,
>>> you might need to write some special-purpose code, and that code
>>> might be very different for multicore vs. multi-machine.
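>>>
>>> To make that concrete, a minimal sketch (hostnames and core counts
>>> are illustrative) of the same data-parallel call under both models:
>>>
>>>   library(parallel)
>>>
>>>   square <- function(i) i^2
>>>   xs <- 1:100
>>>
>>>   ## Multicore: fork-based, shared memory, objects inherited.
>>>   res1 <- mclapply(xs, square, mc.cores = 4)
>>>
>>>   ## Multi-machine: socket-based; same call shape, but objects
>>>   ## must be shipped to the workers explicitly.
>>>   cl <- makePSOCKcluster(c("node1", "node2"))
>>>   clusterExport(cl, "square")
>>>   res2 <- parLapply(cl, xs, square)
>>>   stopCluster(cl)
>>>
>>>   identical(res1, res2)  # TRUE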
>>
>>> So yes, sometimes there is a fundamental reason that you have to
>>> change the code to make it run on multiple machines, and neither
>>> foreach nor any other parallelization framework will save you from
>>> having to rewrite your code. But often there is no fundamental
>>> reason that the code has to change, yet you end up changing it
>>> anyway because of limitations in your parallelization framework.
>>> That is the case foreach saves you from.
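>>>
>>> A minimal sketch of that point (backend choice and hostnames are
>>> illustrative; registerDoParallel() comes from the doParallel
>>> package):
>>>
>>>   library(foreach)
>>>   library(doParallel)
>>>
>>>   ## The loop body is written once...
>>>   run <- function() foreach(i = 1:100, .combine = c) %dopar% sqrt(i)
>>>
>>>   ## ...and only the registered backend changes:
>>>   registerDoParallel(cores = 4)            # multicore
>>>   res.mc <- run()
>>>
>>>   cl <- parallel::makePSOCKcluster(c("node1", "node2"))
>>>   registerDoParallel(cl)                   # multi-machine
>>>   res.mm <- run()
>>>   parallel::stopCluster(cl)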
>>
>>
>>
>> Hummm... I guess you are right, and we are talking about "often" or
>> "most of the time", which is where all this would fit. Point taken.
>>
>>
>> Best,
>>
>> R.
>>
>> --
>> Ramon Diaz-Uriarte
>> Department of Biochemistry, Lab B-25
>> Facultad de Medicina
>> Universidad Autónoma de Madrid
>> Arzobispo Morcillo, 4
>> 28029 Madrid
>> Spain
>>
>> Phone: +34-91-497-2412
>>
>> Email: rdiaz02 at gmail.com
>> ramon.diaz at iib.uam.es
>>
>> http://ligarto.org/rdiaz
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel