[Bioc-devel] Re: [Rd] ShortRead::FastqStreamer and parallelization

1 message · Martin Morgan

Re-directed from R-devel, where I guess it went by accident.
On 11/18/2014 09:00 AM, Cook, Malcolm wrote:
Yes, it's now documented on the FastqStreamer / FastqSampler and trim* pages.
Yes, individual instances of FastqStreamer (and FastqSampler) don't benefit from
R-level parallel evaluation; both are 'readers' that iterate sequentially
through the entire file. If you were streaming or sampling from several files
(as when creating a qa report, where FastqSampler is used 'under the hood'),
srapply (or nowadays just BiocParallel::bplapply) would distribute the streaming
/ sampling of each file to a separate process. This would be an effective way of
managing memory while performing parallel evaluation.
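
A minimal sketch of the per-file approach, assuming the ShortRead and
BiocParallel packages are installed. The helper name count_records and the
chunk size of 5000 are illustrative choices, not part of either package; the
input files are the small example FASTQ files shipped with ShortRead. Each
worker streams one file in bounded-size chunks, so memory use per process is
governed by the chunk size rather than the file size.

```r
library(ShortRead)
library(BiocParallel)

# Hypothetical helper: count the records in one FASTQ file, reading at most
# `n` records at a time.
count_records <- function(fl, n = 1e6) {
    library(ShortRead)            # ensure the package is attached on the worker
    strm <- FastqStreamer(fl, n = n)
    on.exit(close(strm))
    total <- 0L
    while (length(fq <- yield(strm)))   # yield() returns a zero-length object at end of file
        total <- total + length(fq)
    total
}

# Example files shipped with ShortRead
fls <- dir(system.file("extdata", "E-MTAB-1147", package = "ShortRead"),
           pattern = "fastq.gz$", full.names = TRUE)

# Parallelize over files: each file is streamed sequentially by one worker
counts <- bplapply(fls, count_records, n = 5000)
names(counts) <- basename(fls)
```

The reader inside each worker is still sequential; the parallelism comes only
from handling several files at once.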

bpiterate could be used effectively with FastqStreamer if the operation
performed on each chunk of the file were expensive; when processing several
files it is probably more scalable to parallelize over files, using
FastqStreamer to manage memory.
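
A rough sketch of the single-file bpiterate pattern, again assuming ShortRead
and BiocParallel are installed. The names next_chunk and bases_in_chunk are
hypothetical, and summing read widths is only a cheap stand-in for a
genuinely expensive per-chunk computation. The streamer is read sequentially
in the manager process, and each chunk is handed to a worker.

```r
library(ShortRead)
library(BiocParallel)

fl <- system.file("extdata", "E-MTAB-1147", "ERR127302_1_subset.fastq.gz",
                  package = "ShortRead")
strm <- FastqStreamer(fl, n = 5000)

# ITER: called repeatedly in the manager process; returns the next chunk,
# or NULL to signal that the file is exhausted.
next_chunk <- function() {
    fq <- yield(strm)
    if (length(fq)) fq else NULL
}

# FUN: evaluated on a worker for each chunk (placeholder for costly work).
bases_in_chunk <- function(fq) {
    library(ShortRead)            # ensure the package is attached on the worker
    sum(width(fq))
}

# Combine per-chunk results as they arrive
total_bases <- bpiterate(next_chunk, bases_in_chunk, REDUCE = `+`, init = 0)
close(strm)
```

Because reading remains serial, this only pays off when the per-chunk work
dominates the cost of reading and of sending chunks to the workers.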

Martin