
[Bioc-devel] GenomicFiles: chunking

I don't fully understand "the use case for reducing by range is when the
entire dataset won't fit into memory".  The basic assumption of these
functions (as far as I can see) is that the output data fits in memory;
what may not fit in memory is various earlier "iterations" of the data.
For example, in my use case, if I just read in all the data in all the
ranges in all the samples, it is basically Rle's across 450 MB times 38
files, which is not small.  What fits in memory is smaller chunks of this;
that is true for every application.

Reducing by range (or file) only makes sense when the final output includes
one entity for several ranges/files ... right?  So I don't see how reduce
would help me.

As I see the pack()/unpack() paradigm, it just re-orders the query ranges
(which is super nice and matters a lot for speed for some applications).
As I understand the code (and my understanding is developing), we need an
extra layer to support processing multiple ranges in one operation.
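To make the re-ordering idea concrete, here is a minimal language-agnostic
sketch (in Python, not the actual GenomicFiles R API): a "pack" step sorts
the query ranges so file access is sequential, remembering the permutation,
and an "unpack" step restores results to the caller's original query order.
All names and the (start, end) tuple representation are illustrative
assumptions, not GenomicFiles code.

```python
# Conceptual sketch only -- not the GenomicFiles pack()/unpack() API.

def pack(ranges):
    # order[i] gives the original index of the i-th range after sorting
    # by start coordinate (sequential access order on disk)
    order = sorted(range(len(ranges)), key=lambda i: ranges[i][0])
    packed = [ranges[i] for i in order]
    return packed, order

def unpack(results, order):
    # invert the permutation so results line up with the original query
    out = [None] * len(results)
    for packed_pos, orig_pos in enumerate(order):
        out[orig_pos] = results[packed_pos]
    return out

# toy query ranges as (start, end) tuples, deliberately out of order
queries = [(500, 600), (10, 20), (300, 350)]
packed, order = pack(queries)                    # sequential access order
results = [f"data:{s}-{e}" for s, e in packed]   # stand-in for extraction
print(unpack(results, order))                    # original query order
```

The point of the sketch is that pack/unpack changes only the order of work,
not its granularity; processing several ranges in one operation would still
need the extra layer described above.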

I am happy to help, not just complain.

Best,
Kasper

On Mon, Sep 29, 2014 at 8:55 AM, Michael Love <michaelisaiahlove at gmail.com>
wrote: