
[Bioc-devel] IRanges should support long vectors

Hi Hervé,

Indeed, an IRanges with 2^31 elements is 17.1 GB.
The reason I was interested in IRanges is that GRanges are needed to create
the BSgenome::BSgenomeViews.
More broadly, my use case is chopping up a large genome into a fixed kmer
size so that repetitive "unmappable" regions can be removed.
https://github.com/coregenomics/kmap
My interest in long vectors is to address issue #8
https://github.com/coregenomics/kmap/issues/8

The workaround I've imagined so far is to have my kmap::kmerize function
return an iterator that creates GRanges objects of length less than 2^31.
Using iterators doesn't even need any additional packages: in the
BiocParallel bpiterator unit tests an iterator is simply a function that
keeps returning objects until it returns NULL.
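
To illustrate the iterator idea, here is a minimal base-R sketch, not the actual kmap code. The name kmer_chunk_iterator and its arguments are hypothetical, and it returns plain data frames of start/end positions rather than GRanges so that it runs without any packages. A function of this shape (returning NULL when exhausted) is the kind of thing BiocParallel::bpiterate() accepts as its ITER argument.

```r
# Hypothetical helper: not part of kmap or any Bioconductor package.
# Returns a closure that yields successive chunks of k-mer coordinates
# and returns NULL once every k-mer has been produced.
kmer_chunk_iterator <- function(seq_length, k, chunk_size) {
  n_kmers <- seq_length - k + 1  # number of k-mer start positions
  next_start <- 1                # stored as a double, so not capped at 2^31 - 1
  function() {
    if (next_start > n_kmers) {
      return(NULL)               # iterator is exhausted
    }
    last_start <- min(next_start + chunk_size - 1, n_kmers)
    starts <- seq(next_start, last_start)
    next_start <<- last_start + 1  # advance the state kept in the closure
    data.frame(start = starts, end = starts + k - 1)
  }
}

# Usage: walk a toy "sequence" of length 10 in 3-mers, 5 k-mers per chunk.
iter <- kmer_chunk_iterator(seq_length = 10, k = 3, chunk_size = 5)
n_rows <- integer(0)
while (!is.null(chunk <- iter())) {
  n_rows <- c(n_rows, nrow(chunk))
}
```

Each chunk stays well under the 2^31 limit as long as chunk_size does; in a real implementation the data frame would presumably be replaced by a GRanges built from the same start and end values.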

But looking at how much more efficient your GPos, etc. functions are,
perhaps BSgenomeViews requiring a GRanges is not as reasonable?
I don't even know of a sane way to mock a BSgenome object for writing
tests.  It's irritating to have to use actual small genomes for tests.

Pariksheet
On Tue, May 28, 2019 at 3:35 AM Pages, Herve <hpages at fredhutch.org> wrote: