[Bioc-devel] RFC: IntervalTrees for GRanges objects
On 2013-04-03 19:28, Kasper Daniel Hansen wrote:
Making IntervalTree chromosome would also be a great addition for organisms with many sequences, like bee (due to an incomplete genome; 10,000s of sequences). It does not matter for humans, but findOverlaps is excruciatingly slow for bee's. I have a couple of posts on this in the archive. I am predicting this will be the case in the future for most non-model organisms; finishing a genome is expensive and time consuming.
I can only confirm: many many people do not close genomes, because for a number of applications this is just not worth the trouble... for now. Given the number of cases I have seen, that's now concerning the very large majority of genomes. L.
Kasper On Wed, Apr 3, 2013 at 12:45 PM, Michael Lawrence <lawrence.michael at gene.com
wrote: Some ideas: - Turn the IntervalTree into a list/array of nodes that can be subset/reordered with shallow copying (just copy the pointers to the nodes), and the index would be secondary. The index in the array could be stored in each node, for lookup during overlap queries. Right now, as far as I can tell, GIntervalTree will get confused if the user reorders e.g. via [. - Make IntervalTree aware of the sequence/chromosome, e.g., have a hash of trees, which is trivial since seqnames is already a factor. Michael On Wed, Apr 3, 2013 at 9:29 AM, Hector Corrada Bravo <hcorrada at gmail.com
wrote: Yep, I didn't comment on that, but I agree that abstracting how GRanges stores ranges would make this more elegant. Right now ranges(GRanges) is specified to be of IRanges class instead of the abstract Ranges class. If it were the latter then GIntervalTree can be a subclass of GenomicRanges, in a similar way that IntervalTree is a subclass of Ranges. On Wed, Apr 3, 2013 at 12:23 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:
Hi Hector, That's interesting, thanks for passing this along. I'm still wishing
that
somehow GRanges itself could abstract the way it stores ranges. I know
that
Herve/Patrick had some reasons for depending specifically on GRanges.
One
reason was probably convenience at the C level, but it wouldn't be hard
to
create a Ranges abstraction at the C level, as well. Michael On Tue, Apr 2, 2013 at 5:40 PM, Hector Corrada Bravo <
hcorrada at gmail.com
wrote:
Hello bioc-develers, I'm writing an application where lots findOverlap calls are made on static GRanges objects. For IRanges we can create persistent IntervalTree objects that would serve the multiple overlap query use-case. There is no equivalent for GenomicRanges objects, so I'm proposing an implementation for this. Please check http://github.com/hcorrada/GenomicIntervalTree There's a first cut implementation there you can test by installing this skeleton package. E.g,
library(devtools)
install_github("GenomicIntervalTree", username="hcorrada",
subdir="pkg")
library(GenomicIntervalTree)
Let me know what you think. Cheers, Hector
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel