Skip to content
Prev 8269 / 21312 Next

[Bioc-devel] is.unsorted method for GRanges objects

OK. Thanks Pete for the timings. The fact that the relative difference
in speed is larger for small n in your brief tests is because one
performs roughly in n*log(n) (quicksort-based) and the other one is
linear in time. Which is why I assumed (but without doing any testing)
that the latter was going to perform better. Anyway it seems that there
is just too much overhead involved in that solution to make it a good
candidate.

So back to square one and to the business of trying to come up with
something even more efficient than is.unsorted(order(x)) for
GenomicRanges objects. It's indeed important that is.unsorted() be
as fast and as memory efficient as possible since it is typically
used as a quick/cheap way of checking whether a costly sort is
required or not (e.g. with something like if (is.unsorted(x))
x <- sort(x)).

So it seems that unfortunately we won't be able to do it without
writing some C code. Your proposal sounds very reasonable to me. It
will perform in linear time (in the worst case) and avoid any copy
of the object (that we get with the expensive calls to head() and
tail() in my solution). So will be much faster than the 2 R solutions
whatever n is. Should work on GenomicRanges objects, not just GRanges
(this is easily achieved by passing S4Vectors:::decodeRle(seqnames(x)),
start(x), with(x), and S4Vectors:::decodeRle(strand(x)) to the .Call
entry point).

I'll take your patch if you want to work on this or I can add this
to GenomicRanges, let me know. We should probably take this off-list.

Thanks,
H.
On 11/02/2015 09:43 PM, Peter Hickey wrote: