[Bioc-devel] GRanges Unique [actually -- `order`] Method
On 06/15/2011 04:26 AM, Michael Lawrence wrote:
Thanks for looking into this, Steve. Maybe I am missing something here, but why not just do something like:

order(as.factor(seqnames(gr)), as.factor(strand(gr)), start(gr))

I think we'd want an option for including strand or not, like nearest,GenomicRanges,GenomicRanges, which has ignore.strand=FALSE. For GRangesList, maybe an easy approach is to add the list-element index as the first argument, i.e. order(rep(seq_along(grl), elementLengths(grl)), ...), then unlist, order, and re-list. I'm also not a fan of allowing the user to specify seqnames.order; you can't do this for factors, and it really sounds like the user wants seqlevels(gr) <- ...

Martin
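Michael's multi-key order() call, the ignore.strand option and the "group index as first key" idea for GRangesList can be sketched in base R. This is a minimal sketch, assuming a plain data.frame stand-in for a GRanges (columns seqnames, strand, start, end); range_keys(), order_ranges() and sort_grouped() are hypothetical helpers, not GenomicRanges API, and end is added as a tie-breaker that the one-liner above does not include. The strand levels c("+", "-", "*") mirror the level order GRanges uses for strand.

```r
# Sketch only: a data.frame with columns seqnames, strand, start, end stands in
# for a GRanges. None of these helpers are part of GenomicRanges.

# Build the sort keys: seqnames first (factor level order decides chromosome
# order, which is why reordering levels, cf. seqlevels<-, changes the result),
# then strand unless ignored, then start and end (end is an extra tie-breaker).
range_keys <- function(x, ignore.strand = FALSE) {
  keys <- list(as.factor(x$seqnames))
  if (!ignore.strand) {
    keys <- c(keys, list(factor(x$strand, levels = c("+", "-", "*"))))
  }
  c(keys, list(x$start, x$end))
}

order_ranges <- function(x, ignore.strand = FALSE) {
  do.call(order, range_keys(x, ignore.strand))
}

# Grouped version: prepend the list-element index as the first key, order the
# flattened ("unlisted") ranges, then split them back into groups ("re-list").
sort_grouped <- function(lst, ignore.strand = FALSE) {
  grp <- rep(seq_along(lst), vapply(lst, NROW, integer(1)))
  flat <- do.call(rbind, lst)
  o <- do.call(order, c(list(grp), range_keys(flat, ignore.strand)))
  split(flat[o, , drop = FALSE], factor(grp[o], levels = seq_along(lst)))
}

x <- data.frame(
  seqnames = factor(c("chr2", "chr1", "chr1", "chr1"), levels = c("chr1", "chr2")),
  strand = c("+", "-", "+", "+"),
  start = c(5L, 1L, 20L, 10L),
  end = c(9L, 50L, 30L, 60L)
)
order_ranges(x)                        # 4 3 2 1
order_ranges(x, ignore.strand = TRUE)  # 2 4 3 1

# Reordering the factor levels (the seqlevels analogue) moves chr2 first
x2 <- x
x2$seqnames <- factor(x2$seqnames, levels = c("chr2", "chr1"))
order_ranges(x2)                       # 1 4 3 2

sort_grouped(list(x[1:2, ], x[3:4, ]))
```

Because the group index is the leading key, grp[o] stays non-decreasing, so splitting on it restores the original grouping with each group sorted internally.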
Thanks again,
Michael

On Tue, Jun 14, 2011 at 10:17 PM, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:
I took another crack at my original attempt and reduced a call to my GenomicRanges::order from ~22 seconds to ~5.5 seconds over 1 million randomly picked ranges across Hsapiens. Still not super fast, but not as abysmal as before. I'll put it here for review before checking in (or not):

https://gist.github.com/1026520

Thanks,
-steve

On Tue, Jun 14, 2011 at 8:06 PM, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:
Hi,

(Digging up an old [related] thread, since I'm not sure what the status is of the code that Michael referred to in this context ...)

I have a suboptimal-but-working implementation of `order` (and by extension, `sort`) for GenomicRanges objects; e.g., it calculates the `order`ing of a GRanges object of length 1 million (randomly spread across all Hsapiens chromosomes and strands) in ~22 seconds[*].

The resulting (ordered) ranges are sorted/grouped by seqnames, strand, ranges (the caller can specify the ordering of the seqnames; otherwise the ordering defined by seqlevels(your.granges.object) is used). Also, it is only defined for one GRanges object (I'm not sure what the appropriate result would be if multiple GRanges objects are passed in).

I can check it into SVN if that sounds good, so it can work as a stop-gap until one of the *Ranges gurus can whip up a superior one.

[*] By the by, the runtime is dominated by iterating over the seqnames and subselecting the appropriate ranges to work on one at a time ... maybe the speed can be increased by using `split` a few times, but then you have several copies of your GRanges object in memory, so ... I'm not sure what's best atm, or how useful it is to talk about code in the "abstract," but we can continue the discussion if you reckon it's worthy of being checked in for now ...

-steve

On Wed, May 25, 2011 at 9:02 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:
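The footnote attributes most of the runtime to looping over seqnames and subsetting. As a hedged illustration (base R, a data.frame stand-in for a GRanges, hypothetical function names, and not the code from the gist), such a per-seqname loop can be replaced by one vectorized order() call whose leading key is the seqnames factor. Because order() is stable, both return the same permutation; no timings are claimed here.

```r
# Hypothetical re-creation of a per-seqname loop (not the gist's code)
order_by_loop <- function(x) {
  sn <- as.factor(x$seqnames)
  st <- factor(x$strand, levels = c("+", "-", "*"))
  out <- integer(0)
  for (lev in levels(sn)) {
    idx <- which(sn == lev)  # subselect the ranges on one seqname
    out <- c(out, idx[order(st[idx], x$start[idx], x$end[idx])])
  }
  out
}

# Same ordering from a single call: the seqnames factor codes are simply the
# leading sort key, so no subsetting or split() copies are needed
order_vectorized <- function(x) {
  order(as.factor(x$seqnames), factor(x$strand, levels = c("+", "-", "*")),
        x$start, x$end)
}

set.seed(1)
n <- 10000
x <- data.frame(
  seqnames = factor(sample(c("chr1", "chr2", "chrX"), n, replace = TRUE),
                    levels = c("chr1", "chr2", "chrX")),
  strand = sample(c("+", "-", "*"), n, replace = TRUE),
  start = sample.int(1e6, n, replace = TRUE)
)
x$end <- x$start + sample.int(100, n, replace = TRUE)
identical(order_by_loop(x), order_vectorized(x))  # TRUE
```

Remaining ties (same seqname, strand, start and end) keep their original relative order in both versions, which is why the two permutations match exactly.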
Someone has to write the methods...

On Tue, May 24, 2011 at 11:00 PM, Dario Strbenac <D.Strbenac at garvan.org.au> wrote:
Yes, the sort method just calls order.
Something isn't quite working out for me.
library(GenomicRanges) # 1.4.5
# two overlapping ranges on chr1, both on the + strand
gr <- GRanges("chr1", IRanges(c(1, 10), c(50, 60)), '+')
sort(gr)
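Since sort() here just calls order(), a sort method amounts to x[order(x)]. A minimal base-R sketch of that relationship, using a data.frame stand-in for the two chr1 ranges above; sort_ranges() is a hypothetical helper, not the GenomicRanges method.

```r
# Data.frame stand-in for the two + strand ranges on chr1 in the example above
gr_df <- data.frame(seqnames = "chr1", strand = "+",
                    start = c(1L, 10L), end = c(50L, 60L))

# Hypothetical sort: compute the ordering permutation, then subset with it
sort_ranges <- function(x, decreasing = FALSE) {
  o <- order(as.factor(x$seqnames),
             factor(x$strand, levels = c("+", "-", "*")),
             x$start, x$end, decreasing = decreasing)
  x[o, , drop = FALSE]
}

sort_ranges(gr_df[2:1, ])$start              # 1 10
sort_ranges(gr_df, decreasing = TRUE)$start  # 10 1
```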
--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793