
Message-ID: <CAOQ5NyeTh+K9Ohy6uqkDSy7Kj5nyvDRPV5J_tYJ8vSX-pF-qtQ@mail.gmail.com>
Date: 2020-01-29T16:01:10Z
From: Michael Lawrence
Subject: [Bioc-devel] How to speed up GRange comparision
In-Reply-To: <07a823dd-8a5a-396a-6ba3-335d5037c29f@posteo.de>

poverlaps()?
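
A minimal sketch of what that might look like on the example quoted below, assuming a GenomicRanges version that provides a poverlaps() method for GRanges objects (the variable names `hits` and `out` are only illustrative). poverlaps() compares the two objects element by element, much as pmin() does for numeric vectors, so no loop and no findOverlaps() post-filtering should be needed:

```r
# Assumes the Bioconductor package GenomicRanges is installed.
library(GenomicRanges)

# Example data from the original question.
query <- GRanges(rep("chr1", 4),
                 IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)), id = 1:4)
subject <- GRanges(rep("chr1", 4),
                   IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)), id = 1:4)

# Parallel comparison: element i of query is tested only against
# element i of subject. Both objects must have the same length.
hits <- poverlaps(query, subject)

# The result may be an Rle; as.vector() turns it into a plain logical
# vector before converting to 0/1 as in the original approaches.
out <- as.numeric(as.vector(hits))
out
```

Only the fourth pair (20-22 vs. 15-21) overlaps, which should match the result of the loop in Approach 2.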

On Wed, Jan 29, 2020 at 7:50 AM web working <webworking at posteo.de> wrote:
>
> Hello,
>
> I have two big GRanges objects and want to test whether the first range
> of query overlaps the first range of subject, then whether the second
> range of query overlaps the second range of subject, and so on. Here is
> an example of my problem:
>
> # GRanges objects
> query <- GRanges(rep("chr1", 4),
>                  IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)), id=1:4)
> subject <- GRanges(rep("chr1", 4),
>                    IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)), id=1:4)
>
> # The 2 overlaps at the first position should not be counted, because
> # these ranges are in different rows.
> countOverlaps(query, subject)
>
> # Approach 1 (bad style; simplified to make it easier to understand)
> dat <- as.data.frame(findOverlaps(query, subject))
> indexDat <- apply(dat, 1, function(x) x[1]==x[2])
> indexBool <- dat[indexDat,1]
> out <- rep(FALSE, length(query))
> out[indexBool] <- TRUE
> as.numeric(out)
>
> # Approach 2 (bad style and takes too long)
> out <- vector("numeric", 4)
> for(i in seq_along(query)) out[i] <- (overlapsAny(query[i], subject[i]))
> out
>
> # Approach 3 (wrong results)
> as.numeric(overlapsAny(query, subject))
> as.numeric(overlapsAny(split(query, 1:4), split(subject, 1:4)))
>
>
> Maybe someone has an idea to speed this up?
>
>
> Best,
>
> Tobias
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
michafla at gene.com
