[Bioc-devel] How to speed up GRanges comparison
poverlaps()?
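Below is a minimal sketch of how poverlaps() might be applied to the example data from the question. It assumes the installed GenomicRanges provides a poverlaps() method for GRanges objects. The result may come back as a logical Rle rather than a plain vector, so it is wrapped in as.logical() before conversion.

```r
library(GenomicRanges)

query   <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)), id = 1:4)
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)), id = 1:4)

# poverlaps() works "in parallel" (like pmin()): query[i] is compared only
# with subject[i], giving one TRUE/FALSE per pair of rows.
hit <- as.logical(poverlaps(query, subject))
as.numeric(hit)
```

Because elements are paired by position, query and subject are expected to line up row by row.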
On Wed, Jan 29, 2020 at 7:50 AM web working <webworking at posteo.de> wrote:
Hello,
I have two big GRanges objects and want to test the first range of query for an overlap with the first range of subject, then the second range of query against the second range of subject, and so on. Here is an example of my problem:
library(GenomicRanges)
# GRanges objects
query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)), id=1:4)
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)), id=1:4)
# countOverlaps() reports 2 overlaps for the first query range; they should
# not be counted, because those subject ranges are in different rows.
countOverlaps(query, subject)
# Approach 1 (bad style; simplified for clarity)
dat <- as.data.frame(findOverlaps(query, subject))
indexDat <- apply(dat, 1, function(x) x[1]==x[2])
indexBool <- dat[indexDat,1]
out <- rep(FALSE, length(query))
out[indexBool] <- TRUE
as.numeric(out)
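If findOverlaps() is kept, the apply() call in Approach 1 can be replaced by a vectorized comparison of queryHits() and subjectHits(). This is only a sketch: it still computes all-vs-all overlaps first, so it may use a lot of memory when the objects share many hits.

```r
library(GenomicRanges)

query   <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)), id = 1:4)
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)), id = 1:4)

hits <- findOverlaps(query, subject)
# Keep only the hits where query row i overlaps subject row i
same_row <- queryHits(hits)[queryHits(hits) == subjectHits(hits)]
# Flag those query rows as 1, all others as 0
out <- as.numeric(seq_along(query) %in% same_row)
out
```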
# Approach 2 (bad style and takes too long)
out <- numeric(length(query))
for (i in seq_along(query)) out[i] <- overlapsAny(query[i], subject[i])
out
# Approach 3 (wrong results: overlapsAny() tests each query element against
# all subject elements, not row by row)
as.numeric(overlapsAny(query, subject))
as.numeric(overlapsAny(split(query, 1:4), split(subject, 1:4)))
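The row-wise test itself is simple interval arithmetic: two closed intervals [a, b] and [c, d] on the same chromosome overlap when a <= d and c <= b. The sketch below applies that test to whole vectors at once in base R. pairwise_overlap() is a hypothetical helper, strand is ignored, and intervals are assumed to be closed and 1-based as in IRanges.

```r
# Hypothetical helper: row-wise (parallel) overlap test on plain vectors.
# Row i of the query is compared only with row i of the subject.
# Assumes closed 1-based intervals [start, end]; strand is ignored.
pairwise_overlap <- function(q_chr, q_start, q_end, s_chr, s_start, s_end) {
  stopifnot(length(q_start) == length(s_start))
  q_chr == s_chr & q_start <= s_end & s_start <= q_end
}

out <- pairwise_overlap(
  q_chr = rep("chr1", 4), q_start = c(1, 5, 9, 20), q_end = c(2, 6, 10, 22),
  s_chr = rep("chr1", 4), s_start = c(3, 1, 1, 15), s_end = c(4, 2, 2, 21)
)
as.numeric(out)
```

With GRanges input, the vectors would come from as.character(seqnames(x)), start(x) and end(x).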
Does anyone have an idea how to speed this up?
Best,
Tobias
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Michael Lawrence Senior Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 michafla at gene.com