[Bioc-devel] How to speed up GRange comparision

Thu, Jan 30, 2020 10:33 AM

On 1/29/20 23:31, web working wrote:

Why not just

   out <- abs(pcompare(query, subject)) < 5

In any case you should use integer instead of numeric (twice more 
compact in memory).

H.

Carey was right that this here is off list. Next time I will pose my 
question on support.bioconductor.org 
<https://urldefense.proofpoint.com/v2/url?u=http-3A__support.bioconductor.org&d=DwMCaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=VZ2-jg7W_Ctrav2BpPVUPpvJlyISX3QVwFAzTnDnNTs&s=Jdmp3dD6ubzPdE8KjFJ3urOav62YTOmxYcYZ4000MY8&e=>.

Best,

Tobias

Am 29.01.20 um 18:02 schrieb Pages, Herve:

Yes poverlaps().

Or pcompare(), which should be even faster. But only if you are not
afraid to go low-level. See ?rangeComparisonCodeToLetter for the meaning
of the codes returned by pcompare().

H.

On 1/29/20 08:01, Michael Lawrence via Bioc-devel wrote:

poverlaps()?

On Wed, Jan 29, 2020 at 7:50 AM web working<webworking at posteo.de>  wrote:

Hello,

I have two big GRanges objects and want to search for an overlap of  the
first range of query with the first range of subject. Then take the
second range of query and compare it with the second range of subject
and so on. Here an example of my problem:

# GRanges objects
query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10,
22)), id=1:4)
subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2,
21)), id=1:4)

# The 2 overlaps at the first position should not be counted, because
these ranges are at different rows.
countOverlaps(query, subject)

# Approach 1 (bad style. I have simplified it to understand)
dat <- as.data.frame(findOverlaps(query, subject))
indexDat <- apply(dat, 1, function(x) x[1]==x[2])
indexBool <- dat[indexDat,1]
out <- rep(FALSE, length(query))
out[indexBool] <- TRUE
as.numeric(out)

# Approach 2 (bad style and takes too long)
out <- vector("numeric", 4)
for(i in seq_along(query)) out[i] <- (overlapsAny(query[i], subject[i]))
out

# Approach 3 (wrong results)
as.numeric(overlapsAny(query, subject))
as.numeric(overlapsAny(split(query, 1:4), split(subject, 1:4)))


Maybe someone has an idea to speed this up?


Best,

Tobias

_______________________________________________
Bioc-devel at r-project.org  mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FSrHBK59_OMc6EbEtcPhkTVO0cfDgSbQBGFOXWyHhjc&s=3tZpvRAw7T5dP21u32TRTf4lZ4QFLtmkouKR7TUlJws&e=

Herv? Pag?s

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

[Bioc-devel] How to speed up GRange comparision

Thread (14 messages)