[Bioc-devel] [devteam-bioc] Very slow when operate GRangesList
Thanks Jianhong for reporting this.
Changes implemented in IRanges 1.19.27:
- RleList() constructor now has default 'compress=TRUE'.
- seqselect,Vector-method lapply() loop was replaced with direct subset.
New timings:
library(GenomicRanges)   ## GRanges(), IRanges(), RleList()
library(microbenchmark)  ## microbenchmark()
## generic subset function
fun0 <- function(x) x[500:1]
## GRangesList with RleList as metadata col
grll <- GRanges(seqnames="chr1",
                IRanges(start=1:500, width=2),
                someInfo=rep(RleList("*"), 500))
grr <- split(grll, 1:500)
> microbenchmark(fun0(grr), times=10)
Unit: milliseconds
      expr      min       lq   median      uq      max neval
 fun0(grr) 28.88062 29.31157 30.58494 31.4393 32.26367    10
The median is now about 0.031 seconds, compared with 1.635 seconds elapsed in the original report:
> system.time(grr<- grr[500:1])
user system elapsed
1.622 0.013 1.635
Valerie
On 08/23/2013 11:17 AM, Michael Lawrence wrote:
On Fri, Aug 23, 2013 at 8:41 AM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
Hi Michael,
Martin and I have been discussing this. In addition to the fix you
suggest, what do you think of changing the default to
compress=TRUE for the RleList constructor? RleList is the only one of
the AtomicLists with default FALSE. Was there a reason for this when
it was first implemented?
I'm guessing Patrick did that because we always used Rles for coverage,
and RleList for per-chromosome coverage. Also, there might be some
overhead in that Rle runs in the unlistData can cross list elements.
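To illustrate the remark about runs crossing list elements, here is a minimal base-R sketch. It assumes a compressed list keeps all elements concatenated in one vector (the unlistData mentioned above) plus the end position of each element. The names `simple`, `unlisted`, `ends` and `get_element` are hypothetical, and base R's rle() stands in for an Rle; this is not the IRanges implementation.

```r
# Stand-in for a simple (uncompressed) list: one small vector per element.
simple <- rep(list("*"), 500)

# Stand-in for a compressed list: all elements concatenated into a single
# vector, plus the end position of each element within that vector.
unlisted <- unlist(simple, use.names = FALSE)
ends <- cumsum(lengths(simple))

# Run-length encoding the concatenated data merges equal neighbours, so a
# single run can span many list elements.
runs <- rle(unlisted)
length(runs$lengths)   # number of runs in the concatenated data
runs$lengths           # length of each run

# Recover element i from the concatenated form using the element ends.
get_element <- function(i) {
  start <- if (i == 1L) 1L else ends[i - 1L] + 1L
  unlisted[seq_len(ends[i] - start + 1L) + start - 1L]
}
stopifnot(identical(get_element(250L), simple[[250L]]))
```

Because the run boundaries no longer line up with the element boundaries, extracting one element means cutting a piece out of a longer run, which is one plausible source of the overhead mentioned.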
About my fix, the only downside would be if the range widths were much
larger than the size of the vector, e.g., a highly compressed Rle,
selected with chromosome-size ranges. Then the as.integer(ir) is big
compared to the data. Otherwise, it's way faster.
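A rough base-R sketch of that downside, under the assumption that a base "rle" object is an acceptable stand-in for a highly compressed Rle. The names `r`, `idx` and `n_runs` are hypothetical, and only the sizes are compared, not timings.

```r
# A highly compressed run-length vector: a single run covering 1e6 positions.
r <- structure(list(lengths = 1000000L, values = "*"), class = "rle")

# Selecting the whole span through an expanded integer index needs one
# integer per selected position...
idx <- seq_len(sum(r$lengths))

# ...whereas the run-length data itself holds a single run.
n_runs <- length(r$lengths)

c(index_length = length(idx), runs = n_runs)
```

When ranges are short relative to the data, as in the GRangesList example from the report, the expanded index stays small and this concern does not arise.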
Val
On 08/22/2013 07:34 PM, Maintainer wrote:
Hi,
SimpleLists are slow in this situation, basically because the underlying
seqselect is slow, due to this loop:
x <- do.call(c, lapply(seq_len(length(ir)),
                       function(i)
                         window(x,
                                start = start(ir)[i], width = width(ir)[i])))
Am I missing something or could this become a simple x[as.integer(ir)]?
In the meantime, using CompressedLists is the way to go. So for an
RleList, you need to pass compress=TRUE to the constructor.
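The equivalence being asked about can be sketched in base R. This is only an illustration under stated assumptions: a plain vector stands in for `x`, and integer vectors `starts` and `widths` stand in for the ranges in `ir`. The helper names `loop_select` and `direct_select` are hypothetical, not IRanges functions.

```r
# Base-R stand-ins (assumptions, not the IRanges objects from the thread).
x <- letters
starts <- c(3L, 10L, 20L)
widths <- c(2L, 4L, 1L)

# Loop form: extract one window per range, then concatenate the pieces.
loop_select <- function(x, starts, widths) {
  pieces <- lapply(seq_along(starts), function(i) {
    x[seq.int(starts[i], length.out = widths[i])]
  })
  do.call(c, pieces)
}

# Direct form: expand all ranges to their positions at once, subset once.
direct_select <- function(x, starts, widths) {
  idx <- rep.int(starts, widths) + sequence(widths) - 1L
  x[idx]
}

# Both forms select the same elements in the same order.
stopifnot(identical(loop_select(x, starts, widths),
                    direct_select(x, starts, widths)))
```

The direct form performs a single subsetting call regardless of the number of ranges, whereas the loop form pays the per-call cost once per range and then again when concatenating.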
On Wed, Aug 21, 2013 at 8:30 AM, Ou, Jianhong <Jianhong.Ou at umassmed.edu> wrote:
Hi,
When I work with a large GRangesList, I find that subsetting becomes very
slow when the metadata contain an AtomicList, e.g.
> grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500,
width=2), someInfo=rep(RleList("*"), 500))
> grr <- split(grll, 1:500)
> grl <- as.list(grr)
> system.time(grl<- grl[500:1])
user system elapsed
0 0 0
> system.time(grr<- grr[500:1])
user system elapsed
1.622 0.013 1.635
> grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500,
width=2))
> grr <- split(grll, 1:500)
> grl <- as.list(grr)
> system.time(grl<- grl[500:1])
user system elapsed
0 0 0
> system.time(grr<- grr[500:1])
user system elapsed
0.029 0.001 0.030
> sessionInfo()
R Under development (unstable) (2013-07-23 r63392)
Platform: x86_64-apple-darwin12.4.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods base
other attached packages:
[1] GenomicRanges_1.13.36 XVector_0.1.0 IRanges_1.19.24
BiocGenerics_0.7.3
loaded via a namespace (and not attached):
[1] stats4_3.1.0 tools_3.1.0
Is there any method to improve this?
Yours sincerely,
Jianhong Ou
LRB 670A
Program in Gene Function and Expression
364 Plantation Street Worcester,
MA 01605
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel