Hi Hervé,
Thanks for fixing this problem. It works fine now; however, I did
notice a drop in performance. For example, reading in chromosome 1 can
take about twice as long as previously (see below). Is this something
that can be avoided?
Many thanks again
Leonard
--
GenomicAlignments 0.99.8
param <- ScanBamParam(which = GRanges("1", IRanges(1, 249250621)))
system.time(readGAlignmentPairs(file, param = param))
user system elapsed
161.282 7.812 169.442
system.time(readGAlignmentPairs(file, param = param))
user system elapsed
91.614 7.256 99.065
system.time(readGAlignmentPairs(file, param = param))
user system elapsed
89.285 7.461 96.940
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicAlignments_0.99.8 Rsamtools_1.15.15 Biostrings_2.31.5
[4] GenomicRanges_1.15.17 XVector_0.3.5 IRanges_1.21.17
[7] BiocGenerics_0.9.2
loaded via a namespace (and not attached):
[1] bitops_1.0-6 stats4_3.1.0 zlibbioc_1.9.0
GenomicAlignments 0.99.10
param <- ScanBamParam(which = GRanges("1", IRanges(1, 249250621)))
system.time(readGAlignmentPairs(file, param = param))
user system elapsed
265.624 8.812 274.990
system.time(readGAlignmentPairs(file, param = param))
user system elapsed
249.724 7.177 257.399
system.time(readGAlignmentPairs(file, param = param))
user system elapsed
247.476 6.648 254.621
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicAlignments_0.99.10 Rsamtools_1.15.15
[3] Biostrings_2.31.5 GenomicRanges_1.15.17
[5] XVector_0.3.5 IRanges_1.21.17
[7] BiocGenerics_0.9.2
loaded via a namespace (and not attached):
[1] bitops_1.0-6 stats4_3.1.0 zlibbioc_1.9.0
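The size of the slowdown can be read off the elapsed times quoted above. A minimal sketch in R, using only the numbers reported in this message (the variable names are illustrative and not part of any package):

```r
# Elapsed times in seconds, copied from the system.time() output above.
elapsed_0_99_8  <- c(169.442, 99.065, 96.940)    # GenomicAlignments 0.99.8
elapsed_0_99_10 <- c(274.990, 257.399, 254.621)  # GenomicAlignments 0.99.10

# Per-run slowdown factor of 0.99.10 relative to 0.99.8.
slowdown <- elapsed_0_99_10 / elapsed_0_99_8
print(round(slowdown, 2))
```

The first run of each series is slower than the later ones, possibly because of file caching (an assumption, not something measured here), so the later runs are likely the fairer comparison.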
On Thu, Dec 19, 2013 at 11:52 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
Hi Leonard,
This should be fixed in GenomicAlignments 0.99.10, which will become
available through biocLite() in the next 24 hours or so. In the meantime
you can grab it directly from svn:
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomicAlignments
Please let me know if you still run into problems with this.
Thanks!
H.
On 12/19/2013 10:41 AM, Leonard Goldstein wrote:
Hi Hervé,
You probably spotted this already, but it looks like the problem was
introduced between GenomicAlignments revisions r84052 (0.99.8) and
r84106 (0.99.9).
Best wishes
Leonard
On Wed, Dec 18, 2013 at 5:25 PM, Leonard Goldstein <goldstel at gene.com>
wrote:
Dear list,
There seems to be a problem with the readGAlignmentPairs function:
Querying genomic regions without any alignments using the which
argument results in an error -- see (1) below. Reading in a whole
chromosome either never finishes or takes much longer than in
previous versions -- see (2) below. I suspect these issues are not
specific to the BAM files I am working with, but I can provide test data
if required.
Many thanks for your help.
Leonard
--
(1) Attempting to read an empty region results in an error.
gr <- GRanges("22", IRanges(1000000, 2000000))
param <- ScanBamParam(which = gr)
readGAlignmentPairs(file, param = param)
Error in `elementMetadata<-`(x, ..., value = value) :
replacement 'elementMetadata' value must be a DataTable object or NULL
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicAlignments_0.99.9 Rsamtools_1.15.15 Biostrings_2.31.5
[4] GenomicRanges_1.15.17 XVector_0.3.5 IRanges_1.21.16
[7] BiocGenerics_0.9.2 BiocInstaller_1.13.3
loaded via a namespace (and not attached):
[1] bitops_1.0-6 stats4_3.1.0 tools_3.1.0 zlibbioc_1.9.0
... but it works fine with the previous version.
gr <- GRanges("22", IRanges(1000000, 2000000))
param <- ScanBamParam(which = gr)
readGAlignmentPairs(file, param = param)
GAlignmentPairs with 0 alignment pairs and 0 metadata columns:
seqnames strand : ranges -- ranges
<Rle> <Rle> : <IRanges> -- <IRanges>
---
seqlengths:
1 2 3 ... GL000247.1 GL000248.1 GL000249.1
249250621 243199373 198022430 ... 36422 39786 38502
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Rsamtools_1.14.2 Biostrings_2.30.1 GenomicRanges_1.14.4
[4] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 stats4_3.0.0 zlibbioc_1.8.0
(2) Reading chromosome 22 via the which argument takes under one
minute with an earlier version of readGAlignmentPairs ...
gr <- GRanges("22", IRanges(1, 51304566))
param <- ScanBamParam(which = gr)
system.time(gap <- readGAlignmentPairs(file, param = param))
user system elapsed
45.887 0.256 46.168
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Rsamtools_1.14.2 Biostrings_2.30.1 GenomicRanges_1.14.4
[4] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 stats4_3.0.0 zlibbioc_1.8.0
... but at least twenty times as long with the current version.
gr <- GRanges("22", IRanges(1, 51304566))
param <- ScanBamParam(which = gr)
system.time(gap <- readGAlignmentPairs(file, param = param))
^C
Timing stopped at: 1108.041 35.998 1144.006
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicAlignments_0.99.9 Rsamtools_1.15.15 Biostrings_2.31.5
[4] GenomicRanges_1.15.17 XVector_0.3.5 IRanges_1.21.16
[7] BiocGenerics_0.9.2
loaded via a namespace (and not attached):
[1] bitops_1.0-6 stats4_3.1.0 zlibbioc_1.9.0