[Bioc-devel] very slow to use intronsByTranscript in GenomicFeatures
Hi,
I can reproduce what Jianhong reports. I noticed it earlier this week but
didn't mention it because we all know devel is a moving target, and so on;
but since this has now been raised, I'll report what I'm getting.
So, Jianhong: if you downgrade the following packages to
these particular versions:
Biostrings_2.31.3.tar.gz
GenomicRanges_1.15.15.tar.gz
IRanges_1.21.13.tar.gz
XVector_0.3.2.tar.gz
you should be fine, unless you need some functionality from later versions
of them. Here is the test, with the session information:
suppressPackageStartupMessages(library(TxDb.Hsapiens.UCSC.hg19.knownGene))
Warning messages:
1: multiple methods tables found for ‘rname’
2: multiple methods tables found for ‘rname<-’
3: multiple methods tables found for ‘cigar’
4: multiple methods tables found for ‘qwidth’
5: multiple methods tables found for ‘introns’
system.time(txbygene <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene,
"gene"))
user system elapsed
2.524 0.046 2.575
sessionInfo()
R Under development (unstable) (2013-10-20 r64082)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
[5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
[7] LC_PAPER=en_US.UTF8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
[2] GenomicFeatures_1.15.4
[3] AnnotationDbi_1.25.9
[4] Biobase_2.23.3
[5] GenomicRanges_1.15.11
[6] XVector_0.3.2
[7] IRanges_1.21.13
[8] BiocGenerics_0.9.2
[9] vimcom_0.9-92
[10] setwidth_1.0-3
[11] colorout_1.0-1
loaded via a namespace (and not attached):
[1] biomaRt_2.19.1 Biostrings_2.31.3 bitops_1.0-6
[4] BSgenome_1.31.7 DBI_0.2-7 GenomicAlignments_0.99.9
[7] RCurl_1.95-4.1 Rsamtools_1.15.15 RSQLite_0.11.4
[10] rtracklayer_1.23.6 stats4_3.1.0 tools_3.1.0
[13] XML_3.98-1.1 zlibbioc_1.9.0
However, if you go to the bleeding edge of devel BioC:
suppressPackageStartupMessages(library(TxDb.Hsapiens.UCSC.hg19.knownGene))
system.time(txbygene <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene,
"gene"))
the call never finishes; it keeps running until you press Ctrl+C:
^C
Error in unlist(lapply(c("seqnames", "ranges", "strand", "mcols"),
checkCoreGetterReturnedLength)) :
error in evaluating the argument 'x' in selecting a method for
function 'unlist': Error in NROW(get(getter)(x)) :
error in evaluating the argument 'x' in selecting a method for
function 'NROW': Error in get(getter)(x) :
error in evaluating the argument 'x' in selecting a method for
function 'ranges':
Timing stopped at: 24.5 0.072 24.619
sessionInfo()
R Under development (unstable) (2013-10-20 r64082)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
[5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8 LC_PAPER=en_US.UTF8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 GenomicFeatures_1.15.4
[3] AnnotationDbi_1.25.9 Biobase_2.23.3
[5] GenomicRanges_1.15.15 XVector_0.3.5
[7] IRanges_1.21.17 BiocGenerics_0.9.2
[9] vimcom_0.9-92 setwidth_1.0-3
[11] colorout_1.0-1
loaded via a namespace (and not attached):
[1] biomaRt_2.19.1 Biostrings_2.31.5 bitops_1.0-6 BSgenome_1.31.7
[5] DBI_0.2-7 GenomicAlignments_0.99.9 RCurl_1.95-4.1 Rsamtools_1.15.15
[9] RSQLite_0.11.4 rtracklayer_1.23.6 stats4_3.1.0 tools_3.1.0
[13] XML_3.98-1.1 zlibbioc_1.9.0
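To check which of the two configurations above a given installation corresponds to, a small base-R sketch can compare the installed versions with the ones listed as working. check_versions() is a hypothetical helper, not part of any Bioconductor package; it only relies on utils::packageVersion() and utils::compareVersion(), and reports NA for packages that are not installed.
```r
# Hypothetical helper (not part of any Bioconductor package): compare the
# installed version of each package with the version reported to work.
check_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    # packageVersion() errors if the package is not installed; use NA then
    tryCatch(as.character(utils::packageVersion(pkg)),
             error = function(e) NA_character_)
  }, character(1))
  # compareVersion() returns 0 when both version strings are equal
  cmp <- mapply(function(have, want) {
    if (is.na(have)) NA_integer_ else as.integer(utils::compareVersion(have, want))
  }, installed, expected)
  data.frame(package = names(expected),
             expected = unname(expected),
             installed = unname(installed),
             matches = unname(cmp == 0L),
             stringsAsFactors = FALSE)
}

# Versions listed in this message as giving fast transcriptsBy() timings
known_good <- c(Biostrings = "2.31.3", GenomicRanges = "1.15.15",
                IRanges = "1.21.13", XVector = "0.3.2")
check_versions(known_good)
```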
Cheers,
Robert.
On 12/20/2013 06:31 PM, Ou, Jianhong wrote:
In my case, it looks like it never ends. I need to check my R first.
Yours sincerely,
Jianhong Ou
LRB 670A
Program in Gene Function and Expression
364 Plantation Street Worcester,
MA 01605
On 12/20/13 12:05 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
Hi Jianhong,
According to my timings, it's a little slower than exonsBy(), but not by much. It also has a little more work to do, since the introns are not explicitly stored in the SQLite db (the exons are) but are inferred from the exons and the transcript boundaries. So intronsByTranscript() has to retrieve all the exons and all the transcripts from the db.
intronsByTranscript():
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
system.time(introns <- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene))
#    user  system elapsed
#   9.165   0.076   9.263
system.time(introns <- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene))
#    user  system elapsed
#   4.824   0.064   4.896
exonsBy():
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
system.time(exons <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene))
#    user  system elapsed
#   7.720   0.072   7.812
system.time(exons <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene))
#    user  system elapsed
#   4.229   0.028   4.265
transcripts():
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
system.time(tx <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene))
#    user  system elapsed
#   1.424   0.008   1.436
system.time(tx <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene))
#    user  system elapsed
#   0.776   0.012   0.790
That's less than 10 seconds to retrieve all the exons and transcripts from disk and compute the 659327 introns. It's actually not that bad.
Cheers,
H.
On 12/20/2013 08:25 AM, Ou, Jianhong wrote:
Dear all,
When I use intronsByTranscript() to get the introns for the hg19 known genes, I find it unacceptably slow. Does anybody have the same problem? My code:
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
introns <- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene)
sessionInfo()
R Under development (unstable) (2013-12-12 r64453)
Platform: x86_64-apple-darwin12.5.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 GenomicFeatures_1.15.4
[3] AnnotationDbi_1.25.9 Biobase_2.23.3
[5] GenomicRanges_1.15.15 XVector_0.3.5
[7] IRanges_1.21.17 BiocGenerics_0.9.2
loaded via a namespace (and not attached):
[1] biomaRt_2.19.1 Biostrings_2.31.5 bitops_1.0-6 BSgenome_1.31.7
[5] DBI_0.2-7 GenomicAlignments_0.99.9 RCurl_1.95-4.1 Rsamtools_1.15.15
[9] RSQLite_0.11.4 rtracklayer_1.23.6 stats4_3.1.0 tools_3.1.0
[13] XML_3.98-1.1 zlibbioc_1.9.0
Yours sincerely,
Jianhong Ou
LRB 670A
Program in Gene Function and Expression
364 Plantation Street Worcester,
MA 01605
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550
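Hervé's point in this thread, that the introns are not stored in the SQLite db but are inferred from the exons and the transcript boundaries, can be illustrated with a small base-R sketch that treats each intron as a piece of the transcript range not covered by any of its exons. The function infer_introns() and its plain data.frame inputs (columns tx_id, start, end) are hypothetical and for illustration only; GenomicFeatures works on GRanges/GRangesList objects, and this is not its actual code.
```r
# Hypothetical helper (not the GenomicFeatures implementation): introns are
# computed as "transcript range minus exons", on 1-based closed intervals.
infer_introns <- function(tx, exons) {
  per_tx <- lapply(seq_len(nrow(tx)), function(i) {
    ex <- exons[exons$tx_id == tx$tx_id[i], , drop = FALSE]
    ex <- ex[order(ex$start), , drop = FALSE]
    cursor <- tx$start[i]                   # first position not yet covered
    starts <- c()
    ends <- c()
    for (j in seq_len(nrow(ex))) {
      if (ex$start[j] > cursor) {           # uncovered gap before this exon
        starts <- c(starts, cursor)
        ends <- c(ends, ex$start[j] - 1)
      }
      cursor <- max(cursor, ex$end[j] + 1)  # also absorbs overlapping exons
    }
    if (cursor <= tx$end[i]) {              # uncovered tail up to tx end
      starts <- c(starts, cursor)
      ends <- c(ends, tx$end[i])
    }
    if (length(starts) == 0L) return(NULL)  # e.g. a single-exon transcript
    data.frame(tx_id = tx$tx_id[i], start = starts, end = ends,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, per_tx)
  if (!is.null(out)) rownames(out) <- NULL
  out
}

# Toy coordinates, made up for illustration: tx1 has 3 exons, tx2 has 1.
tx <- data.frame(tx_id = c("tx1", "tx2"), start = c(100, 500),
                 end = c(400, 600), stringsAsFactors = FALSE)
exons <- data.frame(tx_id = c("tx1", "tx1", "tx1", "tx2"),
                    start = c(100, 200, 350, 500),
                    end = c(150, 300, 400, 600), stringsAsFactors = FALSE)
introns <- infer_introns(tx, exons)
introns
# tx1 gets two introns, 151-199 and 301-349; tx2 (single exon) gets none
```
For a transcript whose non-overlapping exons span its whole range, this yields one intron fewer than the number of exons.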