[Bioc-devel] AnnotationDbi and select function

Hi Nicolas,
Dear all,

I have an error using the select function from the AnnotationDbi package.
I am trying to convert some gene IDs into symbols, but for some strange reason it crashes.

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
isActiveSeq(txdb)[seqlevels(txdb)] <- FALSE
isActiveSeq(txdb)[c("chr16","chr1")] <- TRUE
geneGR <- exonsBy(txdb, "gene")
library(Homo.sapiens)
symbol <- select(Homo.sapiens, keys = names(geneGR), keytype = "GENEID", columns = "SYMBOL")
Error in head(select(Homo.sapiens, keys = names(geneGR)[1:1001], keytype = "GENEID",  :
   error in evaluating the argument 'x' in selecting a method for function 'head': Error in res[, .reverseColAbbreviations(x, cnames), drop = FALSE] :

length(geneGR)
[1] 3269
## The first 1,000 keys work
symbol <- select(Homo.sapiens, keys = names(geneGR)[1:1000], keytype = "GENEID", columns = "SYMBOL")
## 1,001 keys do not!
symbol <- select(Homo.sapiens, keys = names(geneGR)[1:1001], keytype = "GENEID", columns = "SYMBOL")
Error in res[, .reverseColAbbreviations(x, cnames), drop = FALSE] :
   incorrect number of dimensions

It looks like I cannot convert more than 1,000 elements. Is there any reason for that?
Thank you very much
Nicolas
Not sure what 'GENEID' is in this context - it appears to be Entrez 
Gene. But anyway, if you use "ENTREZID" instead, it works fine:

 > symbol <- select(Homo.sapiens, names(geneGR), "SYMBOL", "ENTREZID")
 > symbol <- select(Homo.sapiens, names(geneGR), "GENEID", "ENTREZID")
Error in res[, .reverseColAbbreviations(x, cnames), drop = FALSE] :
   incorrect number of dimensions
 > symbol <- select(Homo.sapiens, names(geneGR)[1:1000], "GENEID", "ENTREZID")
 > symbol <- select(Homo.sapiens, names(geneGR)[1:1001], "GENEID", "ENTREZID")
Error in res[, .reverseColAbbreviations(x, cnames), drop = FALSE] :
   incorrect number of dimensions
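A possible stopgap until a fix is available, sketched under the assumption (suggested by the tests above, not confirmed) that batches of at most 1,000 keys are unaffected: split the keys into chunks, run the lookup on each chunk, and stack the results. `select_in_chunks` is a hypothetical helper, not part of AnnotationDbi; the lookup is passed in as a function so the chunking logic can be checked without the annotation packages installed.

```r
## Hypothetical helper, not part of AnnotationDbi: run `lookup` on at most
## `chunk_size` keys at a time and stack the resulting data frames.
select_in_chunks <- function(keys, lookup, chunk_size = 1000L) {
  stopifnot(is.function(lookup), chunk_size >= 1)
  if (length(keys) == 0L) return(lookup(keys))
  # Chunk index for each key: 1 for the first chunk_size keys, then 2, ...
  chunk_id <- ceiling(seq_along(keys) / chunk_size)
  pieces <- lapply(split(keys, chunk_id), lookup)
  res <- do.call(rbind, unname(pieces))
  rownames(res) <- NULL
  res
}

## Usage with the objects from this thread (not run here):
## symbol <- select_in_chunks(names(geneGR), function(k)
##   select(Homo.sapiens, keys = k, keytype = "GENEID", columns = "SYMBOL"))
```

Passing the lookup as a function keeps the helper independent of any particular annotation object; the same wrapper would work for a different keytype or set of columns.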

Best,

Jim

sessionInfo()
R Under development (unstable) (2014-03-05 r65125)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
  [1] Homo.sapiens_1.1.2
  [2] org.Hs.eg.db_2.10.1
  [3] GO.db_2.10.1
  [4] RSQLite_0.11.4
  [5] DBI_0.2-7
  [6] OrganismDbi_1.5.3
  [7] XVector_0.3.7
  [8] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
  [9] GenomicFeatures_1.15.9
[10] AnnotationDbi_1.25.14
[11] GenomeInfoDb_0.99.17
[12] Biobase_2.23.6
[13] GenomicRanges_1.15.32
[14] IRanges_1.21.32
[15] BiocGenerics_0.9.3
[16] RColorBrewer_1.0-5
[17] reshape2_1.2.2
[18] reshape_0.8.4
[19] plyr_1.8.1
[20] ggplot2_0.9.3.1
[21] Matrix_1.1-2-2

loaded via a namespace (and not attached):
  [1] BatchJobs_1.2             BBmisc_1.5
  [3] BiocParallel_0.5.16       biomaRt_2.19.3
  [5] Biostrings_2.31.14        bitops_1.0-6
  [7] brew_1.0-6                BSgenome_1.31.12
  [9] codetools_0.2-8           colorspace_1.2-4
[11] dichromat_2.0-0           digest_0.6.4
[13] fail_1.2                  foreach_1.4.1
[15] GenomicAlignments_0.99.29 graph_1.41.3
[17] grid_3.1.0                gtable_0.1.2
[19] iterators_1.0.6           labeling_0.2
[21] lattice_0.20-27           MASS_7.3-29
[23] munsell_0.4.2             proto_0.3-10
[25] RBGL_1.39.2               Rcpp_0.11.0
[27] RCurl_1.95-4.1            Rsamtools_1.15.32
[29] rtracklayer_1.23.15       scales_0.2.3
[31] sendmailR_1.1-2           stats4_3.1.0
[33] stringr_0.6.2             tools_3.1.0
[35] XML_3.98-1.1              zlibbioc_1.9.0

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
Thanks Nicolas!  That's a good bug.  I will work on a fix.  The reason
why James's workaround works here is that it has to query one fewer
database.  It is also faster for this reason.  When you say GENEID, you
mean the IDs used in the associated TxDb database, which means that
these have to be checked against that DB (and anything related to them
extracted) and then merged with the symbol information by joining on
the foreign key for these two DBs.  So that's actually much more complex
than just extracting all the same data from the org package alone, even
though the end result (in this case) is the same.  The bug is probably
happening in the associated merge step.

  Marc
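The merge step described above can be pictured with a toy example in base R. The two data frames are made-up stand-ins for the TxDb and org databases (not their real schemas, and not real gene symbols): the GENEID path first checks the keys against the TxDb-like table and then joins to the org-like table on the shared Entrez Gene ID, while the ENTREZID path reads a single table. Both give the same symbols here, but the first does more work.

```r
## Made-up stand-ins for the two databases (not the real schemas).
## A gene can have several transcripts, so GENEIDs repeat in the TxDb-like table.
txdb_genes <- data.frame(GENEID = c("1", "1", "10", "100", "1000"),
                         TXNAME = c("tx1", "tx2", "tx3", "tx4", "tx5"),
                         stringsAsFactors = FALSE)
org_genes <- data.frame(ENTREZID = c("1", "10", "100", "1000", "10000"),
                        SYMBOL = c("SYM_A", "SYM_B", "SYM_C", "SYM_D", "SYM_E"),
                        stringsAsFactors = FALSE)

## Like keytype = "GENEID": validate keys against the TxDb-like table, then
## merge with the org-like table using GENEID as the foreign key to ENTREZID.
via_txdb <- function(keys) {
  hits <- unique(txdb_genes[txdb_genes$GENEID %in% keys, "GENEID", drop = FALSE])
  merge(hits, org_genes, by.x = "GENEID", by.y = "ENTREZID")
}

## Like keytype = "ENTREZID": a single lookup in the org-like table.
direct <- function(keys) {
  org_genes[org_genes$ENTREZID %in% keys, c("ENTREZID", "SYMBOL")]
}

ids <- c("1", "100", "1000")
via_txdb(ids)$SYMBOL  # "SYM_A" "SYM_C" "SYM_D"
direct(ids)$SYMBOL    # "SYM_A" "SYM_C" "SYM_D"
```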

On 3/12/2014 12:39 PM, Servant Nicolas wrote:

Hi guys,

Thanks for your feedback.
Indeed I put GENEID because it is used in the txdb database.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
columns(txdb)
[1] "CDSID"      "CDSNAME"    "CDSCHROM"   "CDSSTRAND"  "CDSSTART" 
 [6] "CDSEND"     "EXONID"     "EXONNAME"   "EXONCHROM"  "EXONSTRAND"
[11] "EXONSTART"  "EXONEND"    "GENEID"     "TXID"       "EXONRANK" 
[16] "TXNAME"     "TXCHROM"    "TXSTRAND"   "TXSTART"    "TXEND"    

I will switch to ENTREZID, which is much faster!
I'm glad it could help.
Nicolas

________________________________________
From: bioc-devel-bounces at r-project.org [bioc-devel-bounces at r-project.org] on behalf of Marc Carlson [mcarlson at fhcrc.org]
Sent: Wednesday, March 12, 2014 20:18
To: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] AnnotationDbi and select function

I just checked in a fix for this bug to GenomicFeatures (which turned
out to be where the problem was).  It should percolate out to the build
system soon.

  Marc

On 03/12/2014 10:06 AM, James W. MacDonald wrote:

Also,

There is nothing wrong with using GENEID the way that you initially
did.  It was just a small bug that prevented some internal subsetting
from working properly, and that is now fixed.

It just happened that GENEID was equivalent to ENTREZID in this case.
GENEID ends up being a slower choice only because the software has to
do more work (in case GENEID is something else).  So since you know
that these are in fact ENTREZIDs, you can take Jim's suggestion as a
shortcut and get a little performance boost.

But ENTREZID is still not the same request as GENEID (which could
potentially be another kind of ID).  So the two things (GENEID and
ENTREZID) are not always the same kind of thing.  They just happened to
both be ENTREZIDs in *this* case.  In a different scenario, GENEID from
the associated TranscriptDb might be something like an Ensembl gene ID,
and then the shortcut would mean using ENSEMBL instead of ENTREZID.

In contrast, GENEID should normally always work (but it will also be a
tiny bit slower).

Sorry if you know all this stuff, but I think it's better to be explicit
than to say too little.

   Marc
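The advice above can be turned into a small check, sketched here as a hypothetical helper (not an AnnotationDbi function) that looks at the TxDb gene IDs and suggests which keytype could serve as a shortcut. The patterns are assumptions: all-numeric IDs are taken to be Entrez Gene IDs, and IDs starting with `ENS...G` followed by digits are taken to be Ensembl gene IDs. Anything else falls back to GENEID, which should always work.

```r
## Hypothetical helper: suggest a faster keytype for TxDb GENEIDs, falling
## back to "GENEID" (always valid, slightly slower) when unsure.
shortcut_keytype <- function(gene_ids) {
  gene_ids <- as.character(gene_ids)
  if (length(gene_ids) == 0L) return("GENEID")
  if (all(grepl("^[0-9]+$", gene_ids))) {
    "ENTREZID"                       # e.g. "1", "100", "10000"
  } else if (all(grepl("^ENS[A-Z]*G[0-9]+", gene_ids))) {
    "ENSEMBL"                        # e.g. "ENSG00000139618"
  } else {
    "GENEID"                         # unknown or mixed ID types
  }
}

## Usage with the objects from this thread (not run here):
## kt <- shortcut_keytype(names(geneGR))
## symbol <- select(Homo.sapiens, keys = names(geneGR), keytype = kt,
##                  columns = "SYMBOL")
```

A pattern match cannot prove that the IDs really are of the type they resemble, so when in doubt, keeping keytype = "GENEID" is the safe choice.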

On 03/12/2014 02:19 PM, Servant Nicolas wrote: