[Bioc-devel] Changes in AnnotationDbi

On 06/09/2015 02:52 AM, Simon Anders wrote:
In case you missed it in Marc's reply, and acknowledging that this is different
from your suggestion, there is mapIds() for doing this on a single-column basis,
which is the common use case where one doesn't care too much about ids with
multiple mappings.

 > org = org.Hs.eg.db
 > head(select(org, keys(org), "ALIAS"))
   ENTREZID    ALIAS
1        1      A1B
2        1      ABG
3        1      GAB
4        1 HYST2477
5        1     A1BG
6        2     A2MD
 > head(mapIds(org, keys(org), "ALIAS", "ENTREZID"))
      1      2      3      9     10     11
  "A1B" "A2MD" "A2MP" "AAC1" "AAC2" "AACP"
 > head(mapIds(org, keys(org), "ALIAS", "ENTREZID", multiVals="CharacterList"))
CharacterList of length 6
[["1"]] A1B ABG GAB HYST2477 A1BG
[["2"]] A2MD CPAMD5 FWP007 S863-7 A2M
[["3"]] A2MP A2MP1
[["9"]] AAC1 MNAT NAT-1 NATI NAT1
[["10"]] AAC2 NAT-2 PNAT NAT2
[["11"]] AACP NATP1 NATP
 > str(head(mapIds(org, keys(org), "ALIAS", "ENTREZID", multiVals="list")))
List of 6
  $ 1 : chr [1:5] "A1B" "ABG" "GAB" "HYST2477" ...
  $ 2 : chr [1:5] "A2MD" "CPAMD5" "FWP007" "S863-7" ...
  $ 3 : chr [1:2] "A2MP" "A2MP1"
  $ 9 : chr [1:5] "AAC1" "MNAT" "NAT-1" "NATI" ...
  $ 10: chr [1:4] "AAC2" "NAT-2" "PNAT" "NAT2"
  $ 11: chr [1:3] "AACP" "NATP1" "NATP"
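
To make the relationship between the select() and mapIds() results a bit more concrete, here is a minimal base-R sketch. It is only an illustration, not how mapIds() is implemented: `long` is a hypothetical stand-in for the long-format select() result, with values copied from the output above, and it assumes the default behaviour shown above corresponds to keeping the first value per key.

```r
# Hypothetical toy data: the first six rows of the select() output above.
long <- data.frame(
    ENTREZID = c("1", "1", "1", "1", "1", "2"),
    ALIAS    = c("A1B", "ABG", "GAB", "HYST2477", "A1BG", "A2MD"),
    stringsAsFactors = FALSE
)

# Keep keys in order of first appearance; split() would otherwise sort them
# as character strings ("1", "10", "11", "2", ...).
key <- factor(long$ENTREZID, levels = unique(long$ENTREZID))

# One character vector of aliases per key, similar in shape to multiVals="list".
as_list <- split(long$ALIAS, key)

# Only the first alias per key, similar in shape to the default result.
first_only <- vapply(as_list, function(x) x[1], character(1))

str(as_list)
first_only
```

The point is just that the long format repeats the key once per alias, whereas the mapIds() forms carry one element per key and differ only in how multiple values are handled.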

Also, since this is the devel list, there is the option of querying the
underlying SQLite database directly with dplyr:

 > library(dplyr)
 > d = src_sqlite(org.Hs.eg_dbfile())
 > d
src:  sqlite 3.8.6 
[/home/mtmorgan/R/x86_64-unknown-linux-gnu-library/3.2-BiocDevel/org.Hs.eg.db/extdata/org.Hs.eg.sqlite]
tbls: accessions, alias, chrlengths, chromosome_locations, chromosomes,
   cytogenetic_locations, ec, ensembl, ensembl_prot, ensembl_trans,
   ensembl2ncbi, gene_info, genes, go, go_all, go_bp, go_bp_all, go_cc,
   go_cc_all, go_mf, go_mf_all, kegg, map_counts, map_metadata, metadata,
   ncbi2ensembl, omim, pfam, prosite, pubmed, refseq, sqlite_stat1, ucsc,
   unigene, uniprot
 > d %>% tbl("alias") %>% group_by(`_id`) %>% summarize(alias_symbol)
Source: sqlite 3.8.6 
[/home/mtmorgan/R/x86_64-unknown-linux-gnu-library/3.2-BiocDevel/org.Hs.eg.db/extdata/org.Hs.eg.sqlite]
From: <derived table> [?? x 2]

    _id alias_symbol
1    1         A1BG
2    2          A2M
3    3        A2MP1
4    4         NAT1
5    5         NAT2
6    6         NATP
7    7     SERPINA3
8    8        AADAC
9    9         AAMP
10  10        AANAT
.. ...          ...

(with lots of nice confusion there, including extensive masking of symbols
between dplyr and AnnotationDbi, the need to know the schema (basically a
central id, ENTREZID for org packages, plus tables of mappings from the central
id to other ids), and the more-or-less arbitrary choice of alias_symbol).

Martin