[Bioc-devel] Apparent error in illuminaHumanv4.db

2 messages · Taku Tokuyasu, James W. MacDonald

Hi Taku,

This 'error' is not due to anything in the illuminaHumanv4.db package. 
All that package does is link the probe IDs to Entrez Gene IDs, and then 
the org.Hs.eg.db package does the remainder of the annotation. So if we 
look at org.Hs.eg.db, we get this:

> select(org.Hs.eg.db, c("C16ORF15","C16orf15","C15orf16"),
+        c("ENTREZID","SYMBOL","GENENAME"), "ALIAS")
     ALIAS ENTREZID SYMBOL                 GENENAME
1 C16ORF15   161725 OTUD7A OTU domain containing 7A
2 C16orf15   197335  WDR90      WD repeat domain 90
3 C15orf16   161725 OTUD7A OTU domain containing 7A
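
The point of that output is that two aliases differing only in letter case map to different genes. A minimal base-R sketch of how one might flag such aliases, using only the three rows shown above (the object names `res`, `ids_by_alias`, `n_genes` and `ambiguous` are made up for this illustration and no annotation package is needed):

```r
# Rows copied from the select() output above.
res <- data.frame(
  ALIAS    = c("C16ORF15", "C16orf15", "C15orf16"),
  ENTREZID = c("161725", "197335", "161725"),
  SYMBOL   = c("OTUD7A", "WDR90", "OTUD7A"),
  stringsAsFactors = FALSE
)

# Group Entrez Gene IDs by the upper-cased alias.
ids_by_alias <- split(res$ENTREZID, toupper(res$ALIAS))

# Count distinct genes per case-folded alias; more than one means the
# alias is ambiguous as soon as its case is altered or ignored.
n_genes <- vapply(ids_by_alias, function(x) length(unique(x)), integer(1))
ambiguous <- names(n_genes)[n_genes > 1]

print(ambiguous)
# [1] "C16ORF15"
```

So any pipeline step that upper-cases or lower-cases symbols would silently merge OTUD7A and WDR90 under this alias.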


And if we go to NCBI and search the Gene database, we get (in order):

Gene ID 161725

Official Symbol
    OTUD7A provided by HGNC <http://www.genenames.org/>
Official Full Name
    OTU deubiquitinase 7A provided by HGNC <http://www.genenames.org/>
Primary source
    HGNC:20718 <http://www.genenames.org/data/hgnc_data.php?hgnc_id=20718> 
See related
    Ensembl:ENSG00000169918; <http://www.ensembl.org/id/ENSG00000169918>
    HPRD:12666; <http://www.hprd.org/protein/12666> MIM:612024;
    <http://www.ncbi.nlm.nih.gov/omim/612024> Vega:OTTHUMG00000129275
    <http://vega.sanger.ac.uk/id/OTTHUMG00000129275> 
Gene type
    protein coding
RefSeq status
    PROVISIONAL
Organism
    Homo sapiens
    <https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606> 
Lineage
    Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
    Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
    Catarrhini; Hominidae; Homo
Also known as
    OTUD7; C15orf16; C16ORF15; CEZANNE2

And

Gene ID 197335

Official Symbol
    WDR90 provided by HGNC <http://www.genenames.org/>
Official Full Name
    WD repeat domain 90 provided by HGNC <http://www.genenames.org/>
Primary source
    HGNC:26960 <http://www.genenames.org/data/hgnc_data.php?hgnc_id=26960> 
See related
    Ensembl:ENSG00000161996; <http://www.ensembl.org/id/ENSG00000161996>
    HPRD:08311; <http://www.hprd.org/protein/08311> HPRD:14118;
    <http://www.hprd.org/protein/14118> Vega:OTTHUMG00000048040
    <http://vega.sanger.ac.uk/id/OTTHUMG00000048040> 
Gene type
    protein coding
RefSeq status
    PROVISIONAL
Organism
    Homo sapiens
    <https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606> 
Lineage
    Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
    Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
    Catarrhini; Hominidae; Homo
Also known as
    C16orf15; C16orf16; C16orf17; C16orf18; C16orf19


So what is in the org.Hs.eg.db package conforms exactly to the data from 
NCBI. Please note that the annotation packages supplied by Bioconductor 
are simply repackaged versions of data we get from sources like NCBI. We 
try our best to ensure that the information you get from a given 
annotation package matches what you would get by going to the NCBI 
website and searching by hand, but we do NOT make any claims as to the 
accuracy of the data on the NCBI website.

And there have been any number of emails on this list by Marc Carlson, 
explaining to people that HGNC symbols and especially other random 
aliases are not unique, and should not be relied upon for annotating 
data accurately. So yeah, don't do that.
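
One way to act on this (an illustration, not something prescribed in this thread) is to carry the Entrez Gene ID as the join key and attach symbols only for display. A rough base-R sketch follows; the probe IDs `probe_A` and `probe_B` are hypothetical placeholders, while the gene IDs and symbols are the two from the NCBI records above:

```r
# Hypothetical probe-level table: the probe IDs are invented for this
# example; the Entrez Gene IDs are the two discussed in this thread.
probes <- data.frame(
  PROBEID  = c("probe_A", "probe_B"),
  ENTREZID = c("161725", "197335"),
  stringsAsFactors = FALSE
)

# Gene-level annotation keyed by Entrez Gene ID.
genes <- data.frame(
  ENTREZID = c("161725", "197335"),
  SYMBOL   = c("OTUD7A", "WDR90"),
  stringsAsFactors = FALSE
)

# Join on the stable identifier, never on the symbol or an alias.
annotated <- merge(probes, genes, by = "ENTREZID", all.x = TRUE)
print(annotated[, c("PROBEID", "ENTREZID", "SYMBOL")])
```

With the real annotation packages the same idea would mean passing "ENTREZID" rather than "ALIAS" as the keytype in the select() call shown earlier.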

Best,

Jim
On 3/27/2014 6:11 AM, Taku Tokuyasu wrote: