[Bioc-devel] Question about org.Dr.eg.db package
Hi Gennady,

That information should probably be cleaned up, and the BiMaps that point to the location data removed. While the OrgDbs do contain position information, it has been deprecated, which you would find if you tried to query using select():

select(org.Dr.eg.db, "30037", "CHR")
'select()' returned 1:1 mapping between keys and columns
  ENTREZID CHR
1    30037   5
Warning message:
In .deprecatedColsMessage() :
  Accessing gene location information via 'CHR','CHRLOC','CHRLOCEND' is deprecated. Please use a range based accessor like genes(), or select() with columns values like TXCHROM and TXSTART on a TxDb or OrganismDb object instead.

The rationale is that the OrgDb packages are intended to contain functional annotations, which are not based on any build and are instead current as of the construction of the OrgDb package. Since positional information should be based on a genome release, those data have been migrated to the TxDb and EnsDb packages, which are based on a given release.

Put a different way, the data in an OrgDb package are downloaded from NCBI as of a particular date, and the positional data we get are whatever we got from NCBI on that date. This is obviously a problem for the positional data, because what we get isn't necessarily build-specific. We get the TxDb data from the UCSC Genome Browser, which is build-specific, so we can tell end users exactly what build the data come from.

Ideally these data would be defunct in the OrgDb packages, but that hasn't happened yet.

Best,
Jim

On Thu, Aug 13, 2020 at 4:39 PM Margolin, Gennady (NIH/NICHD) [C] via
Bioc-devel <bioc-devel at r-project.org> wrote:
Hi Vincent,
Thank you for responding.
Here is an excerpt from the R help page for this package (I have
version 3.10.0; I doubt anything changed in the latest one, which is
3.11.4):
-------------------------------------------------
org.Dr.egCHRLOC {org.Dr.eg.db}
Entrez Gene IDs to Chromosomal Location
Description
org.Dr.egCHRLOC is an R object that maps entrez gene identifiers to the
starting position of the gene. The position of a gene is measured as the
number of base pairs.
The CHRLOCEND mapping is the same as the CHRLOC mapping except that it
specifies the ending base of a gene instead of the start.
...
-------------------------------------------------
This output also does not show any genome version:
org.Dr.eg_dbInfo()
   name               value
1  DBSCHEMAVERSION    2.1
2  Db type            OrgDb
3  Supporting package AnnotationDbi
4  DBSCHEMA           ZEBRAFISH_DB
5  ORGANISM           Danio rerio
6  SPECIES            Zebrafish
7  EGSOURCEDATE       2019-Jul10
8  EGSOURCENAME       Entrez Gene
9  EGSOURCEURL        ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
10 CENTRALID          EG
11 TAXID              7955
12 GOSOURCENAME       Gene Ontology
13 GOSOURCEURL        ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
14 GOSOURCEDATE       2019-Jul10
15 GOEGSOURCEDATE     2019-Jul10
16 GOEGSOURCENAME     Entrez Gene
17 GOEGSOURCEURL      ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
18 KEGGSOURCENAME     KEGG GENOME
19 KEGGSOURCEURL      ftp://ftp.genome.jp/pub/kegg/genomes
20 KEGGSOURCEDATE     2011-Mar15
21 GPSOURCENAME       UCSC Genome Bioinformatics (Danio rerio)
22 GPSOURCEURL
23 GPSOURCEDATE       2017-Nov1
24 ENSOURCEDATE       2019-Jun24
25 ENSOURCENAME       Ensembl
26 ENSOURCEURL        ftp://ftp.ensembl.org/pub/current_fasta
27 UPSOURCENAME       Uniprot
28 UPSOURCEURL        http://www.UniProt.org/
29 UPSOURCEDATE       Mon Oct 21 14:32:30 2019
From: Vincent Carey <stvjc at channing.harvard.edu>
Date: Thursday, August 13, 2020 at 2:46 PM
To: "Margolin, Gennady (NIH/NICHD) [C]" <gennady.margolin at nih.gov>
Cc: "bioc-devel at r-project.org" <bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] Question about org.Dr.eg.db package
This should probably be posed to the support site. What version of the
package are you using? Where
are you seeing coordinates? I would expect those to be obtained from the
TxDb package, or perhaps
from AnnotationHub.
columns(org.Dr.eg.db)
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"
[11] "GO"           "GOALL"        "IPI"          "ONTOLOGY"     "ONTOLOGYALL"
[16] "PATH"         "PFAM"         "PMID"         "PROSITE"      "REFSEQ"
[21] "SYMBOL"       "UNIGENE"      "UNIPROT"      "ZFIN"
On Thu, Aug 13, 2020 at 2:13 PM Margolin, Gennady (NIH/NICHD) [C] via
Bioc-devel <bioc-devel at r-project.org> wrote:
Hello,
I have a short question: how do I figure out the genome version for the
org.Dr.eg.db package? I couldn't see it in the DESCRIPTION, and it's also
not in the org.Dr.eg_dbInfo() output. It would be nice to know if this is
danRer11/GRCz11 or some other assembly, as there are coordinates present in
the DB.
Thank you,
Gennady
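The advice given in this thread (take gene coordinates from a build-specific TxDb rather than from the OrgDb) might look roughly like the sketch below. The package name TxDb.Drerio.UCSC.danRer11.refGene is an assumption; the thread names no specific TxDb package, and this one is chosen only because the question mentions danRer11/GRCz11. The sketch also assumes that Entrez Gene ID 30037 is present in that package's refGene track; select() reports an error for keys it does not know.

```r
# Sketch only. Assumes the Bioconductor annotation package
# TxDb.Drerio.UCSC.danRer11.refGene is installed (not named in the thread).
library(TxDb.Drerio.UCSC.danRer11.refGene)
txdb <- TxDb.Drerio.UCSC.danRer11.refGene

# Unlike the OrgDb, the TxDb records the genome build its coordinates use.
unique(genome(txdb))

# select() with the TX* columns named in the deprecation warning.
# GENEID holds Entrez Gene IDs for UCSC refGene-based TxDb packages.
select(txdb, keys = "30037", keytype = "GENEID",
       columns = c("TXCHROM", "TXSTART", "TXEND", "TXSTRAND"))

# Range-based accessor: one range per gene, named by gene ID.
# Subsetting by name comparison returns an empty GRanges if the ID is absent.
gr <- genes(txdb)
gr[names(gr) == "30037"]
```

The OrgDb remains the place for functional annotation (for example SYMBOL or GO for the same Entrez Gene ID), while the TxDb supplies positions tied to a known build.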