[Bioc-devel] annotation data not updated?
On Wed, Nov 15, 2017 at 7:50 AM, Shepherd, Lori <
Lori.Shepherd at roswellpark.org> wrote:
When this issue was brought up I updated the files that were downloaded when using AnnotationHub so they should be updated as well.
Thanks. How are the OrgDb files for AnnotationHub built? I just made one for Salmo salar using makeOrgPackageFromNCBI, and the GO IDs for that package match those in GO.db. One of the GO IDs in the AnnotationHub OrgDb for Salmo salar (that is not in GO.db) is GO:0044744, which was made a secondary ID for GO:0034504 on 6/29/2017, which seems too far in the past to have not been picked up by an update in November. If I just pick another OrgDb at random, it has outdated GO IDs as well:
query(hub, c("macaca","orgdb"))
AnnotationHub with 3 records
# snapshotDate(): 2017-10-27
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Macaca cynomolgus, Macaca mulatta, Macaca nemestrina
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH57977"]]'
title
AH57977 | org.Mmu.eg.db.sqlite
AH58035 | org.Macaca_nemestrina.eg.sqlite
AH58053 | org.Macaca_cynomolgus.eg.sqlite
z <- hub[["AH58035"]]
downloading from https://annotationhub.bioconductor.org/fetch/64781
retrieving 1 resource
|======================================================================| 100%
sum(!keys(z, "GOALL") %in% keys(GO.db))
[1] 13
keys(z, "GOALL")[!keys(z, "GOALL") %in% keys(GO.db)]
[1] "GO:0007067" "GO:0016337" "GO:0044699" "GO:0044700" "GO:0044702"
[6] "GO:0044707" "GO:0044710" "GO:0044711" "GO:0044763" "GO:0044765"
[11] "GO:0044767" "GO:0098602" "GO:1902578"
So far as I can tell, all of these terms have been replaced, so it looks like the GO source data were outdated?
Jim
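The check above can be wrapped in a small helper. This is only a sketch: stale_keys is a hypothetical name, and the two vectors are toy stand-ins for keys(z, "GOALL") and keys(GO.db), filled with IDs quoted in this thread so the example runs without any Bioconductor packages installed.

```r
# Report IDs present in one key set but absent from a reference key set.
# (stale_keys is a hypothetical helper name, not part of AnnotationDbi.)
stale_keys <- function(x, reference) {
  sort(unique(x[!x %in% reference]))
}

# Toy stand-ins for keys(z, "GOALL") and keys(GO.db).
orgdb_go   <- c("GO:0034504", "GO:0044744", "GO:0007067", "GO:0044744")
current_go <- c("GO:0034504")

stale <- stale_keys(orgdb_go, current_go)
length(stale)  # how many distinct IDs are missing from the reference
stale          # which ones

# With the real objects loaded, the call would be:
# stale_keys(keys(z, "GOALL"), keys(GO.db))
```

Returning the IDs themselves rather than only a count makes it easier to look each one up and see whether it was merged or made obsolete.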
The files were updated, but the rdatadateadded was not updated when I added the new files.
Lori Shepherd
Bioconductor Core Team
Roswell Park Cancer Institute
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263
________________________________
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of James W.
MacDonald <jmacdon at uw.edu>
Sent: Tuesday, November 14, 2017 7:54:54 PM
To: Van Twisk, Daniel
Cc: bioc-devel; Yu, Guangchuang
Subject: Re: [Bioc-devel] annotation data not updated?
On Thu, Nov 9, 2017 at 9:48 AM, Van Twisk, Daniel <
Daniel.VanTwisk at roswellpark.org> wrote:
Thanks for looking into this. New versions of the OrgDbs and Db0s
(v3.5.0) are now available that have up-to-date resources. Here is the
output of the new org.Hs.eg.db
Does this issue affect the OrgDbs on AnnotationHub as well? I am finding
e.g., that the OrgDb for Salmo salar contains GO IDs that no longer exist
in GO.db.
zz
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Salmo salar
| SPECIES: Salmo salar
| CENTRALID: GID
| Taxonomy ID: 8030
| Db type: OrgDb
| Supporting package: AnnotationDbi
Please see: help('select') for usage information
sum(!keys(zz, "GOALL") %in% keys(GO.db))
[1] 38
But this isn't true of, for example, the Homo sapiens OrgDb from
AnnotationHub
z
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: HUMAN_DB
| ORGANISM: Homo sapiens
| SPECIES: Human
| EGSOURCEDATE: 2017-Nov6
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 9606
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 2017-Nov01
| GOEGSOURCEDATE: 2017-Nov6
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
| GPSOURCEURL:
| GPSOURCEDATE: 2017-Oct9
| ENSOURCEDATE: 2017-Aug23
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
| UPSOURCENAME: Uniprot
| UPSOURCEURL: http://www.UniProt.org/
| UPSOURCEDATE: Tue Nov 7 20:57:02 2017
Please see: help('select') for usage information
sum(!keys(z, "GOALL") %in% keys(GO.db))
[1] 0
But I am not sure when they were added, because the human OrgDb has an
rdatadateadded that is obviously not correct, since it precedes the
SOURCEDATEs from the OrgDb itself!
mcols(hub["AH57973"])$rdatadateadded  # Human
[1] "2017-10-23"
mcols(hub["AH58003"])$rdatadateadded  # Salmo
[1] "2017-10-27"
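The inconsistency can also be checked programmatically. The following is a rough sketch, not anything AnnotationHub provides: parse_source_date is a hypothetical helper, and the date strings are copied by hand from the output in this message. On a live object they could presumably be read from metadata(z), but that is an assumption and is not done here.

```r
# Convert an OrgDb-style source date such as "2017-Nov6" to a Date.
# Month names are matched against base R's English month.abb, so the
# result does not depend on the session locale. Strings in any other
# layout (e.g. the UPSOURCEDATE timestamp) yield NA.
parse_source_date <- function(x) {
  m <- regmatches(x, regexec("^([0-9]{4})-([A-Za-z]{3})([0-9]{1,2})$", x))[[1]]
  month <- match(m[3], month.abb)
  if (length(m) != 4 || is.na(month)) return(as.Date(NA))
  as.Date(sprintf("%s-%02d-%02d", m[2], month, as.integer(m[4])),
          format = "%Y-%m-%d")
}

# Values copied from the human OrgDb output and the hub metadata above.
source_dates <- c(EGSOURCEDATE = "2017-Nov6", GOSOURCEDATE = "2017-Nov01")
date_added   <- as.Date("2017-10-23")

parsed <- do.call(c, lapply(source_dates, parse_source_date))

# TRUE marks a source date later than the recorded rdatadateadded.
inconsistent <- parsed > date_added
inconsistent
```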
Best,
Jim
x <- org.Hs.eg.db
x
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: HUMAN_DB
| ORGANISM: Homo sapiens
| SPECIES: Human
| EGSOURCEDATE: 2017-Nov6
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 9606
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 2017-Nov01
| GOEGSOURCEDATE: 2017-Nov6
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
| GPSOURCEURL:
| GPSOURCEDATE: 2017-Oct9
| ENSOURCEDATE: 2017-Aug23
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
| UPSOURCENAME: Uniprot
| UPSOURCEURL: http://www.UniProt.org/
| UPSOURCEDATE: Tue Nov 7 20:57:02 2017
________________________________
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
Obenchain, Valerie <Valerie.Obenchain at RoswellPark.org>
Sent: Thursday, November 2, 2017 12:47:43 PM
To: Yu, Guangchuang; bioc-devel
Subject: Re: [Bioc-devel] annotation data not updated?
Guangchuang,
Thanks for reporting this. We've looked into it and there is indeed a more
recent version of the data. Daniel is working on re-generating the db0 and
OrgDb packages. We'll post back with more information when the packages are
ready.
Valerie
On 11/02/2017 05:40 AM, Yu, Guangchuang wrote:
Dear all,
I just upgraded BioC to 3.6 and found that the data sources of org.Hs.eg.db
and GO.db are still from half a year ago.
I was wondering whether these packages have been updated in the current
release.
org.Hs.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: HUMAN_DB
| ORGANISM: Homo sapiens
| SPECIES: Human
| EGSOURCEDATE: *2017-Mar29*
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 9606
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: *2017-Mar29*
| GOEGSOURCEDATE: 2017-Mar29
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
| GPSOURCEURL:
| GPSOURCEDATE: 2017-Sep7
| ENSOURCEDATE: 2017-Mar29
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
| UPSOURCENAME: Uniprot
| UPSOURCEURL: http://www.UniProt.org/
| UPSOURCEDATE: Thu Oct 5 16:07:33 2017
Please see: help('select') for usage information
GO.db
GODb object:
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: *2017-Mar29*
| Db type: GODb
| package: AnnotationDbi
| DBSCHEMA: GO_DB
| GOEGSOURCEDATE: 2017-Mar29
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| DBSCHEMAVERSION: 2.1
Please see: help('select') for usage information
sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] org.Hs.eg.db_3.4.2 GO.db_3.4.2 AnnotationDbi_1.40.0
[4] IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0
[7] BiocGenerics_0.24.0 rvcheck_0.0.9 rmarkdown_1.6
[10] roxygen2_6.0.1 magrittr_1.5 BiocInstaller_1.28.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.13 knitr_1.17 xml2_1.1.1 bit_1.1-12
[5] R6_2.2.2 rlang_0.1.2 blob_1.1.0 stringr_1.2.0
[9] tools_3.4.2 DBI_0.7 htmltools_0.3.6 commonmark_1.4
[13] bit64_0.9-7 rprojroot_1.2 digest_0.6.12 tibble_1.3.4
[17] memoise_1.1.0 RSQLite_2.0 evaluate_0.10.1 stringi_1.1.5
[21] compiler_3.4.2 backports_1.1.1 pkgconfig_2.0.1
This email message may contain legally privileged and/or confidential
information. If you are not the intended recipient(s), or the employee or
agent responsible for the delivery of this message to the intended
recipient(s), you are hereby notified that any disclosure, copying,
distribution, or use of this email message is prohibited. If you have
received this message in error, please notify the sender immediately by
e-mail and delete this email message from your computer. Thank you.
[[alternative HTML version deleted]]
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099