Are we still using the scripts in BioconductorAnnotationPipeline/go/scripts
to download GO data and create the GO.db package?
If so, that is likely a problem that will only get worse with time.
geneontology.org appears to no longer generate the SQL dumps that the
go scripts rely on, so whatever we download is outdated. There have been
some complaints to the helpdesk about the data (
https://github.com/geneontology/helpdesk/issues/4), where they discuss a
new RDF-based pipeline, though it is not clear that it was ever adopted.
They now appear to provide the downloadable data as OBO or OWL (
http://geneontology.org/docs/download-ontology/), so we should consider
switching.
I bring this up because apparently the current release GO.db is missing
terms that were added as far back as 2018.
Best,
Jim
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
Further to this point, comparing the current GO.db to the latest OBO file
from geneontology shows just over 1000 GO IDs in GO.db that are no longer
in GO, and almost 500 GO IDs in the OBO file that are not in GO.db.
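For anyone who wants to repeat this kind of comparison, here is a rough sketch in Python. It is not the script used for the numbers above. It assumes the GO.db IDs have already been exported to a plain collection of strings (for example from keys(GO.db) in R), and it reads only the `id:` and `is_obsolete:` tags of `[Term]` stanzas in the OBO file. The function names and the GO IDs in the sample are invented for illustration, and merged terms listed under `alt_id:` are not handled.

```python
from itertools import chain


def read_obo_term_ids(lines):
    """Collect term IDs from the [Term] stanzas of an OBO file.

    Returns two sets: (current_ids, obsolete_ids).
    """
    current, obsolete = set(), set()
    in_term, term_id, is_obsolete = False, None, False
    # A sentinel stanza header at the end flushes the final stanza.
    for raw in chain(lines, ["[EOF]"]):
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts: store the previous one if it was a term.
            if in_term and term_id:
                (obsolete if is_obsolete else current).add(term_id)
            in_term = line == "[Term]"
            term_id, is_obsolete = None, False
        elif in_term and line.startswith("id: "):
            term_id = line[4:].strip()
        elif in_term and line == "is_obsolete: true":
            is_obsolete = True
    return current, obsolete


def compare_ids(package_ids, obo_lines):
    """Compare package IDs (hypothetical export from GO.db) with an OBO file."""
    current, obsolete = read_obo_term_ids(obo_lines)
    package_ids = set(package_ids)
    return {
        # In the OBO file and not obsolete, but absent from the package.
        "missing_from_package": current - package_ids,
        # In the package, but flagged obsolete in the OBO file.
        "obsolete_in_package": package_ids & obsolete,
        # In the package, but with no [Term] stanza at all in the OBO file.
        "unknown_to_obo": package_ids - current - obsolete,
    }


# Toy data: these GO IDs are made up and do not refer to real terms.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:9000001
name: toy term present in both

[Term]
id: GO:9000002
name: toy obsolete term
is_obsolete: true

[Term]
id: GO:9000003
name: toy term missing from the package

[Typedef]
id: part_of
name: part of
"""

result = compare_ids(
    {"GO:9000001", "GO:9000002", "GO:9000004"},
    SAMPLE_OBO.splitlines(),
)
for label, ids in result.items():
    print(label, sorted(ids))
```

With a real file, an open file handle can be passed in place of SAMPLE_OBO.splitlines(), since the parser only iterates over lines.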
On Wed, Apr 1, 2020 at 12:11 PM James W. MacDonald <jmacdon at uw.edu> wrote:
> Are we still using the scripts in
> BioconductorAnnotationPipeline/go/scripts to download GO data and create
> the GO.db package?
> [...]
Yes - those are the scripts we are using. We (Martin and I) will be spending
time tomorrow to work through this. With the release quickly approaching,
we will work on editing the scripts to download this new data.
Thank you for bringing this to our attention.
- Kayla
On 4/1/20, 12:44 PM, "Bioc-devel on behalf of James W. MacDonald" <bioc-devel-bounces at r-project.org on behalf of jmacdon at uw.edu> wrote:
> Further to this point, when comparing to the latest OBO from geneontology,
> it looks like the current GO.db has just over 1000 GO IDs that are not in
> GO any longer, and almost 500 GO IDs are in the GO OBO file that are not in
> GO.db
> [...]
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel