[Bioc-devel] IPI numbers in annotation packages
Hi Marc,
That script has this in it:
## For now just get data for the ones that we have traditionally supported
## I don't even know if the other species are available...
speciesList = c("chipsrc_human.sqlite",
                "chipsrc_rat.sqlite",
                "chipsrc_chicken.sqlite",
                "chipsrc_zebrafish.sqlite",
                # "chipsrc_worm.sqlite",
                # "chipsrc_fly.sqlite",
                "chipsrc_mouse.sqlite",
                "chipsrc_bovine.sqlite"
                # "chipsrc_arabidopsis.sqlite" ## this is available and could be "activated"
                ## But to activate arabidopsis, remember you have to pre-add the tables...
                # "chipsrc_canine.sqlite",
                # "chipsrc_rhesus.sqlite",
                # "chipsrc_chimp.sqlite",
                # "chipsrc_anopheles.sqlite"
                )
And there is no mention of yeast anywhere. If I search all the scripts for,
say, 'INSERT INTO pfam', I get:
custom_anno/script/bindb.sql
328:INSERT INTO pfam
pfam/script/srcdb_pfam.sql
202:-- INSERT INTO pfamb
organism_annotation/script/bindb_yeast.sql
441:-- INSERT INTO pfam
yeast/script/bindb.sql
241:-- INSERT INTO pfam
The first one is just doing all the metadata tables, and the other three
are in code blocks that are commented out. Is it possible that you used a
script that didn't make it into svn?
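
A minimal sketch of that kind of search, assuming a grep that supports -r (GNU or BSD); the tool actually used for the listing above is not stated. The directory layout and file contents are invented for the demo and only mimic the shape of the hits. It separates live INSERT statements from ones hidden behind a "--" SQL comment:

```shell
#!/bin/sh
# Throwaway tree with one live and one commented-out statement (made-up data).
demo=$(mktemp -d)
mkdir -p "$demo/custom_anno/script" "$demo/yeast/script"

cat > "$demo/custom_anno/script/bindb.sql" <<'EOF'
INSERT INTO pfam
 SELECT * FROM src;
EOF

cat > "$demo/yeast/script/bindb.sql" <<'EOF'
-- INSERT INTO pfam
--  SELECT * FROM src;
EOF

# Every mention, commented out or not, reported as file:line:text.
all_hits=$(cd "$demo" && grep -rn 'INSERT INTO pfam' . | sort)

# Only statements that start the line, i.e. not behind a "--" comment.
live_hits=$(cd "$demo" && grep -rnE '^[[:space:]]*INSERT INTO pfam' . | sort)

# Only statements that sit behind a "--" comment.
dead_hits=$(cd "$demo" && grep -rnE '^[[:space:]]*--.*INSERT INTO pfam' . | sort)

printf 'all:\n%s\n\nlive:\n%s\n\ncommented out:\n%s\n' \
    "$all_hits" "$live_hits" "$dead_hits"

rm -rf "$demo"
```

Note that the plain pattern also matches longer table names such as pfamb, as in the listing above; anchoring the pattern at the start of the line is only a heuristic and will miss statements inside /* ... */ block comments.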
Jim
On Sun, Oct 4, 2015 at 2:36 PM, Marc Carlson <mrjc42 at gmail.com> wrote:
Hi Jim,

You asked me on Friday where the PFAM IDs for yeast came from and I couldn't recall because at the moment I was at Seattle Children's (and thus nowhere near my copy of my source code). But I also said I would look into it for you later (and I have). Here is what my code tells me:

So ever since IPI shut down, we have been getting the PFAM and IPI data from UniProt. There is a script in the UniProt.ws package called processDataForBuild.R that is supposed to be called by the script "src_build.sh" (it's the last thing that script does). That code should get the pfam data from yeast for you.

Please note that yeast required a lot of special code to get it processed. Nothing with yeast annotations is ever easy. It's like karmic accounting to compensate for all the bread and beer. ;)

Let me know if you need any more explanations about what is in there. Because of the crazy timing, before I left I built and pushed into devel a fresh set of .DB0s and core packages (in late August), just in case it was too crazy to do a refresh right now. But it sounds like you won't need that.

Marc

On Sun, Oct 4, 2015 at 6:27 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
I am building the annotation db0 packages for the upcoming Bioconductor release, which are used to generate all the orgDb and chip annotation packages that we distribute. Up to the previous release we have always included IPI identifiers (as part of the table containing the PROSITE and PFAM IDs). Unfortunately, IPI <https://www.ebi.ac.uk/IPI> is no longer maintained (since 2011), and UniProt, which is where we got data for the last few releases, has now dropped support as well.

Given that this annotation source is no longer maintained, I decided to exclude these IDs from the current build of the following db0 packages:

- rat.db0
- chicken.db0
- zebrafish.db0
- mouse.db0
- bovine.db0
- human.db0

In addition, it is not clear to me (nor can Marc recall) where the data for PFAM in the yeast.db0 package comes from. Given that we are pretty far behind schedule for these packages, I have excluded that table as well.

If this will break anybody's package, or if there are people who rely on these IDs, I can just parse out of the last release and deprecate, so you will have the IDs for one more release. However, if nobody cares about such things, I will just go with what we have. Please speak up if this will affect you.

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel