[Bioc-devel] RFC: Naming scheme for organism level annotation data packages

Seth Falcon · 2007-07-23T14:15:10Z

Hi Sean, Sean Davis writes:

> Since Seth et al. have produced a wonderfully useful db-based system,
> it seems that these data packages could be much more flexible from an
> ID point of view. One has a primary ID associated with the data
> package, but mappings, to the extent that they are available, could
> also be included. Then, you could have something like:
>
> primaryKey(org.Hs.mappings)
> [1] "EntrezGene"
>
> availableKeys(org.Hs.mappings)
> KeyType Examp

Mon, Jul 23, 2007 7:15 AM

Hi Sean,

Sean Davis <sdavis2 at mail.nih.gov> writes:

Interesting.  Although I can see how this would work from a DB point
of view, it isn't clear to me that such a combined package would be
feasible/desirable.  If the IDs are more or less different names for
the same things, then no problem.  But if a new ID induces an entirely
new mapping of all the downstream relations, well, the resulting DB
size could be prohibitive.

Your pseudocode suggests the notion of a package-level object
"org.Hs.mappings".  That isn't something we've implemented in
AnnotationDbi, but I like the idea.
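
To make the idea concrete, here is a minimal sketch in base R of what such a
package-level object might look like. Everything in it is an assumption made
for illustration: AnnotationDbi provides no primaryKey() or availableKeys(),
a plain list stands in for the object, the ID values are toy data, and the
"Example" column is a guess because the quoted output is cut off.

```r
# Hypothetical sketch: none of these names exist in AnnotationDbi.
# A plain list stands in for a package-level mappings object.
org.Hs.mappings <- list(
  primary = "EntrezGene",
  # Toy ID table; the values are invented for illustration only.
  keys = data.frame(
    EntrezGene = c("1", "2"),
    Ensembl    = c("ENSG_toy_A", "ENSG_toy_B"),
    stringsAsFactors = FALSE
  )
)

# Return the ID type the package is built around.
primaryKey <- function(x) x$primary

# List every ID type held by the object, with one example value each.
availableKeys <- function(x) {
  data.frame(
    KeyType = names(x$keys),
    Example = vapply(x$keys, function(col) col[1], character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

primaryKey(org.Hs.mappings)
availableKeys(org.Hs.mappings)
```

A real version would presumably query the package's SQLite tables rather than
hold an in-memory data frame.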

I'd like to point out that we have a number of the SQLite-based
annotation data packages available in devel and this would be a great
time for interested parties to give them a try and send us feedback.

The packages should work as drop-in replacements for the
environment-based packages.  There are some additional features which
currently are only documented in the AnnotationDbi vignette.

It seems to me that this only works if the IDs are nearly equivalent.
If not, each "primary ID" needs to be deeply involved in the process
of creating the DB tables.

Let me know if I'm misunderstanding, but here I think you are
describing a system that would define a mapping, say, from ensembl to
EG and it isn't clear to me that this is what someone wanting ensembl
annotation would really want -- it would allow them to work with
ensembl IDs, but using EG annotation.
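
A rough sketch of that two-step lookup, in base R with invented toy data (the
IDs, the symbols, and the annotateViaEG() helper are all hypothetical, not
part of any package):

```r
# Toy data, invented for illustration; not real identifiers or annotation.
ens2eg    <- c(ENS_toy_A = "EG1", ENS_toy_B = "EG1", ENS_toy_C = "EG2")
eg2symbol <- c(EG1 = "geneX", EG2 = "geneY")

# Look up annotation for ensembl-style IDs by translating to EG first.
# The annotation itself always comes from the EG-keyed table.
annotateViaEG <- function(ens_ids) {
  eg  <- ens2eg[ens_ids]     # step 1: ensembl ID -> EG ID
  out <- eg2symbol[eg]       # step 2: EG ID -> EG-derived annotation
  names(out) <- ens_ids      # present the result under the ensembl IDs
  out
}

annotateViaEG(c("ENS_toy_A", "ENS_toy_B"))
```

In this toy case two distinct ensembl-style IDs land on the same EG record and
so get identical annotation, which is one way the result could differ from
annotation built natively around ensembl IDs.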

Best Wishes,

+ seth

Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org