[Bioc-devel] RFC: Naming scheme for organism level annotation data packages

Hi Sean,

Sean Davis <sdavis2 at mail.nih.gov> writes:
Interesting.  Although I can see how this would work from a DB point
of view, it isn't clear to me that such a combined package would be
feasible/desirable.  If the IDs are more or less different names for
the same things, then no problem.  But if a new ID induces an entirely
new mapping of all the downstream relations, well, the resulting DB
size could be prohibitive.

Your pseudocode suggests the notion of a package-level object
"org.Hs.mappings".  That isn't something we've implemented in
AnnotationDbi, but I like the idea.

I'd like to point out that we have a number of the SQLite-based
annotation data packages available in devel and this would be a great
time for interested parties to give them a try and send us feedback.

The packages should work as drop-in replacements for the
environment-based packages.  There are some additional features which
currently are only documented in the AnnotationDbi vignette.

It seems to me that this only works if the IDs are nearly equivalent.
If not, each "primary ID" needs to be deeply involved in the process
of creating the DB tables.

Let me know if I'm misunderstanding, but here I think you are
describing a system that would define a mapping, say, from ensembl to
EG and it isn't clear to me that this is what someone wanting ensembl
annotation would really want -- it would allow them to work with
ensembl IDs, but using EG annotation.
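
To make that concern concrete, here is a minimal sketch of the
"mapping table" approach. It uses Python's sqlite3 module rather than
AnnotationDbi, and every table name, column name, and ID in it is an
invented placeholder, not the schema of any real annotation package.
The point is only that a lookup by the secondary ID goes through the
mapping and returns whatever the EG-keyed tables hold.

```python
import sqlite3

# In-memory database; schema and IDs are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genes (eg_id TEXT PRIMARY KEY, symbol TEXT);
CREATE TABLE ensembl2eg (ensembl_id TEXT, eg_id TEXT);
""")

# Annotation is stored once, keyed by the primary (EG-like) ID.
conn.executemany(
    "INSERT INTO genes VALUES (?, ?)",
    [("EG_1", "SYM_1"), ("EG_2", "SYM_2"), ("EG_3", "SYM_3")],
)

# The secondary ID is only a mapping onto the primary ID.
# ENS_B maps to two primary IDs, so the IDs are not equivalent.
conn.executemany(
    "INSERT INTO ensembl2eg VALUES (?, ?)",
    [("ENS_A", "EG_1"), ("ENS_B", "EG_2"), ("ENS_B", "EG_3")],
)


def annotate(ensembl_id):
    """Return (eg_id, symbol) rows reachable from a secondary ID."""
    rows = conn.execute(
        """
        SELECT g.eg_id, g.symbol
        FROM ensembl2eg AS m
        JOIN genes AS g ON g.eg_id = m.eg_id
        WHERE m.ensembl_id = ?
        ORDER BY g.eg_id
        """,
        (ensembl_id,),
    )
    return rows.fetchall()
```

With this layout, annotate("ENS_B") returns two rows of EG-keyed
annotation, and an ID missing from the mapping table returns nothing,
even if the secondary source has annotation of its own for it.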

Best Wishes,

+ seth