[Bioc-devel] Feedback on OrganismDb development
On Thu, Apr 7, 2016 at 10:34 AM, Obenchain, Valerie <Valerie.Obenchain at roswellpark.org> wrote:
BioC developers,

After the release we plan to continue development of the OrganismDb class and packages. This email outlines some ideas for future direction. We're interested in feedback on these points as well as other thoughts people might have.

## Background

The OrganismDb class is defined in the OrganismDbi package and consists of a TxDb object and the combined mappings from GO.db and an OrgDb. It supports the select() interface as well as several range-based extractors such as exons(), transcripts(), etc. The idea was that given a particular organism, a user would only need a single package to access both systems biology and transcript-centric annotations.

We currently have 3 OrganismDb packages (http://www.bioconductor.org/packages/release/BiocViews.html#___OrganismDb ). These are lightweight and don't contain any data themselves but instead point to the GO.db, OrgDb and TxDb packages.

## Current issues

- Support for sequence representation

We've discussed incorporating an optional sequence component, maybe BSgenome or 2bit or ... ?
It could be convenient to have a reference to a relevant sequence source, presumably the BSgenome... packages.
- Class name

OrganismDb is similar to OrgDb, which could cause some confusion. We are considering renaming ... here are a few ideas. Let us know what you think or add your suggestion.

OrganismDb (fine as is, leave it)
Leave it; I have seen no objections or confusion.
FullOrgDb
CrossDb
MultipleDb

- Package name

The current names are not very descriptive: Homo.sapiens, Mus.musculus and Rattus.norvegicus. We'd like to follow the naming convention used in our BSgenome and TxDb packages, which means including the source, build and track from the TxDb as well as preceding with the class type. For example, the current 'Homo.sapiens' package would be renamed 'OrganismDb.Hsapiens.UCSC.hg19.knownGene'.
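To make the proposed naming convention concrete, here is a minimal base-R sketch that assembles a name from the class type, organism abbreviation, source, build and track. The helper `organismDbPackageName()` is hypothetical and used only for illustration; it is not part of OrganismDbi.

```r
# Hypothetical helper (not part of OrganismDbi): compose a package name from
# the class type, abbreviated organism, source, build and track.
organismDbPackageName <- function(organism, source, build, track,
                                  class = "OrganismDb") {
  paste(class, organism, source, build, track, sep = ".")
}

organismDbPackageName("Hsapiens", "UCSC", "hg19", "knownGene")
# "OrganismDb.Hsapiens.UCSC.hg19.knownGene"
```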
A simple package name is great for promoting and getting use. My sense is that the OrganismDb concept is underused. I find it a convenient place to go for seqinfo, seqlengths, symbol translations. The objects are lightweight enough that it would seem to me that we really want to focus on methods for creating appropriate and valid instances at the session level. Parameters of interest would seem to be the genome reference build, the gene model source, and maps of genomic feature sets (GO, KEGG, etc.) that one would like to use with "select" in some rational way.
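For readers less familiar with these objects, the uses mentioned above (seqinfo, seqlengths, symbol translation via select) might look roughly like the sketch below. It assumes the Homo.sapiens package and its GO.db, OrgDb and TxDb dependencies are installed from Bioconductor; the gene symbol "BRCA1" and the chosen columns are illustrative only.

```r
# Assumes Homo.sapiens (and the packages it points to) is installed.
library(Homo.sapiens)

# Sequence metadata, taken from the underlying TxDb.
seqinfo(Homo.sapiens)
head(seqlengths(Homo.sapiens))

# Symbol translation through the select() interface.
# "BRCA1" is just an example key.
select(Homo.sapiens,
       keys = "BRCA1",
       keytype = "SYMBOL",
       columns = c("ENTREZID", "TXNAME"))

# Range-based extractor, with annotation columns attached to the ranges.
transcripts(Homo.sapiens, columns = c("TXNAME", "SYMBOL"))
```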
- Pre-made packages

Is it useful to supply pre-made packages or just increase awareness of the helpers so users can make their own? Current helpers:
?makeOrganism
?makeOrganismDbFromBiomart
?makeOrganismDbFromTxDb
?makeOrganismDbFromUCSC
?makeOrganismPackage

NOTE: makeOrganismPackage() will be renamed to makeOrganismDbPackage().

Thanks.
Valerie
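As a rough illustration of building an instance at the session level with one of the helpers listed above, the sketch below uses makeOrganismDbFromTxDb() with its default arguments. It assumes OrganismDbi, TxDb.Hsapiens.UCSC.hg19.knownGene and the matching OrgDb package are installed; the choice of TxDb is only an example.

```r
# Assumes these Bioconductor packages are installed; the TxDb is an example.
library(OrganismDbi)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Build an OrganismDb for this session from the TxDb. With default arguments
# the helper tries to find the matching OrgDb for the organism itself.
odb <- makeOrganismDbFromTxDb(txdb)

# The result supports the same interface as the pre-made packages.
head(columns(odb))
```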
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel