[Bioc-devel] Request for comment metagenomeFeatures package

6 messages · Nathan Olson, Hector Corrada Bravo, Martin Morgan +1 more

#
We are starting to work on an infrastructure for annotation of 16S
metagenomic sequencing datasets and would like your comments and/or
contributions. Below are links to two github repositories:
metagenomeFeatures and greengenes13.5MgDb.  The metagenomeFeatures package
contains two classes: mgDb, for 16S sequence databases, and
metagenomeAnnotation, for annotating a sequence dataset with taxonomic
information from a mgDb object.  The greengenes13.5MgDb package loads a
mgDb object with the greengenes 13.5 database.  greengenes 13.5 was used as
an example database; we plan on adding additional packages for other
commonly used databases, e.g. RDP and Silva.

The metagenomeFeatures package includes two vignettes demonstrating the
mgDb and metagenomeAnnotation class methods, using greengenes13.5MgDb as an
example database.

We are planning on adding additional methods for the mgDb and
metagenomeAnnotation classes.  For the mgDb class, we plan to support
assigning query sequences to database sequences using the rRDP classifier
and/or the sequence alignment methods that are part of the Biostrings
package.  For the metagenomeAnnotation class, we plan to include the ability
to create a phylogenetic tree from a metagenomeAnnotation object.
We would appreciate comments on the package and suggestions for additional
features.

Links to package github repositories

https://github.com/HCBravoLab/metagenomeFeatures

https://github.com/HCBravoLab/greengenes13.5MgDb

Thanks

Nate Olson and Hector Corrada Bravo
#
Very interesting development; we have several folks who will take a look.

FYI

%vjcair> R CMD INSTALL greeng*b

Bioconductor version 3.2 (BiocInstaller 1.19.9), ?biocLite for help

Loading required package: digest

Loading required package: tools

Loading required package: utils

Loading required package: codetools

* installing to library ‘/Library/Frameworks/R.framework/Versions/3.2/Resources/library’

* installing *source* package ‘greengenes13.5MgDb’ ...

** R

** preparing package for lazy loading

Warning in .recacheSubclasses(def@className, def, doSubclasses, env) :

  undefined subclass "externalRefMethod" of class "functionORNULL"; definition not updated

** help

No man pages found in package  ‘greengenes13.5MgDb’

*** installing help indices

** building package indices

** testing if installed package can be loaded

Bioconductor version 3.2 (BiocInstaller 1.19.9), ?biocLite for help

Loading required package: digest

Loading required package: tools

Loading required package: utils

Loading required package: codetools

Warning in .recacheSubclasses(def@className, def, doSubclasses, env) :

  undefined subclass "externalRefMethod" of class "functionORNULL"; definition not updated

/gg_13_5.fasta.gz: Permission denied

On Tue, Aug 4, 2015 at 9:43 AM, Nathan Olson <nathandavidolson at gmail.com>
wrote:

  
  
#
Thanks Vince,
I think we just fixed that:
https://github.com/HCBravoLab/greengenes13.5MgDb/issues/1#issuecomment-127649449

Cheers,
Hector

On Tue, Aug 4, 2015 at 10:45 AM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:

  
  
#
On 08/04/2015 06:43 AM, Nathan Olson wrote:
Does it make sense to use AnnotationHub to manage these resources? Instead of 
downloading and managing the fasta and taxonomy files in .onLoad and 
getGreenGenes13.5Db, .onLoad would be

   hub = AnnotationHub()
   db_seq = hub[["AH12345"]]
   db_taxa_file = hub[["AH12346"]]

with a 'recipe' describing how the corresponding annotation hub resources are to 
be created. This would move download and management to AnnotationHub, and 
potentially allow use of the annotation hub records by people with other 
interests. If that sounds interesting we can work up a pull request.

Martin

  
    
#
On Tue, Aug 4, 2015 at 3:00 PM, Martin Morgan <mtmorgan at fredhutch.org>
wrote:
I would think so.  At this time, when trying to install the
greengenes13.5MgDb package, the step "testing whether the package can be
loaded" takes a very long time; I suspect it is doing some silent
downloading.  IMHO such activities should be explicitly undertaken by the
user.  With this setup the first installation of the package could involve
a long download, silent by default.  It's feasible but quite unusual.
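
As a rough illustration of the explicit, user-initiated download suggested
above, here is a minimal sketch in R.  The function name fetch_db_file and
its arguments are hypothetical and are not part of greengenes13.5MgDb or
metagenomeFeatures; the sketch only assumes base R's download.file.  The
idea is that nothing is fetched at install or load time, the user chooses a
writable destination directory, and a cached copy is reused on later calls.

```r
# Hypothetical helper, for illustration only: not part of any package
# discussed in this thread.
fetch_db_file <- function(url, dest_dir, overwrite = FALSE) {
    # The user supplies a directory they can write to, rather than the
    # package writing into its own installation directory.
    dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
    dest <- file.path(dest_dir, basename(url))

    # Reuse an earlier download unless the user asks for a fresh copy.
    if (file.exists(dest) && !overwrite) {
        message("Using cached copy: ", dest)
        return(invisible(dest))
    }

    # Announce the download instead of doing it silently.
    message("Downloading ", url, " to ", dest)
    status <- utils::download.file(url, dest, mode = "wb", quiet = TRUE)
    if (status != 0L) {
        stop("download failed: ", url)
    }
    invisible(dest)
}
```

A user would call this once, e.g. fetch_db_file(some_url, "~/mgdb_cache"),
where both arguments are placeholders, and then pass the returned path to
whatever loads the database.  Taking the destination as an argument also
sidesteps writing into a read-only library location.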

  
  
#
Thanks, Martin.  I agree using AnnotationHub to manage the db resources is
a better option than how it is currently set up.  A pull request would be
much appreciated.

On Tue, Aug 4, 2015 at 3:14 PM Vincent Carey <stvjc at channing.harvard.edu>
wrote: