[Bioc-devel] Update on SQLite-based annotation data package (prototype available)

Wolfgang Huber <huber at ebi.ac.uk> writes:
Hi Seth,

I installed the package, but I get:

? hgu95av2db
No documentation for 'hgu95av2db' in specified packages and libraries:
you could try 'help.search("hgu95av2db")'

?getDb
No documentation for 'getDb' in specified packages and libraries:
you could try 'help.search("getDb")'

?hgu95av2CHRLOC
No documentation for 'hgu95av2CHRLOC' in specified packages and libraries:
you could try 'help.search("hgu95av2CHRLOC")'

and there is also no vignette.

Yep.  It really is a prototype.  To get started, try pretending you
have called library(hgu95av2).  In other words, you should have all the
same "environments" (in quotes because they are now S4 instances) and
can treat them as such.

We will put together some documentation for the experimental APIs we
are working on, but things are in flux.  Herve has a vignette-like
document that we will post as soon as possible.

A few notes on performance.  The database approach is going to be
slower than having everything in memory for many operations.  When
retrieving annotation for reasonably small gene lists, the difference
is not huge.  However, operations that pull everything from a given
mapping, such as as.list(), will be much slower.
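
To make the two access patterns concrete, here is a minimal sketch
using DBI and RSQLite directly.  It does not use the hgu95av2db
package or its schema: the table name probe2symbol, its columns, and
the probe and gene identifiers are all made up for illustration.

```r
library(DBI)

# Toy stand-in for an annotation mapping; all identifiers are made up.
map <- data.frame(
  probe_id = sprintf("probe_%05d", 1:20000),
  symbol   = sprintf("GENE%05d", 1:20000),
  stringsAsFactors = FALSE
)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "probe2symbol", map)
dbExecute(con, "CREATE INDEX idx_probe ON probe2symbol (probe_id)")

# Focused lookup: only the rows for a small gene list are pulled into R.
wanted <- c("probe_00010", "probe_01234", "probe_19999")
placeholders <- paste(rep("?", length(wanted)), collapse = ", ")
small <- dbGetQuery(
  con,
  paste0("SELECT probe_id, symbol FROM probe2symbol ",
         "WHERE probe_id IN (", placeholders, ")"),
  params = as.list(wanted)
)

# as.list()-style access: every row is read and converted to an R list.
everything <- dbGetQuery(con, "SELECT probe_id, symbol FROM probe2symbol")
full_list <- as.list(setNames(everything$symbol, everything$probe_id))

dbDisconnect(con)
```

The first query touches three rows, while the second has to move the
whole table into R before building the list, which is where the extra
cost of the "pull everything" operations would be expected to come
from.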

So why are the SQLite-based packages a good thing?  Here are some
thoughts:

  1. They will allow us to deal with much larger data collections.
     The environment-based packages require being able to have all of
     the data in memory at once and provide no easy way to unload the
     data once it has been loaded.  The SQLite-based packages can
     easily handle much larger data sizes and pull only the requested
     data into memory at any one time.

  2. More flexible queries.  With the SQLite-based packages, many
     queries that currently require loops over possibly many entire
     environments can be accomplished in one statement.  Using some
     simple SQL statements, I've been able to improve the performance
     of the hyperGTest function by 10x.  Focused queries will
     generally be much faster with the SQLite-based packages.
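
As an illustration of point 2, the sketch below counts, for each GO
term, how many probes from a selected set are annotated to it: once
with an R-level loop over a list-style mapping and once with a single
SQL statement.  This is not the hyperGTest code, and the tables
probe2go and selected, their columns, and the IDs are hypothetical.

```r
library(DBI)

# Hypothetical toy tables; the real package schema may look different.
probe2go <- data.frame(
  probe_id = c("p1", "p1", "p2", "p3", "p3", "p4"),
  go_id    = c("GO:A", "GO:B", "GO:A", "GO:B", "GO:C", "GO:A"),
  stringsAsFactors = FALSE
)
selected <- c("p1", "p2", "p3")

# Environment-style approach: one lookup per GO term, looping in R.
go2probe <- split(probe2go$probe_id, probe2go$go_id)
loop_counts <- sapply(go2probe, function(ids) sum(ids %in% selected))

# SQL approach: a single JOIN + GROUP BY statement does the counting.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "probe2go", probe2go)
dbWriteTable(con, "selected",
             data.frame(probe_id = selected, stringsAsFactors = FALSE))
sql_counts <- dbGetQuery(con, "
  SELECT g.go_id, COUNT(*) AS n_selected
  FROM probe2go AS g
  JOIN selected AS s ON s.probe_id = g.probe_id
  GROUP BY g.go_id
  ORDER BY g.go_id
")
dbDisconnect(con)
```

One difference to keep in mind: the inner join drops GO terms with no
selected probes, whereas the loop reports them with a count of zero.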

+ seth

sessionInfo()
R version 2.5.0 Under development (unstable) (2007-01-22 r40543)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=it_IT.UTF-8;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "methods"   "base"

other attached packages:
   hgu95av2db AnnotationDbi       RSQLite           DBI       Biobase
    "1.13.91"      "0.0.41"      "0.4-19"      "0.1-12"     "1.13.34"
     fortunes
      "1.3-2"
Cheers
 Wolfgang

We are making progress on converting the annotation data packages to
use SQLite as the backend storage mechanism.

The devel annotation package repository has a prototype of a
SQLite-based annotation data package (hgu95av2db).  If you are running
R-devel, then you should be able to install it via biocLite (sorry,
only source package at this point).

The SQLite-based annotation packages depend on the AnnotationDbi
package, which provides an environment-like interface that should be
backwards compatible.  Advanced users can get a connection to the DB
and issue raw SQL queries.  We are also planning to provide more
convenience/accessor functions along the lines of the annotate
package.

Our plan for the upcoming 2.0 release of Bioconductor is to include
both environment-based and SQLite-based annotation packages.

If you maintain a package that makes use of annotation data packages,
it would be good to see if the hgu95av2db prototype will work with
your code (if not, please let us know).

+ seth

_______________________________________________
Bioc-devel at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

-- 
------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
