[Bioc-devel] Update on SQLite-based annotation data package (prototype available)

Wolfgang Huber <huber at ebi.ac.uk> writes:
Yep.  It really is a prototype.  To get started, try pretending you
have called library(hgu95av2).  IOW, you should have all the same
"environments" (in quotes because now they are S4 instances) and can
treat them as such.

We will put together some documentation for the experimental APIs we
are working on, but things are in flux.  Herve has a vignette-like
document that we will post asap.

A few notes on performance.  The database approach is going to be
slower than having everything in memory for many operations.  When
retrieving annotation for reasonably small gene lists, the difference
is not huge.  However, for operations that pull everything from a
given mapping, such as as.list(), the in-memory environments will be
dramatically faster.
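To make the "treat them as environments" idea and the performance
trade-off concrete, here is a minimal Python sketch (not the package's
R API) of a read-only, environment-like object backed by a SQLite
table.  The class name SQLiteMap, its methods, and the two-column
(probe_id, value) table are hypothetical.  Lookups for one key or a
short list of keys touch only the matching rows, while as_list(),
like as.list(), has to pull every row into memory.

```python
import sqlite3
from collections.abc import Mapping


class SQLiteMap(Mapping):
    """Read-only, environment-like view of a two-column SQLite table.

    Hypothetical sketch: the (probe_id, value) layout is an assumption,
    not the schema used by the prototype package.
    """

    def __init__(self, conn, table):
        self._conn = conn
        self._table = table  # trusted identifier, never user input

    def __getitem__(self, key):
        # Focused lookup: only the rows for this key leave the database.
        rows = self._conn.execute(
            f"SELECT value FROM {self._table} WHERE probe_id = ?", (key,)
        ).fetchall()
        if not rows:
            raise KeyError(key)
        return [r[0] for r in rows]

    def mget(self, keys):
        # One query for a small gene list instead of one query per key.
        # Keys with no rows map to an empty list.
        keys = list(keys)
        if not keys:
            return {}
        placeholders = ",".join("?" * len(keys))
        out = {k: [] for k in keys}
        for probe, value in self._conn.execute(
            f"SELECT probe_id, value FROM {self._table} "
            f"WHERE probe_id IN ({placeholders})",
            keys,
        ):
            out[probe].append(value)
        return out

    def __iter__(self):
        # Iterates over keys lazily, one row at a time.
        for (probe,) in self._conn.execute(
            f"SELECT DISTINCT probe_id FROM {self._table}"
        ):
            yield probe

    def __len__(self):
        return self._conn.execute(
            f"SELECT COUNT(DISTINCT probe_id) FROM {self._table}"
        ).fetchone()[0]

    def as_list(self):
        # Analogue of as.list(): materializes the whole mapping in memory,
        # which is exactly where in-memory environments win.
        out = {}
        for probe, value in self._conn.execute(
            f"SELECT probe_id, value FROM {self._table}"
        ):
            out.setdefault(probe, []).append(value)
        return out
```

The design choice is simply to push key filtering into the WHERE
clause, so the cost of a lookup scales with the size of the gene list
rather than the size of the mapping.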

So why are the SQLite-based packages a good thing?  Here are some
thoughts:

  1. They will allow us to deal with much larger data collections.
     The environment-based packages require all of the data to fit
     in memory at once and provide no easy way to unload the data
     once it has been loaded.  The SQLite-based packages can easily
     handle much larger data sizes because they pull only the
     requested data into memory at any one time.

  2. More flexible queries.  With the SQLite-based packages, many
     queries that currently require loops over possibly many entire
     environments can be accomplished in one statement.  Using some
     simple SQL statements, I've been able to speed up the hyperGTest
     function by 10x.  Focused queries will generally be much faster
     with the SQLite-based packages.
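The kind of rewrite described in point 2 can be sketched as follows.
This is a hypothetical Python/sqlite3 illustration, not the actual
hyperGTest code: the gene2cat table, its toy rows, and both function
names are made up.  counts_by_loop mimics the environment approach
(load every category's gene list, then loop and intersect), while
counts_by_sql gets the same per-category counts (selected genes in the
category, all genes in the category) from a single GROUP BY statement.

```python
import sqlite3


def build_toy_db():
    # Toy gene-to-category table; schema and rows are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene2cat (gene_id TEXT, cat_id TEXT)")
    conn.executemany(
        "INSERT INTO gene2cat VALUES (?, ?)",
        [("g1", "C1"), ("g2", "C1"), ("g3", "C1"),
         ("g2", "C2"), ("g4", "C2"), ("g5", "C3")],
    )
    return conn


def counts_by_loop(conn, selected):
    # Environment-style: pull every category's gene list into memory,
    # then loop over categories and intersect with the selection.
    cat2genes = {}
    for gene, cat in conn.execute("SELECT gene_id, cat_id FROM gene2cat"):
        cat2genes.setdefault(cat, set()).add(gene)
    sel = set(selected)
    return {cat: (len(genes & sel), len(genes))
            for cat, genes in cat2genes.items()}


def counts_by_sql(conn, selected):
    # Load the selection into a temp table (reset on every call).
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS sel (gene_id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM sel")
    conn.executemany("INSERT OR IGNORE INTO sel VALUES (?)",
                     [(g,) for g in selected])
    # One statement: the database does the join and aggregation, and
    # only one row per category comes back.
    rows = conn.execute(
        """
        SELECT g.cat_id,
               COUNT(DISTINCT s.gene_id) AS n_selected,
               COUNT(DISTINCT g.gene_id) AS n_in_category
        FROM gene2cat AS g
        LEFT JOIN sel AS s ON s.gene_id = g.gene_id
        GROUP BY g.cat_id
        """
    ).fetchall()
    return {cat: (n_sel, n_cat) for cat, n_sel, n_cat in rows}
```

With the toy table and the selection ["g2", "g4"], both functions
return {"C1": (1, 3), "C2": (2, 2), "C3": (0, 1)}.  Per-category counts
like these, together with the sizes of the gene universe and the
selected list, are the inputs a hypergeometric test needs.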

+ seth