Hello all,

We are making progress on converting the annotation data packages to use SQLite as the backend storage mechanism. The devel annotation package repository has a prototype of a SQLite-based annotation data package (hgu95av2db). If you are running R-devel, then you should be able to install it via biocLite (sorry, only source package at this point).

The SQLite-based annotation packages depend on the AnnotationDbi package, which provides an environment-like interface that should be backwards compatible. Advanced users can get a connection to the DB and issue raw SQL queries. We are also planning to provide more convenience/accessor functions along the lines of the annotate package.

Our plan for the upcoming 2.0 release of Bioconductor is to include both environment-based and SQLite-based annotation packages. If you maintain a package that makes use of annotation data packages, it would be good to see if the hgu95av2db prototype will work with your code (if not, please let us know).

+ seth
[Bioc-devel] Update on SQLite-based annotation data package (prototype available)
6 messages · Wolfgang Huber, Vincent Carey, Seth Falcon +1 more
Hi Seth, I installed the package, but I get:
? hgu95av2db
No documentation for 'hgu95av2db' in specified packages and libraries:
you could try 'help.search("hgu95av2db")'
?getDb
No documentation for 'getDb' in specified packages and libraries:
you could try 'help.search("getDb")'
?hgu95av2CHRLOC
No documentation for 'hgu95av2CHRLOC' in specified packages and libraries:
you could try 'help.search("hgu95av2CHRLOC")'
and there is also no vignette
sessionInfo()
R version 2.5.0 Under development (unstable) (2007-01-22 r40543)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=it_IT.UTF-8;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] "tools" "stats" "graphics" "grDevices" "utils" "datasets"
[7] "methods" "base"
other attached packages:
hgu95av2db AnnotationDbi RSQLite DBI Biobase
"1.13.91" "0.0.41" "0.4-19" "0.1-12" "1.13.34"
fortunes
"1.3-2"
Cheers
Wolfgang
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
------------------------------------------------------------------ Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
Wolfgang Huber <huber at ebi.ac.uk> writes:
Hi Seth, I installed the package, but I get:
? hgu95av2db
No documentation for 'hgu95av2db' in specified packages and libraries:
you could try 'help.search("hgu95av2db")'
?getDb
No documentation for 'getDb' in specified packages and libraries:
you could try 'help.search("getDb")'
?hgu95av2CHRLOC
No documentation for 'hgu95av2CHRLOC' in specified packages and libraries:
you could try 'help.search("hgu95av2CHRLOC")'
and there is also no vignette
Yep. It really is a prototype. To get started, try pretending you
have called library(hgu95av2). IOW, you should have all the same
"environments" (in quotes because now they are S4 instances) and can
treat them as such.
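As a rough illustration of what "treat them as such" means, the sketch below runs the usual environment accessors against a plain R environment standing in for a mapping such as hgu95av2CHRLOC. The probe IDs and values are made-up placeholders, and whether every one of these calls behaves identically on the prototype's S4 instances is an assumption based only on the description above.

```r
# A plain environment stands in for a mapping such as hgu95av2CHRLOC.
# Probe IDs and values are made-up placeholders, not real annotation.
chrloc <- new.env(hash = TRUE)
assign("probe_A", c("1" = 1000L), envir = chrloc)
assign("probe_B", c("X" = 2500L), envir = chrloc)
assign("probe_C", NA_integer_, envir = chrloc)

# Single-key lookup
one <- get("probe_A", envir = chrloc)

# Multi-key lookup returns a named list
some <- mget(c("probe_A", "probe_C"), envir = chrloc)

# All keys, and the whole mapping pulled into memory as a list
keys <- sort(ls(chrloc))
everything <- as.list(chrloc)
```

The last call, as.list(), is the kind of whole-mapping operation that has to read every entry at once.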
We will put some documentation together for the experimental APIs we
are working on, but things are in flux. Herve has a vignette-like
document that we will post asap.
A few notes on performance: the database approach
is going to be slower than having everything in memory for many
operations. When retrieving annotation for reasonably small gene
lists, the difference is not huge. However, for operations that pull
everything from a given mapping, such as as.list(), you will see a
huge difference.
So why are the SQLite-based packages a good thing? Here are some
thoughts:
1. They will allow us to deal with much larger data collections.
The environment-based packages require being able to have all of
the data in memory at once and provide no easy way to unload the
data once it has been loaded. The SQLite-based packages can
easily handle much larger data sizes and pull only the requested
data into memory at any one time.
2. More flexible queries. With the SQLite-based packages, many
queries that currently require loops over possibly many entire
environments can be accomplished in one statement. Using some
simple SQL statements, I've been able to improve the performance
of the hyperGTest function by 10x. Focused queries will
generally be much faster with the SQLite-based packages.
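To make point 2 concrete, here is a minimal sketch of a focused query done in one statement. It uses an in-memory SQLite database through DBI and RSQLite; the table name, column names, and rows are hypothetical and are not the schema of the prototype package, and this is not the query used for hyperGTest.

```r
library(DBI)

# Hypothetical one-table schema; the real package schema may differ.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "probe_term", data.frame(
  probe_id = c("probe_A", "probe_A", "probe_B", "probe_C"),
  term_id  = c("term_1", "term_2", "term_1", "term_3"),
  stringsAsFactors = FALSE
))

# One statement counts probes per term for a small gene list,
# instead of looping over every entry of an in-memory mapping.
counts <- dbGetQuery(con, "
  SELECT term_id, COUNT(DISTINCT probe_id) AS n_probes
  FROM probe_term
  WHERE probe_id IN ('probe_A', 'probe_B')
  GROUP BY term_id
  ORDER BY term_id")

dbDisconnect(con)
```

Only the rows matching the gene list are read into R, which is where a focused query saves work compared with loading a whole mapping.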
+ seth
2. More flexible queries. With the SQLite-based packages, many
queries that currently require loops over possibly many entire
environments can be accomplished in one statement. Using some
simple SQL statements, I've been able to improve the performance
of the hyperGTest function by 10x. Focused queries will
generally be much faster with the SQLite-based packages.
Do we need a SQL tutorial doc? (I know there are plenty on the web, but perhaps one focused on the types of queries to be used here.) Helper code that 'translates' R-like actions to SQL may be feasible for some of the more common tasks.
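One possible shape for such a translation helper, as a sketch only: the function below turns an mget()-style lookup into a SQL string. The function name mget_to_sql, the table name, and the column names are all hypothetical, and nothing here reflects an existing AnnotationDbi function.

```r
# Hypothetical helper: turn an mget()-style lookup into a SQL string.
# Table and column names are assumptions, not the package schema.
mget_to_sql <- function(keys, table, key_col = "probe_id", value_col = "value") {
  stopifnot(is.character(keys), length(keys) > 0)
  # Double any single quotes so keys are safe inside SQL string literals
  quoted <- paste0("'", gsub("'", "''", keys, fixed = TRUE), "'")
  sprintf("SELECT %s, %s FROM %s WHERE %s IN (%s)",
          key_col, value_col, table, key_col,
          paste(quoted, collapse = ", "))
}

sql <- mget_to_sql(c("probe_A", "probe_B"), table = "chrloc")
```

The helper escapes only the key values; table and column names are pasted in as given, so they should come from trusted code rather than user input.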
Vincent Carey 525-2265 <stvjc at channing.harvard.edu> writes:
2. More flexible queries. With the SQLite-based packages, many
queries that currently require loops over possibly many entire
environments can be accomplished in one statement. Using some
simple SQL statements, I've been able to improve the performance
of the hyperGTest function by 10x. Focused queries will
generally be much faster with the SQLite-based packages.
Do we need a SQL tutorial doc? (I know there are plenty on the web, but perhaps one focused on the types of queries to be used here.) Helper code that 'translates' R-like actions to SQL may be feasible for some of the more common tasks.
I'm hoping that an alternative API will solidify Real Soon Now. I would much rather promote a well-documented API than raw SQL. Using raw SQL is effective, but it relies on the schema definition, so user code breaks when the schema changes. But perhaps my comments are orthogonal to your suggestion. A SQL tutorial with "translations" of R concepts is a great idea.
+ seth
I would only promote what Seth would prefer promoting. An API could be the better solution: it would provide a unified front-end to annotation packages while letting the annotation be stored in a number of different backends (loaded environments, as has been the case so far; the coming SQLite ones; a remote SQL database; a web service; etc.). An API would also make it possible to change the SQL schema without causing a lot of trouble for users.
Laurent
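A minimal sketch of the unified front-end idea, assuming an S4 generic with one method per backend. The generic name lookup and the classes EnvBackend and TableBackend are hypothetical, and a data frame stands in for a database table so that the example runs without any database.

```r
library(methods)

# Hypothetical front-end: one generic, one method per storage backend.
setGeneric("lookup", function(backend, keys) standardGeneric("lookup"))

# Backend 1: an in-memory environment
setClass("EnvBackend", representation(env = "environment"))
setMethod("lookup", "EnvBackend", function(backend, keys) {
  mget(keys, envir = backend@env, ifnotfound = list(NA_character_))
})

# Backend 2: a two-column data frame standing in for a database table
setClass("TableBackend", representation(tbl = "data.frame"))
setMethod("lookup", "TableBackend", function(backend, keys) {
  hits <- backend@tbl$value[match(keys, backend@tbl$key)]
  setNames(as.list(hits), keys)
})

# Placeholder data, identical in both backends
env <- new.env()
assign("probe_A", "chr1", envir = env)
tbl <- data.frame(key = "probe_A", value = "chr1", stringsAsFactors = FALSE)

from_env   <- lookup(new("EnvBackend", env = env), c("probe_A", "probe_Z"))
from_table <- lookup(new("TableBackend", tbl = tbl), c("probe_A", "probe_Z"))
```

Calling code only ever sees lookup(), so the storage behind it (or the SQL schema, for a database backend) can change without touching callers.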