Storing R objects (was [R] advice requested re: building "good" system (R, SQL db) for handling large datasets)
On Feb 7, 2008 7:16 AM, Richard Pearson
<richard.pearson at postgrad.manchester.ac.uk> wrote:
(moved to R-sig-db from R-help)

Jeff,

I have a project where I want to create large numbers of large, complex objects (e.g. Bioconductor ExpressionSet objects). I want to store these along with metadata (such as what raw data and parameters were used to create the object). I will later want to access subsets of these objects, with the subset specified by a query. It seems to me the natural way to do this would be to store the metadata and the objects themselves in database tables, and I have assumed that the objects would need to be serialised and stored as BLOBs. It sounds like at present there are no plans for infrastructure that would allow me to do this, but I would be interested to know if anyone plans to make such a scenario possible in the future.

I am assuming in the above that it is not possible to store arbitrarily complex R objects in a DB, without a lot of work coercing all the various slots in the object to data.frames, and saving the data.frames to different tables. I've had a quick scan through the documentation for DBI, RODBC, RMySQL and ROracle, but couldn't see any such functionality.

An alternative for my situation would be to store the R objects as files (using save) and store the metadata and filenames in a DB, but this seems to me to add an extra layer of complexity/maintenance. Finally, I could of course save everything as files, but one of the reasons for storing things in a DB is because I would like to create dynamic web pages linked to metadata and results data in the DB.
This type of application comes up often in web design. The general thinking is that storing objects (such as images) on disk is just fine. I would think that you would want to create functions like:

  queryMetadata()            # returns a list of ExpressionSet keys
  fetchExprSets()            # takes a list of ExpressionSet keys and returns a list of ExpressionSets
  storeExprSetAndMetadata()  # takes an ExpressionSet, stores it, and returns the associated unique key
  ....

These would give you the flexibility to change the underlying storage mechanism as you go along, to whatever you like, without changing the business code. Keeping the data model separate from the rest of the code (that which controls the web application itself) is one of the key ideas behind the Model-View-Controller (MVC) pattern of application design.

In practical terms, since R's save() already serializes objects efficiently and in a compressed format, it seems appropriate to use that mechanism as a first pass; it could be modified later if necessary.

Just my $0.02 worth.

Sean
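
For illustration only, the three functions named above could be sketched in base R roughly as follows. This is a minimal sketch under stated assumptions, not the poster's implementation: objects are kept as one file each on disk, the metadata index is a data.frame saved next to them rather than a database table, and plain lists stand in for ExpressionSet objects so that nothing beyond base R is needed. The names store_dir, index_file, readIndex, and the raw_data and params metadata fields are hypothetical. saveRDS()/readRDS() are used in place of save()/load() so that each object is returned directly; they need a reasonably recent R.

```r
# Hypothetical file-backed store: one .rds file per object plus a metadata index.
# Everything except the three function names from the post is an assumption.
store_dir <- file.path(tempdir(), "exprset_store")
dir.create(store_dir, showWarnings = FALSE, recursive = TRUE)
index_file <- file.path(store_dir, "index.rds")

# Read the metadata index, or return an empty one if nothing is stored yet.
readIndex <- function() {
  if (file.exists(index_file)) {
    readRDS(index_file)
  } else {
    data.frame(key = character(0), raw_data = character(0),
               params = character(0), stringsAsFactors = FALSE)
  }
}

# Store an object and its metadata; return the unique key assigned to it.
storeExprSetAndMetadata <- function(eset, raw_data, params) {
  index <- readIndex()
  key <- sprintf("eset%06d", nrow(index) + 1L)
  saveRDS(eset, file.path(store_dir, paste0(key, ".rds")))  # compressed by default
  index <- rbind(index,
                 data.frame(key = key, raw_data = raw_data, params = params,
                            stringsAsFactors = FALSE))
  saveRDS(index, index_file)
  key
}

# Return the keys of all objects created from the given raw data file.
queryMetadata <- function(raw_data) {
  index <- readIndex()
  index$key[index$raw_data == raw_data]
}

# Take a vector of keys and return a named list of the stored objects.
fetchExprSets <- function(keys) {
  objs <- lapply(keys, function(k) readRDS(file.path(store_dir, paste0(k, ".rds"))))
  stats::setNames(objs, keys)
}

# Usage: store two stand-in objects, then fetch one of them by metadata.
storeExprSetAndMetadata(list(exprs = matrix(1:4, 2)), "chipA.CEL", "rma")
storeExprSetAndMetadata(list(exprs = matrix(5:8, 2)), "chipB.CEL", "mas5")
fetchExprSets(queryMetadata("chipA.CEL"))
```

Because callers only ever see keys and objects, the index could later be moved into a database table (or the objects into BLOBs) by rewriting the bodies of these functions alone.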