I have been playing with a large database interface for R, and have written one complete but useless demonstration and one incomplete but potentially useful example (with memory mapping of a fixed-format ASCII file). The idea is to make the file appear like a matrix or data frame but not have to read it into the R heap.

A description and code can be found at
http://www.biostat.washington.edu/~thomas/Rdb.html
  Rdb.nw (noweb literate program)
  Rdb.c
  Rdb.R

Comments?

Thomas Lumley
Biostatistics, Uni of Washington
Box 357232, Seattle WA 98195-7232
"Never attribute to malice what can be adequately explained by incompetence" - Hanlon's Razor

r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
RFC: large database interface
3 messages · Thomas Lumley, Egon Schmid, Ross Ihaka
Thomas Lumley wrote:
I have been playing with a large database interface for R, and have written one complete but useless demonstration and one incomplete but potentially useful example (with memory mapping of a fixed-format ASCII file). The idea is to make the file appear like a matrix or data frame but not have to read it into the R heap.

A description and code can be found at
http://www.biostat.washington.edu/~thomas/Rdb.html
  Rdb.nw (noweb literate program)
  Rdb.c
  Rdb.R

Comments?
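As a rough illustration of the idea described above (this is not the Rdb code, which is at the URL given), here is a minimal Python sketch of a memory-mapped, fixed-format ASCII file that can be indexed like a matrix without loading it whole. The names `FixedWidthMatrix` and `write_demo_file` are hypothetical, and the layout (equal-width fields, one newline-terminated record per row) is an assumption made for the example.

```python
import mmap
import os
import tempfile


class FixedWidthMatrix:
    """Read-only matrix view over a memory-mapped fixed-format ASCII file.

    Assumes every field has the same width and every record ends with a
    single newline byte. Both are illustrative assumptions.
    """

    def __init__(self, path, ncol, field_width):
        self.ncol = ncol
        self.field_width = field_width
        self.record_len = ncol * field_width + 1  # +1 for the newline
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.nrow = len(self._map) // self.record_len

    def __getitem__(self, index):
        i, j = index
        if not (0 <= i < self.nrow and 0 <= j < self.ncol):
            raise IndexError("matrix index out of range")
        start = i * self.record_len + j * self.field_width
        # Only the bytes of the requested field are read and parsed.
        field = self._map[start:start + self.field_width]
        return float(field.decode("ascii"))

    def close(self):
        self._map.close()
        self._file.close()


def write_demo_file(path, rows, field_width=8):
    """Write rows of numbers as fixed-width ASCII, one record per line."""
    with open(path, "w", newline="\n") as f:
        for row in rows:
            f.write("".join(f"{x:{field_width}.2f}" for x in row) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        write_demo_file(path, [[1.0, 2.5, 3.0], [4.0, 5.0, 6.25]])
        m = FixedWidthMatrix(path, ncol=3, field_width=8)
        try:
            print(m.nrow, m[1, 2])  # prints "2 6.25"
        finally:
            m.close()
```

Because the record length is fixed, the byte offset of element (i, j) is plain arithmetic, so the operating system only has to page in the parts of the file that are actually indexed.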
Well, there is a web interface through the Apache module PHP Hypertext Preprocessor. At http://www.php.net/ there are plenty more database interfaces.

Personally I think it would be a great idea to interface large datasets with netCDF
http://www.unidata.ucar.edu/packages/netcdf
From the manual '1.2 NetCDF Is Not a Database Management System'

"Why not use an existing database management system for storing array-oriented data? Relational database software is not suitable for the kinds of data access supported by the netCDF interface.

First, existing database systems that support the relational model do not support multidimensional objects (arrays) as a basic unit of data access. Representing arrays as relations makes some useful kinds of data access awkward and provides little support for the abstractions of multidimensional data and coordinate systems. A quite different data model is needed for array-oriented data to facilitate its retrieval, modification, mathematical manipulation and visualization.

Related to this is a second problem with general-purpose database systems: their poor performance on large arrays. Collections of satellite images, scientific model outputs and long-term global weather observations are beyond the capabilities of most database systems to organize and index for efficient retrieval.

Finally, general-purpose database systems provide, at significant cost in terms of both resources and access performance, many facilities that are not needed in the analysis, management, and display of array-oriented data. For example, elaborate update facilities, audit trails, report formatting, and mechanisms designed for transaction-processing are unnecessary for most scientific applications."

On Feb 3 there was a small thread on this mailing list.

-Egon
Egon Schmid writes:
> Thomas Lumley wrote:
> >
> > I have been playing with a large database interface for R, and have
> > written one complete but useless demonstration and one incomplete but
> > potentially useful example (with memory mapping of a fixed-format ASCII
> > file). The idea is to make the file appear like a matrix or data frame but
> > not have to read it into the R heap.
> >
> > A description and code can be found at
> > http://www.biostat.washington.edu/~thomas/Rdb.html
> > Rdb.nw (noweb literate program)
> > Rdb.c
> > Rdb.R
> >
> > Comments?

This looks very interesting. It would be nice if such an interface were written in a way that could be customized to a variety of applications. Being able to read spreadsheets is one thing which comes to mind. It might be nice (for example) to have a rather complex initialization procedure which inspects the dataset thoroughly and determines things like variable types (if the database does not contain this information).

Egon Schmid writes:
> Well, there is a web interface through the Apache module PHP Hypertext
> Preprocessor. At http://www.php.net/ there are plenty more database
> interfaces.
>
> Personally I think it would be a great idea to interface large datasets with
> netCDF
>
> "Why not use an existing database management system for storing
> array-oriented data? Relational database software is not suitable for
> the kinds of data access supported by the netCDF interface.
>

Hmm. Over the past week I have been looking at NetCDF because GMT (The Generic Mapping Tools)
http://www.soest.hawaii.edu/wessel/gmt.html
use NetCDF to store their maps.

[ The maps are rather better than the Becker and Wilks ones because they are based on the World Vector Shoreline as well as the CIA WDB that B&W use. They also have the maps prepared pretty well for plotting. The only place where B&W are better is in the naming of places ... ]

I wasn't thinking about pulling the data from these maps into R, but rather just rendering them on a graphics device so that they could then be added to. I suspect that when I've done that I'll probably know enough to create an R/NetCDF link, perhaps using a framework of the type Thomas proposes.

Ross
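The initialization step suggested in this message, one that inspects the dataset and determines variable types when the file does not record them, might be sketched roughly as follows. This is only an illustration in Python: the function names `infer_type` and `infer_column_types` and the three type labels are assumptions made for the example, not part of Rdb or of any proposed interface.

```python
def infer_type(values):
    """Classify a column of strings as 'integer', 'numeric' or 'character'.

    The narrowest type that parses every value wins. An empty column is
    reported as 'integer' because nothing contradicts it.
    """
    def all_parse(parser):
        for v in values:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "numeric"
    return "character"


def infer_column_types(rows):
    """Scan every row once and return one inferred type per column."""
    return [infer_type(column) for column in zip(*rows)]


if __name__ == "__main__":
    rows = [["1", "2.5", "a"], ["2", "3", "b"]]
    print(infer_column_types(rows))  # prints "['integer', 'numeric', 'character']"
```

A full pass over the data is the price of not trusting the first few records; a real interface would presumably also need a convention for missing-value codes, which this sketch would classify as character data.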