Very Large Data Sets
On Thu, 23 Dec 1999 kmself at ix.netcom.com wrote:
When dealing with large datasets outside of SAS, my suggestion would be to look to tools such as Perl and MySQL to handle the procedural and relational processing of the data, using R as the analytic tool. Most simple operations (subsetting, aggregation, drilldown) can be handled by these tools. Think of their relationship to R as analogous to the division between the DATA step and SAS/STAT or SAS/GRAPH. I would be interested to know of any data cube tools which are freely available or available as free software.
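As a rough sketch of that division of labour: let the database do the aggregation and pull only the small summary into R. The table name, column names, and database here are invented for illustration, and this assumes the mysql command-line client is on the path and an R with pipe() support.

```r
# Hypothetical example: aggregate a large table inside MySQL and read
# only the resulting summary (a few rows) into R.
query <- "SELECT region, COUNT(*) AS n, AVG(sales) AS mean_sales FROM orders GROUP BY region"
cmd <- sprintf("mysql --batch -e '%s' bigdb", query)
# 'mysql --batch' emits tab-separated output with a header row,
# which read.table() can parse directly from the pipe.
summ <- read.table(pipe(cmd), header = TRUE, sep = "\t")
```

The point is that the multi-gigabyte table never enters R; only the grouped summary does.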
The S-PLUS package for the netCDF format, written by Steve Oncley of NCAR, allows reading of arbitrary "slabs" of a very large data file. At one point he was planning to write an R version, but I can't remember what happened, and my email records for the relevant period were eaten by a Microsoft Outlook/Pine disagreement. This would allow you to work with large data files one piece at a time (if they were netCDF files). Something similar could be done with mmap(2) if your OS allows addressing that much memory (which most soon will).

Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject!) to: r-help-request at stat.math.ethz.ch
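A minimal sketch of the piece-at-a-time idea in base R, using seek() and readBin() on a flat binary file rather than netCDF or mmap(2). The file name, element size, and layout are assumptions: a plain vector of 8-byte doubles in native byte order.

```r
# Read one "slab" (a contiguous run of doubles) from a large flat
# binary file without ever loading the whole file into memory.
read_slab <- function(filename, first, count) {
  con <- file(filename, "rb")
  on.exit(close(con))
  seek(con, where = (first - 1) * 8)           # skip the preceding doubles
  readBin(con, what = "double", n = count, size = 8)
}

# Hypothetical usage: pull 10,000 values starting at the millionth
# element of a multi-gigabyte file.
# x <- read_slab("big.bin", first = 1e6 + 1, count = 1e4)
```

Real netCDF slab access additionally handles multidimensional strides and metadata; this only illustrates the windowed-read principle.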