R for large data
On Wed, 11 Jul 2001, Micheall Taylor wrote:
I am trying to gain an understanding of R's capabilities in larger data set analysis. I really like R, but the datasets that I normally work with are in the 15m-50m range, sometimes much larger. The size owes to observations, not extraneous variables, so little can be done to "clean" the data of unnecessary elements (i.e. database storage or external data manipulation doesn't get me very far).

Over the past couple of years I've used Stata (prior to that SAS, etc). I have 2 gigs of memory, but R seems pretty slow to load relatively modest datasets of say 10-30 megs. Much slower to load than say Stata. For comparison, stats on loading a 32 meg datafile:

R     - 5.3 minutes
Stata - 31 secs
SPSS  - 42 secs
SAS   - 21 secs
Loading from what? I find R images of that size load in a few seconds, hardly noticeable compared to starting R (or doing anything with them). Or do you mean reading in from a text file (by read.table, say)?
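The distinction between loading a saved image and parsing a text file can be checked directly. The following is only a minimal sketch for a current R: the data, file names and column layout are invented for illustration and are not taken from the original post, and the timings will vary by machine.

```r
# Hypothetical data: 100,000 rows x 5 numeric columns (sizes are made up).
set.seed(1)
n <- 100000
dat <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))

txt_file <- tempfile(fileext = ".txt")    # plain text version
img_file <- tempfile(fileext = ".RData")  # saved R image version

write.table(dat, txt_file, row.names = FALSE, quote = FALSE)
save(dat, file = img_file)

# 1. Reading text with no hints: read.table must work out each column's type.
t_plain <- system.time(d1 <- read.table(txt_file, header = TRUE))

# 2. Reading text with hints: declaring the column types and the row count
#    up front spares read.table the type detection and some reallocation.
t_hinted <- system.time(
  d2 <- read.table(txt_file, header = TRUE,
                   colClasses = rep("numeric", 5),
                   nrows = n, comment.char = "")
)

# 3. Loading a saved image: no text parsing at all.
env <- new.env()
t_image <- system.time(load(img_file, envir = env))

# Elapsed seconds for each route.
print(c(plain  = t_plain[["elapsed"]],
        hinted = t_hinted[["elapsed"]],
        image  = t_image[["elapsed"]]))
```

If the text route turns out to be the slow one, a common pattern is to read the file once, `save()` the result, and `load()` the image in later sessions.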
I normally start R with the command line switch allowing it to use 600 megs or so - Stata is allocated 200 megs. I've allocated 1.5 gigs to Stata
Um. Current versions need no switches for modest data sets like 10-30Mb.
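To see how much memory a data set actually occupies in a session, something along these lines may help. It is only a sketch: the vector `x` is a made-up stand-in for real data, and `object.size()` and `gc()` report R's own accounting rather than the process size seen by the operating system.

```r
# Hypothetical stand-in for a data set: 4 million doubles, 8 bytes each.
x <- numeric(4e6)

# Size of the object as R accounts for it (about 32 million bytes of payload
# plus a small header).
sz <- object.size(x)
print(sz, units = "Mb")

# gc() runs a garbage collection and returns a matrix summarising the memory
# currently used by cons cells (Ncells) and vector storage (Vcells).
mem <- gc()
print(mem)
```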
before so I assume my memory management isn't an issue. Does anyone have any pointers to documents which discuss R limitations? Could there be something wrong with my particular R installation (RH 7.1 and most recent stable R release, 2 gigs memory, enterprise kernel, dual processor 800 MHz, high performance SCSI drives)?
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK, Fax: +44 1865 272595