Enormous Datasets
It is very unlikely that R will be able to handle this directly. The problems are:

* the dataset may simply not fit into memory;
* it will take forever to read from the ASCII file;
* any meaningful analysis of a dataset in R typically requires 5-10 times more memory than the size of the dataset (unless you are a real insider and know all the knobs).

Your best strategy is probably to partition the file into meaningful sub-categories and work with those. To save time on conversion from ASCII, you can read each sub-file into a data frame and then save the data frame to an .rda file using save(). Subsequent loading of the .rda files is much faster than re-reading the ASCII.

Another strategy, often advocated on the list, is to put the data into a database and draw random samples of manageable size from it. I have no experience with this approach.

HTH,
Vadim
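[A minimal sketch of the ASCII-to-.rda workflow described above; the file and object names are hypothetical and would need to be adapted to the actual data:]

# Read one manageable sub-file from ASCII once (hypothetical file name)
pums_sub <- read.table("pums_subset_01.dat", header = TRUE)

# Save the data frame in binary form; this only has to be done once
save(pums_sub, file = "pums_subset_01.rda")

# In later sessions, load() the .rda file, which is much faster
# than re-reading the ASCII file
load("pums_subset_01.rda")

[One possible realization of the database-sampling strategy, here assuming SQLite via the RSQLite package; the database file, table name, and sample size are assumptions for illustration:]

library(RSQLite)
con <- dbConnect(SQLite(), dbname = "pums.sqlite")  # hypothetical database file

# Draw a random sample of manageable size instead of loading all rows
samp <- dbGetQuery(con, "SELECT * FROM pums ORDER BY RANDOM() LIMIT 100000")

dbDisconnect(con)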
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Thomas W Volscho
Sent: Thursday, November 18, 2004 12:11 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Enormous Datasets

Dear List,

I have some projects where I use enormous datasets. For instance, the 5% PUMS microdata from the Census Bureau. After deleting cases I may have a dataset with 7 million+ rows and 50+ columns. Will R handle a data file of this size? If so, how?

Thank you in advance,
Tom Volscho

************************************
Thomas W. Volscho
Graduate Student
Dept. of Sociology U-2068
University of Connecticut
Storrs, CT 06269
Phone: (860) 486-3882
http://vm.uconn.edu/~twv00001