Performing Analysis on Subset of External data
1) Use the skip= and nrows= arguments to read.table.
2) Open a connection, read and discard rows, read the block you want, then close the connection. (Which is essentially how 1 works.)
3) Use perl, awk or some such to extract the rows you want -- this is probably rather faster.
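A rough sketch of options 1 and 2 in R, in case it helps. The helper names (read_block, read_block_con), the chunk size of 10000 lines, and the small temporary file are made up for illustration; the sketch assumes whitespace-separated files with no header line.

```r
# Option 1: let read.table skip the leading rows itself.
# 'first' is the 1-based row at which the block starts, 'n' the block length.
read_block <- function(path, first, n, ...) {
  read.table(path, skip = first - 1, nrows = n, header = FALSE, ...)
}

# Option 2: open a connection, discard rows, read the block, close.
read_block_con <- function(path, first, n, ...) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Discard the first (first - 1) lines in chunks so that the
  # skipped text is never all held in memory at once.
  remaining <- first - 1
  while (remaining > 0) {
    got <- length(readLines(con, n = min(remaining, 10000L)))
    if (got == 0) break            # file is shorter than expected
    remaining <- remaining - got
  }
  # An open connection is read from its current position.
  read.table(con, nrows = n, header = FALSE, ...)
}

# Small made-up file to try it on: 100 rows, two columns.
tmp <- tempfile()
write.table(data.frame(id = 1:100, x = (1:100) * 2), tmp,
            row.names = FALSE, col.names = FALSE)

a <- read_block(tmp, first = 41, n = 5)
b <- read_block_con(tmp, first = 41, n = 5)
```

For the 20 files, something like lapply(files, read_block, first = 41, n = 5) would apply the same selection to each, where files is a character vector of file names. Passing colClasses= through the ... argument should also speed up read.table if the column types are known in advance.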
On Wed, 6 Oct 2004, Laura Quinn wrote:
I want to perform some analysis on subsets of huge data files. There are 20 of the files and I want to select the same subsets of each one (each subset is a chunk of 1500 or so consecutive rows from several million). To save time and processing power is there a method to tell R to *only* read in these rows, rather than reading in the entire dataset then selecting subsets and deleting the extraneous data? This method takes a rather silly amount of time and results in memory problems. I am using R 1.9.0 on SuSe 9.0
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595