read in large data file (tsv) with inline filter?

Dylan Beaudette · 2009-03-23T22:09:36Z

On Monday 23 March 2009, David Reiss wrote:
> I have a very large tab-delimited file, too big to store in memory via
> readLines() or read.delim(). Turns out I only need a few hundred of those
> lines to be read in. If it were not so large, I could read the entire file
> in and "grep" the lines I need. For such a large file; many calls to
> read.delim() with incrementing "skip" and "nrows" parameters, followed by
> grep() calls is very slow. I am aware of possibilities via SQLite; I would
> pref


How about pre-filtering before loading the data into R:

grep -E 'your pattern here' your_file_here > your_filtered_file
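One thing to watch with a plain grep is that it also drops the header row unless the header happens to match the pattern, so read.delim() would lose its column names. Here is a minimal sketch that keeps the first line and filters only the rest. The file names, the sample data, and the pattern are all made up for illustration:

```shell
# Hypothetical sample data standing in for the large file.
printf 'id\tgene\tscore\n1\tabc1\t0.5\n2\txyz9\t0.9\n3\tabc2\t0.1\n' > big.tsv

# Write the header line first, then only the data lines that match.
{
  head -n 1 big.tsv
  tail -n +2 big.tsv | grep -E 'abc[0-9]'
} > filtered.tsv
```

The result can then be read with read.delim("filtered.tsv") as usual. If you would rather skip the intermediate file, R can also read straight from the command through a pipe() connection, along the lines of read.delim(pipe("grep -E 'abc[0-9]' big.tsv"), header = FALSE).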

Alternatively, if you need to search within specific fields, see 'awk' and 
'cut'; if you need to delete characters, see 'tr'.
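To make the field-based option a bit more concrete, here is a rough sketch using awk, cut, and tr on a tiny made-up file. The column positions, the 0.4 threshold, and the file names are assumptions for the example, not anything from the original data:

```shell
# Hypothetical sample data standing in for the large file.
printf 'id\tgene\tscore\n1\tabc1\t0.5\n2\txyz9\t0.9\n3\tabc2\t0.1\n' > big.tsv

# awk: keep the header (NR == 1) plus rows whose 3rd tab-separated
# field is greater than 0.4.
awk -F'\t' 'NR == 1 || $3 > 0.4' big.tsv > by_field.tsv

# cut: keep only columns 1 and 3 (tab is cut's default delimiter).
cut -f 1,3 by_field.tsv > two_cols.tsv

# tr: delete a character everywhere, here any carriage returns
# left over from Windows line endings.
tr -d '\r' < two_cols.tsv > clean.tsv
```

Each step streams the file line by line, so none of them needs to hold the whole file in memory.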

These tools come with any Unix-like OS, and you can probably get them on 
Windows without much effort.


Cheers,
Dylan

Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
