Scanning grep through huge files

On 11/3/2009 9:29 AM, Johannes Graumann wrote:
I think you are going to have to write this yourself.  R doesn't have 
very many stream-oriented functions:  almost everything is aimed at 
having the whole file in memory.

You will also have trouble with the byte offsets.  The semantics of the 
-u option to grep are quite strange (at least according to the man page 
on Cygwin).

What I'd do given your problem is use readLines to read the file, then 
post-process the result of gregexpr to give line and byte offset pairs 
for each match; those are more useful in R than the rather bizarre "byte 
offsets" that grep -buo will give.  But for a huge file you'll probably 
have to do this in blocks, as the whole file may be too big.
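
A minimal sketch of the block-wise approach described above, in case it 
helps.  The function name scan_matches, its arguments and the default 
block size are made up for illustration; only file, readLines and 
gregexpr are real R functions.  It assumes matches never span a line 
break, and it reports 1-based byte positions within each line rather 
than grep-style offsets from the start of the file.

```r
# Hypothetical helper (not part of R): scan a file block by block and
# return one row per match with its line number and byte position.
scan_matches <- function(path, pattern, block_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  results <- list()
  lines_read <- 0L   # number of lines consumed in earlier blocks

  repeat {
    # Reading from an open connection continues where the last call stopped.
    block <- readLines(con, n = block_size, warn = FALSE)
    if (length(block) == 0L) break

    # useBytes = TRUE makes positions and lengths count bytes, not characters.
    hits <- gregexpr(pattern, block, useBytes = TRUE)

    for (i in seq_along(hits)) {
      starts <- as.integer(hits[[i]])
      if (starts[1L] == -1L) next   # -1 means no match on this line
      results[[length(results) + 1L]] <- data.frame(
        line   = lines_read + i,                     # line number in the file
        byte   = starts,                             # 1-based byte within the line
        length = attr(hits[[i]], "match.length")     # match length in bytes
      )
    }
    lines_read <- lines_read + length(block)
  }

  if (length(results) == 0L) {
    return(data.frame(line = integer(0), byte = integer(0), length = integer(0)))
  }
  do.call(rbind, results)
}
```

Something like scan_matches("big.txt", "foo") would then return a data 
frame of (line, byte, length) rows; the file name here is only a 
placeholder.  The block size trades memory for the number of readLines 
calls, so it would need tuning for a really huge file.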

Duncan Murdoch