I have a humongous csv file containing census data, far too big to read into
RAM. I have been trying to extract individual columns from this file using
the colbycol package. This works for certain subsets of the columns, but not
for others. I have not yet been able to precisely identify the problem
columns, as there are 731 columns and running colbycol on the file on my old
slow machine takes about 6 hours.
However, my suspicion is that there are some funky characters, either
control characters or characters with some non-standard encoding, somewhere
in this 14 gig file. Moreover, I am concerned that these characters may
cause me trouble down the road even if I use a different approach to getting
columns out of the file.
Is there an R utility that will search through my file, without reading it
all into memory at once, and find non-standard characters or misplaced
(non-end-of-line) control characters? Or some R code to the same end? Even
if the real problem ultimately proves to be different, it would be helpful
to eliminate this possibility. And this is also something I would routinely
run on files from external sources if I had it.
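In case it helps to illustrate what I have in mind, here is a rough sketch of
one way to do this in base R: read the file through a connection in chunks of
lines, and flag any line containing a byte outside printable ASCII (plus tab).
The function name, the chunk size, and the "allowed characters" class are all
my own choices, not from any package, so adjust them as needed.

```r
# Hypothetical sketch: scan a huge text file chunk by chunk for lines
# containing non-printable or non-ASCII bytes, without loading the
# whole file into memory. Printable ASCII is \x20-\x7e; tab is also
# allowed here. Everything else (control chars, high bytes) is flagged.
find_bad_chars <- function(path, chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  bad <- data.frame(line = integer(0), text = character(0),
                    stringsAsFactors = FALSE)
  line_no <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(lines) == 0L) break
    # useBytes = TRUE makes grepl compare raw bytes, avoiding
    # encoding-related errors on the very characters we are hunting
    hits <- grepl("[^\x20-\x7e\t]", lines, useBytes = TRUE)
    if (any(hits)) {
      bad <- rbind(bad, data.frame(line = line_no + which(hits),
                                   text = lines[hits],
                                   stringsAsFactors = FALSE))
    }
    line_no <- line_no + length(lines)
  }
  bad
}
```

Running `find_bad_chars("census.csv")` would then return the line numbers and
contents of the suspect lines, so you could inspect which columns they fall
in. One caveat: readLines itself may stumble on embedded NUL bytes, so a
byte-level scan with readBin would be more robust if NULs turn out to be the
culprit.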
I am working in a Windows XP environment, in case that makes a difference.
Any help anyone could offer would be greatly appreciated.
Sincerely, andrewH