I have a large (105MB) data file, tab-delimited with a header. There are
some odd characters at the beginning of the file that are preventing it
from being read by R.
> dfTemp = read.delim(filename)
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>m'
When I view the file with head, I see:
??muni_code parcel_id?
The file is too large to edit in a graphical text editor (gedit). I
tried just dropping the header row with
sed '1 d' <old.txt >new.txt
but then
> dfTemp = read.delim(filename)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
empty beginning of file
I tried some other shenanigans with sed (with which I am not really
experienced) but did not get a usable file. Does anyone have any ideas
for how to (a) directly read this into R, skipping the offending line or
characters, or (b) preprocess it so that I can read it into R?
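
For option (b), a minimal sketch, assuming the file is UTF-16 text: the bytes ff fe in the error message look like a UTF-16 little-endian byte-order mark, which would also explain the ?? that head shows before muni_code. If that assumption holds, iconv can re-encode the file as UTF-8 before R reads it. The sample file names and the two data values below are hypothetical stand-ins for the real data; for the real file the command would be iconv -f UTF-16 -t UTF-8 old.txt > new.txt.

```shell
# Build a tiny UTF-16 stand-in for the real file (hypothetical data values).
# iconv writes a byte-order mark at the start when the target is plain "UTF-16".
printf 'muni_code\tparcel_id\n1\t2\n' | iconv -f UTF-8 -t UTF-16 > sample_utf16.txt

# Re-encode to UTF-8; "-f UTF-16" uses the byte-order mark to pick the byte
# order and does not copy it into the output.
iconv -f UTF-16 -t UTF-8 sample_utf16.txt > sample_utf8.txt

# The header should now be plain text with no leading odd characters.
head -n 1 sample_utf8.txt
```

The converted file should then load with read.delim("new.txt"). For option (a), read.delim also takes a fileEncoding argument, so read.delim(filename, fileEncoding = "UTF-16LE") may work on the original file without any preprocessing. The same encoding would also account for the sed attempt failing: in UTF-16LE a newline is two bytes, so cutting at the first newline byte likely leaves a stray zero byte at the start of new.txt.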