Message-ID: <509B5B06.4050801@gmail.com>
Date: 2012-11-08T07:11:02Z
From: Lee Hachadoorian
Subject: Help Read File With Odd Characters

I have a large (105MB) data file, tab-delimited with a header. There are 
some odd characters at the beginning of the file that are preventing it 
from being read by R.

 > dfTemp = read.delim(filename)
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>m'

When I view the file with head, I see:

??muni_code parcel_id?

The file is too large to edit in a graphical text editor (gedit). I 
tried just dropping the header row with

sed '1d' <old.txt >new.txt

but then

 > dfTemp = read.delim(filename)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
empty beginning of file

I tried some other shenanigans with sed (with which I am not really 
experienced) but did not get a usable file. Does anyone have any ideas 
for how to (a) directly read this into R, skipping the offending line or 
characters, or (b) preprocess it so that I can read it into R?
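
For option (b), one possible sketch: the bytes <ff><fe> in the first error message are consistent with a UTF-16 little-endian byte order mark, so the file may be UTF-16 rather than an 8-bit encoding. That is an assumption, not something confirmed here. If it holds, converting the whole file with iconv should work better than deleting the first line. The function name utf16_to_utf8 is made up for illustration; old.txt and new.txt are the file names from the sed attempt above.

```shell
# Hypothetical helper: convert a file assumed to be UTF-16 (with BOM) to UTF-8.
# iconv uses the BOM to pick the byte order and does not copy it to the output.
utf16_to_utf8() {
    iconv -f UTF-16 -t UTF-8 "$1" > "$2"
}

# Example call, using the file names from the sed attempt:
#   utf16_to_utf8 old.txt new.txt
```

If the conversion succeeds, read.delim("new.txt") should see an ordinary header line. For option (a), read.delim also accepts a fileEncoding argument (for example fileEncoding = "UTF-16"), which may let R read the original file directly under the same assumption.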

Best,
--Lee

R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Linux Mint 13

-- 
Lee Hachadoorian
Assistant Professor in Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu