Message-ID: <0169C15A-4C36-400B-B3A3-E3EBC156D9BF@ucsd.edu>
Date: 2011-06-27T22:22:09Z
From: Roger Levy
Subject: new read.table() error message in R 2.13.0 reading UTF-8 input
Hi,
Consistent with recent reports (http://tolstoy.newcastle.edu.au/R/e14/devel/11/06/0434.html), I have found what I believe is a bug new to R 2.13.0: reading a UTF-8 encoded file with fileEncoding="UTF-8" drops everything from the first non-ASCII character onward. Minimal example:
=========== filename: test.txt =================
1 2 3 Я 4
================================================
[note that the fourth column is a Cyrillic character; file is saved in UTF-8.]
> read.table("/tmp/test")
  V1 V2 V3       V4 V5
1  1  2  3 \320\257  4
> read.table("/tmp/test",fileEncoding="UTF-8")
  V1 V2 V3
1  1  2  3
Warning messages:
1: In read.table("/tmp/test", fileEncoding = "UTF-8") :
  invalid input found on input connection '/tmp/test'
2: In read.table("/tmp/test", fileEncoding = "UTF-8") :
  incomplete final line found by readTableHeader on '/tmp/test'
I'm using OS X 10.6.7 with the pre-packaged R binaries from CRAN.
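For anyone who wants to try this on another platform, here is a rough, self-contained sketch of the reproduction. It writes the test file from explicit bytes, so the editor's encoding cannot be a factor. It assumes the Cyrillic character is U+042F, which is what the \320\257 bytes in the first output decode to. The temporary path and the variable names are illustrative only. What the two read.table() calls print will depend on the R version and locale.

```r
# Build the one-line test file from explicit bytes; 0xD0 0xAF is octal \320\257.
path  <- tempfile(fileext = ".txt")   # illustrative path, not the original /tmp/test
bytes <- c(charToRaw("1 2 3 "), as.raw(c(0xD0, 0xAF)), charToRaw(" 4\n"))
writeBin(bytes, path)

# Confirm that the bytes on disk are exactly what was written.
on_disk <- readBin(path, what = "raw", n = file.info(path)$size)
stopifnot(identical(on_disk, bytes))

# Decode the two-byte sequence to check that it is a single valid UTF-8 character.
ch <- rawToChar(as.raw(c(0xD0, 0xAF)))
Encoding(ch) <- "UTF-8"
code_point <- utf8ToInt(ch)
print(code_point)                     # 1071, i.e. U+042F

# The two calls from the report; compare the number of columns returned.
print(read.table(path))
print(read.table(path, fileEncoding = "UTF-8"))
```

If the second call again returns only three columns with the "invalid input" warning, the input file is ruled out as the cause, since its bytes are verified above.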
Best
Roger
--
Roger Levy Email: rlevy at ucsd.edu
Assistant Professor Phone: 858-534-7219
Department of Linguistics Fax: 858-534-4789
UC San Diego Web: http://idiom.ucsd.edu/~rlevy