R-2.14.0: read.csv2 with fileEncoding="UTF-8"

1 message · Christian Montel

Dear R-List, 

I'm trying to read a UTF-8-encoded text file, which works fine under

#####################################################################
### CONFIG 1
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

running under Windows Server 2008.

### RESULT:
VARIABLE        LABEL ORDER_IN_PROFILE
1        A  Umlauts:???               45
2        B Umlauts:????               35
#####################################################################

The exact same command executed under R-2.14.0 (running under Windows
7) gives a different output:

#####################################################################
### CONFIG 2
R version 2.14.0 (2011-10-31)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_2.14.0

### RESULT:
[1] X.
<0 rows> (or 0-length row.names)
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  invalid input found on input connection 'example.utf'
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'example.utf'
## same results with
If I specify "encoding" instead of "fileEncoding", non-ASCII characters are
displayed fine, but apparently the UTF-8 byte order mark (U+FEFF) at the
start of the file is not stripped:

### RESULT:
X.U.FEFF.VARIABLE        LABEL ORDER_IN_PROFILE
1                 A  Umlauts:???               45
2                 B Umlauts:????               35
######################################################################

Any hints on what I could do to get the results from config 1 under
config 2?
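
One direction that might work is sketched here. It assumes that the
"X.U.FEFF." prefix comes from a UTF-8 byte order mark (bytes EF BB BF) at
the very start of the file. The sample file, its values, and the helper name
read_csv2_without_bom are invented for illustration and are not the original
data or command. The idea is to read the raw bytes, drop the mark if present,
and hand the remaining text to read.csv2 through a text connection.

```r
# Hypothetical sample file (name and values are invented for illustration):
# UTF-8 text preceded by the three BOM bytes EF BB BF.
path <- tempfile(fileext = ".utf")
out <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), out)
writeBin(charToRaw(enc2utf8(paste0(
  "VARIABLE;LABEL;ORDER_IN_PROFILE\n",
  "A;Umlauts:\u00e4\u00f6\u00fc;45\n",
  "B;Umlauts:\u00e4\u00f6\u00fc\u00df;35\n"
))), out)
close(out)

# Hypothetical helper: strip a leading UTF-8 BOM, then parse as csv2.
read_csv2_without_bom <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]          # drop the byte order mark
  }
  txt <- rawToChar(bytes)
  Encoding(txt) <- "UTF-8"          # declare the bytes as UTF-8
  lines <- strsplit(txt, "\r?\n")[[1]]
  con <- textConnection(lines)
  on.exit(close(con))
  read.csv2(con, stringsAsFactors = FALSE)
}

dat <- read_csv2_without_bom(path)
print(names(dat))
```

Working on the raw bytes sidesteps the question of which of "encoding" and
"fileEncoding" handles the mark, at the cost of reading the whole file into
memory first.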

Many thanks in advance, 
Christian