
Problem with UTF-8 text in the Rcmdr package

Unless Windows is running in CP1250 (the Slovenian encoding on Windows), 
this is not expected to work.  I believe John tested in CP1252, and it 
just so happens that those characters are in the same place in CP1250 and 
CP1252.
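As a quick check on which characters share a position in the two code pages, here is a small sketch in Python (not R, so that it does not depend on the platform's iconv).  The choice of characters is my own assumption, since the ones John tested are not listed here; it uses only Python's built-in cp1250 and cp1252 codecs.

```python
# Illustrative character set (an assumption, not taken from John's test):
# s and z caron in both cases, plus C caron.
chars = "ŠŽšžČ"

def byte_in(ch, codec):
    """Return the single byte value of ch in codec, or None if unmappable."""
    try:
        return ch.encode(codec)[0]
    except UnicodeEncodeError:
        return None

def fmt(b):
    """Format a byte value as two hex digits, or '--' when it is missing."""
    return "--" if b is None else f"{b:02x}"

# Map each character to its (CP1250, CP1252) byte position.
positions = {ch: (byte_in(ch, "cp1250"), byte_in(ch, "cp1252")) for ch in chars}

for ch, (b1250, b1252) in positions.items():
    print(ch, "cp1250:", fmt(b1250), "cp1252:", fmt(b1252))
```

The s and z carons come out at the same byte in both code pages, whereas C caron exists only in CP1250, so a test in CP1252 can appear to work for some characters and not others.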

I get something different in CP1250, as pasting into the script window 
also does not work.  But if I use the Unicode escapes, the result is 
rendered correctly in the output window.

I think Jaro has put his finger on this: Tcl/Tk output thinks it is in 
Latin-2 and not CP1250, and s and z caron have different positions in 
those two character sets.  Here is something I can reproduce easily with 
XP set to Slovenian:
[1] "ČŠŽčšž"
[1] c8 8a 8e e8 9a 9e

which is correct for CP1250.  Now if I submit 'x' in the Rcmdr script 
window, I get the wrong output in the output window.
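To make the mismatch concrete, here is a minimal sketch in Python (again not R, so the result does not depend on the iconv build in question).  The test string is my reading of the six bytes shown above under CP1250; the sketch encodes it in both character sets and then decodes the CP1250 bytes as if they were Latin-2, which is roughly what I believe is happening on the way to the output window.

```python
# Test string: the six bytes c8 8a 8e e8 9a 9e read as CP1250
# (an assumption about what 'x' holds).
text = "ČŠŽčšž"

cp1250_bytes = text.encode("cp1250")
latin2_bytes = text.encode("iso8859_2")

# bytes.hex() with a separator needs Python 3.8 or later.
print("CP1250 :", cp1250_bytes.hex(" "))   # c8 8a 8e e8 9a 9e
print("Latin-2:", latin2_bytes.hex(" "))   # c8 a9 ae e8 b9 be

# Decode the CP1250 bytes under the wrong assumption that they are Latin-2.
misread = cp1250_bytes.decode("iso8859_2")

# Characters that survive the wrong decoding unchanged.
survivors = [a for a, b in zip(text, misread) if a == b]
print("unchanged:", "".join(survivors))    # Čč
```

Only the C carons survive, because they sit at c8 and e8 in both sets; the s and z carons land in the 0x80-0x9f range, which Latin-2 reserves for control characters.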

And I've tracked that down to a bug in iconv (something we take from 
libiconv on Windows): it does think the native encoding is Latin-2, not 
CP1250.  I'll put a workaround in R-devel and R-patched shortly.  That has 
other potential ramifications that will take me longer to investigate, and 
the correct thing may be to fix iconv.
On Sun, 7 Sep 2008, John Fox wrote: