String encoding problem
On Thu, Jul 7, 2016 at 10:11 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
On 07/07/2016 10:57 AM, Hadley Wickham wrote:
If you print:
"\xc9\x82\xbf"
you get:
"\u0242\xbf"
But if you try to evaluate that string, you get:
"\u0242\xbf"
Error: mixing Unicode and octal/hex escapes in a string is not allowed
(This will probably only happen on Mac/Linux with the default UTF-8 encoding.)
I'm not sure what should happen here, but that's not a legal string in a UTF-8 locale, so it's not too surprising that things go wonky.
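To illustrate why the string is not legal UTF-8, here is a minimal sketch, assuming R 3.3.0 or later for validUTF8(). The bytes 0xc9 0x82 form the UTF-8 encoding of U+0242, while the trailing 0xbf is a continuation byte with nothing to continue. The variable names are illustrative only, and the string is built from raw bytes so the example does not depend on how the parser treats hex escapes in a given locale.

```r
# The three bytes from the example, built from raw values.
s <- rawToChar(as.raw(c(0xc9, 0x82, 0xbf)))

# The first two bytes alone are the UTF-8 encoding of U+0242.
prefix <- rawToChar(as.raw(c(0xc9, 0x82)))

validUTF8(prefix)  # TRUE: a complete two-byte sequence
validUTF8(s)       # FALSE: 0xbf is a stray continuation byte
```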
Here's a bit more context on how I got that sequence of bytes:
x <- "?????"
y <- iconv(x, to = "Shift-JIS")
Encoding(y)
y
I did this to create an example demonstrating how to handle encoding problems, and it's a bit frustrating that I have to manually mangle the string in order to re-use it in another session. Maybe strings with unknown encoding shouldn't use Unicode escapes?
Hadley
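One possible workaround, offered only as a sketch and not as anything proposed in the thread, is to carry the string between sessions as a raw vector rather than as a printed string literal. The example uses the three bytes quoted earlier as a stand-in for the iconv() result, and the names y, bytes, code and y2 are illustrative.

```r
# Stand-in for the iconv() result: the three bytes quoted in the thread.
y <- rawToChar(as.raw(c(0xc9, 0x82, 0xbf)))

# Deparse the underlying bytes into plain ASCII source text.
bytes <- charToRaw(y)
code <- paste(deparse(bytes), collapse = "")

# In another session, evaluating that text rebuilds the same bytes.
y2 <- rawToChar(eval(parse(text = code)))
identical(charToRaw(y2), bytes)  # TRUE
```

Because the deparsed text contains only hex literals, it avoids the mixed \u and \x escapes that the parser rejects.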