
Parsing and deparsing of escaped unicode characters

1 message · Yihui Xie

The behavior depends on the locale. When these characters are deparsed
in a Chinese locale, they are kept as they are, but in an English
locale, whose native encoding cannot represent them, they get escaped
as <U+XXXX>:
[1] "I like 寿司"
[1] "\"I like 寿司\""
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
[1] "I like 寿司"
[1] "\"I like <U+5BFF><U+53F8>\""
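
If a locale-independent round trip is needed, one possible workaround is
to write the \u escapes yourself instead of relying on deparse(). The
following is only a minimal sketch: escape_unicode() is a hypothetical
helper that is not part of R or of this thread, and it assumes the input
contains no quotes or backslashes that would need escaping too.

```r
# Hypothetical helper (not from the thread): rewrite every non-ASCII
# character as a \uXXXX (or \UXXXXXXXX) escape so the result is pure ASCII.
# Assumption: the input has no quotes or backslashes that need escaping.
escape_unicode <- function(s) {
  cps   <- utf8ToInt(enc2utf8(s))             # Unicode code points
  chars <- intToUtf8(cps, multiple = TRUE)    # one string per code point
  esc   <- ifelse(cps > 0xFFFF,
                  sprintf("\\U%08X", cps),    # outside the BMP
                  sprintf("\\u%04X", cps))    # inside the BMP
  # Keep ASCII characters as they are, use the escape for everything else.
  paste(ifelse(cps < 128L, chars, esc), collapse = "")
}

x    <- "I like \u5BFF\u53F8"
code <- paste0('"', escape_unicode(x), '"')   # ASCII-only R source text
y    <- eval(parse(text = code))              # parse the escapes back
identical(x, y)
```

Because the intermediate text is plain ASCII, it should not matter which
locale is active when it is written out or parsed again.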

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Web: http://yihui.name
On Mon, Jul 28, 2014 at 4:47 AM, Jeroen Ooms <jeroenooms at gmail.com> wrote: