Skip to content
Prev 60412 / 63424 Next

R on Windows with UCRT and the system encoding

Hi Tomas,

Thank you very much for the detailed explanation! I think now I have a
bit better understanding on how the things work; at least now I know I
didn't understand the concept of "active code page". I'll follow your
advice when I need to fix the packages that need some tweaks to handle
UTF-8 properly.

Sorry, I'd like to ask one more question related to locale. If I copy
the following text and execute `read.csv("clipboard")`, it returns
"uao" instead of "???" (the characters are transliterated).

    "col1","col2"
    "???","???"


While this is probably the status quo (the same behavior on R 4.1) on
Latin-1 encoding, things are worse on CJK locales. If I try,

    "col1","col2"
    "?","?"

I get the following error:

    > read.csv("clipboard")
    Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec,  :
      invalid multibyte string at '<82><a0>'

Is this supposed to work? It seems the characters are encoded as CP932
(my system locale) but marked as UTF-8.

    > x <- utils:::readClipboard()
    > x
    [1] "\"col1\",\"col2\""         "\"\x82\xa0\",\"\x82\xa2\""
    > iconv(x, from = "CP932", to = "UTF-8")
    [1] "\"col1\",\"col2\"" "\"?\",\"?\""

I read the source code of readClipboard() in
src/library/utils/src/windows/util.c, but have no idea if there's
anything that needs to be fixed.

Best,
Yutani

2021?12?21?(?) 17:26 Tomas Kalibera <tomas.kalibera at gmail.com>: