Message-ID: <CABtg=K=_w9rqqua5V-dqB7w7wiQpXaMub7MknNpzbU+rP_8fPQ@mail.gmail.com>
Date: 2022-02-21T10:33:30Z
From: Gábor Csárdi
Subject: deparse() and UTF-8 strings

I am wondering if it would make sense to produce \u escaped strings in
deparse() for UTF-8 input. Currently we have (in R-devel):

x <- "G\u00e1bor"
Sys.setlocale("LC_ALL", "C")
#> [1] "C/C/C/C/C/en_US.UTF-8"

deparse(x)
#> [1] "\"G<U+00E1>bor\""

charToRaw(deparse(x))
#> [1] 22 47 3c 55 2b 30 30 45 31 3e 62 6f 72 22

Is there a reason why this is preferable to returning

"\"G\\u00e1bor\""

i.e.

charToRaw("\"G\\u00e1bor\"")
#>  [1] 22 47 5c 75 30 30 65 31 62 6f 72 22

Returning the \u escaped form would make deparse() the inverse of
parse(), at least in this respect.
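
To make the proposal concrete, here is a rough sketch of the behavior I have in mind. The helper deparse_ascii() is hypothetical and not part of base R; it only handles a single non-NA string, and it is meant as an illustration rather than a proposed implementation:

```r
# Hypothetical helper (not in base R): deparse a single string so that the
# result is pure ASCII, writing non-ASCII code points as \uXXXX / \UXXXXXXXX.
deparse_ascii <- function(s) {
  stopifnot(is.character(s), length(s) == 1L, !is.na(s))
  cps <- utf8ToInt(enc2utf8(s))
  pieces <- vapply(cps, function(cp) {
    if (cp < 128L) {
      # Let deparse() escape quotes, backslashes and control characters,
      # then drop the surrounding double quotes it adds.
      d <- deparse(intToUtf8(cp))
      substr(d, 2L, nchar(d) - 1L)
    } else if (cp <= 0xFFFF) {
      sprintf("\\u%04x", cp)
    } else {
      sprintf("\\U%08x", cp)
    }
  }, character(1))
  paste0("\"", paste(pieces, collapse = ""), "\"")
}

x <- "G\u00e1bor"
out <- deparse_ascii(x)

# Same value as the string literal suggested above
identical(out, "\"G\\u00e1bor\"")

# Round trip through parse() recovers the original string
identical(eval(parse(text = out)[[1]]), x)
```

Since the output contains only ASCII bytes, it should not depend on the current locale.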

Thank you,
Gabor