
R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2 messages · Tomáš Bořil, Tomas Kalibera

Yes, again in a script sourced by source(encoding = ...), but also when
typing it directly into the R console.

Most of the time, I use RStudio as a front-end. For this experiment, I
also verified it in Rgui. Both front-ends behave exactly the same way.

An optional parameter to source() that translates all UTF-8 characters
in string literals to their "\Uxxxx" codes sounds like a great idea. I
hope it would fix 99.9% of the problems I have, because that is how I
work around them nowadays. The same behaviour at the command line would
help as well.

Tomas
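A minimal sketch of the escaping workaround described in this message,
written in R. escape_non_ascii() is a hypothetical helper, not an
existing R function and not the proposed source() option; it works on
character values only, so picking out the string literals of a whole
script is left out. It emits the braced \U{...} form, because an
unbraced escape followed by a hex-looking character is ambiguous.

```r
# Hypothetical helper (not part of base R): rewrite every non-ASCII
# character in each string as a braced "\U{...}" escape, so the result is
# pure ASCII and can be pasted into a string literal.
escape_non_ascii <- function(x) {
  vapply(x, function(s) {
    cps <- utf8ToInt(enc2utf8(s))            # Unicode code points of s
    chars <- intToUtf8(cps, multiple = TRUE) # one string per code point
    escaped <- sprintf("\\U{%04X}", cps)     # e.g. 345 -> "\U{0159}"
    # Keep ASCII characters as they are, escape everything else.
    paste(ifelse(cps < 128L, chars, escaped), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

cat(escape_non_ascii("\u0159a"), sep = "\n")  # prints \U{0159}a
# The braces matter: without them, "\U0159a" would be read as the single
# code point U+159A instead of "ř" followed by "a".
```

Parsing the escaped text back as a string literal gives the original
characters, independent of the file's encoding.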
On Wed, Apr 10, 2019 at 5:29 PM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
On 4/10/19 6:13 PM, Tomáš Bořil wrote:

I was not suggesting converting to \Uxxxx in source(). Some users do it
in their programs by hand or with an external utility. In principle,
source() could be made to work similarly to eval(parse(file, encoding=))
with respect to encodings, via other means, and we will consider that.
But there are many other places where the conversion happens; a trivial
one is that you currently cannot properly print the result of parse()
from your example. Maybe you don't trigger such problems in your scripts
in obvious ways, but as I said before, if you want to work reliably with
characters not representable in the current native encoding, use Linux
or macOS with the current or near-future versions of R.

Tomas
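A minimal sketch of the eval(parse(file, encoding=)) route mentioned in
the reply, assuming the script is saved as UTF-8. The temporary file,
the variable x and the sample word are illustrative only. It shows that
the string literal is read as UTF-8; as the reply points out, printing
or other conversions to the native encoding can still lose characters
on Windows, so the check below looks at code points instead of printing.

```r
# Create an illustrative UTF-8 script in a temporary file.
script <- tempfile(fileext = ".R")
# enc2utf8() plus useBytes = TRUE writes the UTF-8 bytes as they are,
# without re-encoding them to the native encoding.
writeLines(enc2utf8('x <- "P\u0159\u00edli\u0161"'), script, useBytes = TRUE)

# Parse with the file's encoding declared, then evaluate the parsed
# expressions in a fresh environment.
exprs <- parse(file = script, encoding = "UTF-8")
env <- new.env()
eval(exprs, envir = env)

# Inspect code points rather than printing, since printing may
# convert to the native encoding.
utf8ToInt(env$x)
```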