Installation failure in non-UTF-8 MBCS locale

2 messages · Gábor Csárdi, Tomas Kalibera

#
I am sorry, part of the output is garbled, as the email's encoding is
different, but the error is hopefully still clear.

This is Ubuntu 20.04, yesterday's R devel or R release, in the zh_CN locale.

The zh_CN.UTF-8 locale is fine, and it is a much better option, so I
am not sure if this is considered to be a bug.
?????????????'/root/R/x86_64-pc-linux-gnu-library/4.3'
(???'lib'???????)
???URL??https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/evaluate_0.17.tar.gz'
Content type 'binary/octet-stream' length 25984 bytes (25 KB)
==================================================
downloaded 25 KB

* installing *source* package 'evaluate' ...
** ?????'evaluate'????????????MD5?????
** using staged installation
Warning in parse(con, encoding = "UTF-8") :
  argument encoding="UTF-8" is ignored in MBCS locales
Error : invalid multibyte character in parser (<input>:11:32)
ERROR: installing package DESCRIPTION failed for package 'evaluate'
* removing '/root/R/x86_64-pc-linux-gnu-library/4.3/evaluate'

????????????
'/tmp/Rtmp3O0zlO/downloaded_packages'??
Warning message:
In install.packages("evaluate") : ?????????'evaluate'?????????????0

R-release produces the same error.

Dockerfile to reproduce this:

FROM ubuntu:20.04
RUN apt-get -y update && apt-get -y install curl locales
RUN curl -Ls https://github.com/r-lib/rig/releases/download/latest/rig-linux-latest.tar.gz | tar xz -C /usr/local
RUN rig add devel
RUN rig add release
RUN locale-gen zh_CN
RUN uname -a
RUN R -q -e 'sessionInfo()'
RUN LC_ALL=zh_CN R -q -e 'install.packages("evaluate")'

G.
#
On 10/16/22 19:35, Gábor Csárdi wrote:
Right, one should use UTF-8 (on all platforms) as the locale encoding.

For historical reasons, one can still parse UTF-8 when R is running in,
e.g., a Latin-1 locale. This is still supported because older Windows
systems do not yet use UTF-8 as the native encoding.

When R runs in a non-UTF-8 multi-byte locale, it cannot parse UTF-8 R 
input files. This is due to how the parser works and supporting that 
would require a major rewrite which would not be worth the effort 
(instead effort has been spent on supporting UTF-8 as the native 
encoding on Windows).
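
As a rough illustration of the advice above, here is a minimal shell sketch that checks the codeset of the current locale before running R. It assumes a POSIX shell and that `locale charmap` is available. The helper name `is_utf8_codeset` is hypothetical, and zh_CN.UTF-8 is suggested only because it is the locale reported as working earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only when the given codeset
# name denotes UTF-8. Accepts "UTF-8" and "utf8" in any letter case.
is_utf8_codeset() {
    case "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" in
        UTF-8|UTF8) return 0 ;;
        *) return 1 ;;
    esac
}

# `locale charmap` prints the codeset of the current locale, for example
# "UTF-8" for zh_CN.UTF-8 (plain zh_CN typically maps to a GB codeset).
codeset=$(locale charmap 2>/dev/null || echo unknown)

if is_utf8_codeset "$codeset"; then
    echo "codeset $codeset: UTF-8 R sources should parse"
else
    echo "codeset $codeset: consider a UTF-8 locale such as zh_CN.UTF-8" >&2
fi
```

In the Dockerfile from the first message, this would correspond to generating and using zh_CN.UTF-8 instead of zh_CN, which the report says installs fine.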

Best
Tomas