
Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '<ff>' if an environment variable contains \xFF

On 1/30/23 23:01, Henrik Bengtsson wrote:
This discussion comes from Python: https://bugs.python.org/issue4006
(it says Python skips such environment variables)

The problem of invalid strings in environment variables is similar to 
the problem of invalid strings in file names. Both variables and file 
names are something people want to use as strings in programs, scripts, 
and texts, but at the same time they may, in theory, not be valid strings. 
Working with (potentially) invalid strings (almost) transparently is 
much harder than working with valid strings; even if R decided to do 
that, it would be hard to implement, take a long time, and only work for 
some operations; most would still throw errors. In addition, in practice 
invalid strings are almost always due to an error, particularly so in 
file names or environment variables. Such errors are often worth 
catching (wrong encoding declaration, etc.), even though perhaps not always.
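
As an illustration of catching such errors early, here is a minimal Python
sketch that reports which entries of an environment are not valid in a given
encoding. The function name invalid_entries and the sample data are
hypothetical, and UTF-8 is only an assumed default; this is not how R or
Python handle the environment internally.

```python
def invalid_entries(env, encoding="utf-8"):
    """Return the names (as bytes) of entries whose name or value
    cannot be decoded in `encoding`.

    `env` is assumed to map bytes names to bytes values.
    """
    bad = []
    for name, value in env.items():
        try:
            name.decode(encoding)
            value.decode(encoding)
        except UnicodeDecodeError:
            # Strict decoding failed: this entry is not a valid string.
            bad.append(name)
    return bad


# Hypothetical sample: the byte 0xFF can never occur in valid UTF-8.
sample = {b"HOME": b"/home/user", b"BROKEN": b"a\xffb"}
print(invalid_entries(sample))  # [b'BROKEN']
```

On platforms where Python provides os.environb, that mapping could be passed
in place of the sample dictionary.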

In practice, this instance can only be properly fixed at the source: [1] 
should not do this. split_command() will run into problems with 
different software, not just R.

There should be a way to split the commands in ASCII (using some sort of 
quoting/escaping). Using \xFF is also flawed simply because it may be 
present in the commands themselves, if we follow the same logic that 
every byte is fine. So the code is buggy even regardless of multi-byte 
encodings.
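
To make the point about quoting concrete, here is a small Python sketch that
compares a raw separator character with shell-style quoting from the standard
shlex module. It is not the code at [1]; the helper names are hypothetical,
and the character U+00FF merely stands in for the \xFF separator byte.

```python
import shlex

SEP = "\xff"  # stand-in for the raw separator discussed above


def naive_join(args):
    # Join with a raw separator; breaks if an argument contains SEP.
    return SEP.join(args)


def naive_split(text):
    return text.split(SEP)


def quoted_join(args):
    # Join with ASCII spaces, quoting every argument that needs it.
    return shlex.join(args)


def quoted_split(text):
    return shlex.split(text)


# Hypothetical command whose second argument contains the separator.
args = ["echo", "a" + SEP + "b", "hello world"]

print(naive_split(naive_join(args)) == args)    # False: one argument was split in two
print(quoted_split(quoted_join(args)) == args)  # True: quoting round-trips
```

The delimiter in the quoted form is an ordinary ASCII space, and any argument
containing a space, a quote, or other special characters is quoted, so no byte
value has to be reserved as a separator.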

Re difficulty of debugging: I think the error message is clear, and if 
packages catch and hide errors, that would be bad design on the part of 
such packages; R couldn't really do much about that. This needs to be 
fixed at [1].

Tomas
