[R-pkg-devel] Warning... unable to translate 'Ekstr<f8>m' to a wide string; Error... input string 1 is invalid

Tue, Jul 19, 2022 11:56 AM

On Tue, 19 Jul 2022 13:23:11 -0500, Spencer Graves <spencer.graves at effectivedefense.org> wrote:

Is subNonStandardCharacters() supposed to work with strings that have
Encoding(.) == 'unknown' and are also invalid in the current locale's
encoding? (I think it's fair not to support Encoding(.) == 'bytes' in
such a function, because such strings aren't supposed to be text.)
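To make the question concrete, here is a small R sketch that builds such a string from raw bytes, assuming it arrived as Latin-1 (for example, read from a file), and asks R how it sees it. validEnc() depends on the locale, so only its UTF-8-locale result is noted.

```r
# Latin-1 bytes for "Ekstrøm": 0xf8 is "ø" in Latin-1 but never valid alone in UTF-8.
x <- rawToChar(as.raw(c(0x45, 0x6b, 0x73, 0x74, 0x72, 0xf8, 0x6d)))

Encoding(x)   # "unknown": R assumes the bytes are in the locale's encoding
validUTF8(x)  # FALSE, whatever the locale
validEnc(x)   # FALSE in a UTF-8 locale, which is where gsub() reports
              # "input string 1 is invalid"
```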

If yes, the function itself needs to be fixed. I think that
useBytes=TRUE may help, as long as the standardCharacters argument is
limited to characters representable in ASCII. Alternatively, find a way
to transform the 'x' argument into something that is guaranteed to be
valid in its declared encoding. enc2utf8() could be an option, but any
invalid bytes are replaced by their hexadecimal codes (such as <f8>),
which defeats the purpose of subNonStandardCharacters(). Find a way to
feed the output of Encoding(x) to iconv() as its "from" argument
(mapping 'unknown' to "", which iconv() takes to mean the current locale)?
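A minimal sketch of the useBytes=TRUE route, assuming the set of "standard" characters is printable ASCII (bytes 0x20 to 0x7E). subNonASCIIBytes() is a hypothetical stand-in for illustration, not the actual subNonStandardCharacters() code.

```r
# Hypothetical helper: replace every byte outside printable ASCII (0x20-0x7E).
subNonASCIIBytes <- function(x, replacement = "?") {
  # With useBytes = TRUE the pattern is matched byte by byte and the input
  # is neither translated nor checked for validity, so strings that are
  # invalid in the current locale do not raise an error.
  gsub("[^\\x20-\\x7e]", replacement, x, perl = TRUE, useBytes = TRUE)
}

# "Ekstr\xf8m": Latin-1 bytes with Encoding(x) == "unknown"
x <- rawToChar(as.raw(c(0x45, 0x6b, 0x73, 0x74, 0x72, 0xf8, 0x6d)))
subNonASCIIBytes(x)
# [1] "Ekstr?m"
```

Because matching is byte-wise, a valid UTF-8 "ø" (two bytes, c3 b8) would come out as "??", so this approach only behaves sensibly when the standard characters are ASCII.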

If not, it's enough to fix the example.

This is described in ?Quotes, although admittedly harder to find than
desired. The "\u" escape sequences take 1 to 4 hexadecimal digits. As
long as your escape sequence isn't followed by something that looks
like a hexadecimal digit, you can keep it short, like "\uf8m" (m is not
a hex digit). If you want to be 100% unambiguous, either pad the code
point number to 4 digits ("\u00f8m") or wrap it in braces ("\u{f8}m").
The belt-and-braces approach ("\u{00f8}m") is not an error, either.
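These rules are easy to check at the R prompt. The last two lines show the pitfall that padding or braces avoid: "e" is a hexadecimal digit, so "\uf8e" is read as the single code point U+0F8E rather than "ø" followed by "e".

```r
a <- "\uf8m"      # 1 hex digit: "m" is not hex, so the escape ends after "8"
b <- "\u00f8m"    # code point padded to 4 digits
d <- "\u{f8}m"    # braces delimit the code point
e <- "\u{00f8}m"  # padding and braces together
stopifnot(identical(a, b), identical(b, d), identical(d, e))
nchar(a)          # 2: "ø" followed by "m"

nchar("\uf8e")    # 1: parsed as U+0F8E, not "ø" + "e"
nchar("\u00f8e")  # 2
```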

You can also use the Encoding(x) <- 'latin1' trick to mark the strings
produced from bytes as Latin-1. Then gsub() will work normally, the
same way things happily work in example(iconv).
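A sketch of that trick, assuming the bytes really are Latin-1. The mark is a declaration, not a conversion: marking bytes that come from some other encoding as 'latin1' would silently produce the wrong characters.

```r
# Latin-1 bytes for "Ekstrøm", initially marked "unknown" (native encoding).
x <- rawToChar(as.raw(c(0x45, 0x6b, 0x73, 0x74, 0x72, 0xf8, 0x6d)))

# Declare the real encoding: the bytes stay the same, only the mark changes.
Encoding(x) <- "latin1"

# gsub() can now translate x correctly before matching.
gsub("\u00f8", "o", x)
# [1] "Ekstrom"

# enc2utf8() now yields proper UTF-8: "ø" becomes the two bytes c3 b8.
charToRaw(enc2utf8(x))
# [1] 45 6b 73 74 72 c3 b8 6d
```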

Best regards,
Ivan
