Hi all,
I'd like to discuss an infelicity (or possible bug) in gsub(). Take the
following function:
f <- function(x) {
  # Replace spaces with non-breaking spaces (U+00A0), then convert them back
  gsub("\u{A0}", " ", gsub(" ", "\u{A0}", x))
}
As you might expect, in UTF-8 locales it is idempotent (it returns its input unchanged):
Sys.setlocale("LC_ALL", "UTF-8")
f("x y")
# [1] "x y"
But in the C locale it is not:
Sys.setlocale("LC_ALL", "C")
f("x y")
# [1] "x\302\240y"
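One way to narrow this down is to inspect the intermediate result of the inner gsub() with Encoding() and charToRaw(). This is a sketch, and the reading it supports is an assumption rather than a confirmed diagnosis: the printed \302\240 suggests that the UTF-8 bytes of the non-breaking space are present but the string is not marked as UTF-8, so the outer gsub() does not match it against the UTF-8 pattern. The variable name `inner` is just for illustration.

```r
Sys.setlocale("LC_ALL", "C")

# A "\u{A0}" escape in a string literal is marked as UTF-8 by the parser
print(Encoding("\u{A0}"))

# Intermediate result of the inner gsub(): did the UTF-8 mark survive?
inner <- gsub(" ", "\u{A0}", "x y")
print(Encoding(inner))

# Raw bytes of the intermediate result; c2 a0 is the UTF-8 encoding of U+00A0
print(charToRaw(inner))
```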
This seems weird to me, and it caused a bug in a package because I
didn't realise that some Windows users have a non-UTF-8 locale.
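For package code, a possible workaround is to declare the intermediate result as UTF-8 with `Encoding<-` before the second substitution. This is a minimal sketch that assumes `x` is either ASCII or already valid UTF-8 (otherwise the declaration would be wrong). The name `f_utf8` is hypothetical, and the sketch does not answer what gsub() itself ought to do.

```r
f_utf8 <- function(x) {
  # Inner substitution: the replacement "\u{A0}" is UTF-8, so the result
  # contains UTF-8 bytes even when it is not marked as UTF-8
  y <- gsub(" ", "\u{A0}", x)
  # Assumption: x was ASCII or UTF-8, so the bytes of y are valid UTF-8.
  # Declare that explicitly; pure-ASCII elements are left unmarked.
  Encoding(y) <- "UTF-8"
  # Outer substitution: pattern and input are now both UTF-8
  gsub("\u{A0}", " ", y)
}

Sys.setlocale("LC_ALL", "C")
res <- f_utf8("x y")
print(res)
```

Marking rather than converting is deliberate here: iconv() and enc2utf8() leave pure-ASCII strings unmarked, so they would not help with an ASCII input like "x y".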
I'm not sure what the correct resolution is. Should the output of gsub() be marked as UTF-8 if either the input or the replacement is UTF-8?