I think this can be made to work by telling tools:::process_Rd() ->
tools:::processRdChunk() to parse character strings in R code as UTF-8:
Index: src/library/tools/R/RdConv2.R
===================================================================
--- src/library/tools/R/RdConv2.R (revision 88617)
+++ src/library/tools/R/RdConv2.R (working copy)
@@ -229,8 +229,8 @@
     code <- structure(code[tags != "COMMENT"],
                       srcref = codesrcref) # retain for error locations
     chunkexps <- tryCatch(
-        parse(text = sub("\n$", "", as.character(code)),
-              keep.source = options$keep.source),
+        parse(text = sub("\n$", "", enc2utf8(as.character(code))),
+              keep.source = options$keep.source, encoding = "UTF-8"),
         error = function (e) stopRd(code, Rdfile, conditionMessage(e))
     )
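Outside the Rd pipeline, the effect of the patched call can be sketched
standalone (the chunk text below is invented for illustration;
processRdChunk() would receive it from parse_Rd(), already in UTF-8):

```r
## A stand-in for an Rd code chunk containing a non-ASCII string literal,
## already in UTF-8 as parse_Rd() would deliver it.
code <- 'x <- "caf\u00e9"\n'

## Mirror the patched call: strip the trailing newline, normalise to UTF-8,
## and tell parse() the text is UTF-8 so string literals keep that marking
## rather than being interpreted in the (possibly C) native locale.
exprs <- parse(text = sub("\n$", "", enc2utf8(code)),
               keep.source = FALSE, encoding = "UTF-8")
eval(exprs[[1]])
Encoding(x)  # should report "UTF-8" instead of depending on the locale
```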
That enc2utf8() may be extraneous, since tools::parse_Rd() is documented
to convert its input to UTF-8 while parsing. The downsides are, of
course, that parse(encoding = ...) does not work in MBCS locales, and
the ever-present danger of breaking some user code that depends on the
current behaviour (this was tested using 'make check-devel', not against
CRAN packages).
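As a reminder of why the extra conversion should at least be harmless:
enc2utf8() re-encodes from the declared (or native) encoding, and strings
already marked as UTF-8 pass through unchanged:

```r
x <- "caf\xe9"             # raw latin1 bytes
Encoding(x) <- "latin1"    # declare the encoding so conversion is well-defined
y <- enc2utf8(x)
Encoding(y)                # "UTF-8"
identical(enc2utf8(y), y)  # TRUE: already-UTF-8 input is left untouched
```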
Should R be expected to compile under LC_ALL=C? Maybe it's time for
people whose builds are failing to switch their continuous integration
containers from C to C.UTF-8?