Hi all,
I would like to propose the attached function ASCIIfy() to be added to the
'tools' package.
Non-ASCII characters in character vectors can be problematic for R
packages, but sometimes they cannot be avoided. To make packages portable
and build without 'R CMD check' warnings, my solution has been to convert
problematic characters in functions and datasets to escaped ASCII, so
plot(1,main="São Paulo") becomes plot(1,main="S\u00e3o Paulo").
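As a minimal sketch of this kind of conversion (not the attached function itself), the hypothetical helper escape_non_ascii() below rewrites each non-ASCII character as a \uXXXX escape using only base R. It assumes the input is UTF-8 or plain ASCII, and that every code point fits in four hex digits.

```r
# Hypothetical helper for illustration only; not part of 'tools' and not
# the attached ASCIIfy(). Assumes UTF-8 input with code points <= 0xFFFF.
escape_non_ascii <- function(s)
{
  cp <- utf8ToInt(s)                     # integer code points
  chars <- intToUtf8(cp, multiple=TRUE)  # one string per character
  esc <- sprintf("\\u%04x", cp)          # backslash, "u", four hex digits
  paste(ifelse(cp < 128, chars, esc), collapse="")
}

city <- "S\u00e3o Paulo"                 # this source line is pure ASCII
escaped <- escape_non_ascii(city)
cat(escaped, "\n")                       # S\u00e3o Paulo

# Parsing the escaped text as an R string literal recovers the original.
# This is only safe here because the text has no quotes or other backslashes.
roundtrip <- eval(parse(text=paste0('"', escaped, '"')))
identical(roundtrip, city)
```

The round trip at the end is the property that matters for package sources: the escaped form is plain ASCII on disk, yet R reads it back as the same string.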
The showNonASCII() function in package:tools is helpful to identify R
source files where characters should be converted to ASCII one way or
another, but I could not find a function to actually perform the
conversion to ASCII.
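For reference, a small sketch of the identification step. tools::showNonASCII() prints the offending elements, and a similar test can be done by hand by looking for bytes above 127. The vector src is made-up example input standing in for lines of an R source file.

```r
# Made-up example input: three "source lines", one with a non-ASCII character
src <- c('plot(1, main="Oslo")',
         'plot(1, main="S\u00e3o Paulo")',
         'plot(1, main="Reykjavik")')

# Prints the elements that contain non-ASCII characters
tools::showNonASCII(src)

# The same check by hand: any byte above 127 means the element is not ASCII
has_non_ascii <- vapply(src,
                        function(s) any(as.integer(charToRaw(s)) > 127L),
                        logical(1), USE.NAMES=FALSE)
which(has_non_ascii)                     # 2
```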
I have written the function ASCIIfy() to convert character vectors to
ASCII. I imagine other R package developers might be looking for a similar
tool, and it seems to me that package:tools is the first place they would
look, where the R Core Team has provided a variety of tools for handling
non-ASCII characters in package development.
I hope the R Core Team will adopt ASCIIfy() into the 'tools' package, to
make life easier for package developers outside the English-speaking
world. I have of course no problem with them renaming or rewriting the
function in any way.
See the attached examples - all in flat ASCII that was prepared using the
function itself! The main objective, though, is to ASCIIfy functions and
datasets, not help pages.
Arni
-------------- next part --------------
ASCIIfy <- function(x, bytes=2, fallback="?")
{
  bytes <- match.arg(as.character(bytes), 1:2)
  convert <- function(char)  # convert to ASCII, e.g. "z", "\xfe", or "\u00fe"
  {
    raw <- charToRaw(char)
    if(length(raw)==1 && raw<=127)  # 7-bit
      ascii <- char
    else if(length(raw)==1 && bytes==1)  # 8-bit to \x00
      ascii <- paste0("\\x", raw)
    else if(length(raw)==1 && bytes==2)  # 8-bit to \u0000
      ascii <- paste0("\\u", chartr(" ","0",formatC(as.character(raw),width=4)))
    else if(length(raw)==2 && bytes==1)  # 16-bit to \x00, if possible
      if(utf8ToInt(char) <= 255)
        ascii <- paste0("\\x", format.hexmode(utf8ToInt(char)))
      else {
        ascii <- fallback; warning(char, " could not be converted to 1 byte")}
    else if(length(raw)==2 && bytes==2)  # UTF-8 to \u0000
      ascii <- paste0("\\u", format.hexmode(utf8ToInt(char),width=4))
    else {
      ascii <- fallback
      warning(char, " could not be converted to ", bytes, " byte")}
    return(ascii)
  }
  if(length(x) > 1)
  {
    sapply(x, ASCIIfy, bytes=bytes, fallback=fallback, USE.NAMES=FALSE)
  }
  else
  {
    input <- unlist(strsplit(x,""))       # "c" "a" "f" "\u00e9"
    output <- character(length(input))    # "" "" "" ""
    for(i in seq_along(input))
      output[i] <- convert(input[i])      # "c" "a" "f" "\\u00e9"
    output <- paste(output, collapse="")  # "caf\\u00e9"
    return(output)
  }
}
-------------- next part --------------
\name{ASCIIfy}
\alias{ASCIIfy}
\title{Convert Characters to ASCII}
\description{
Convert character vector to ASCII, replacing non-ASCII characters with
single-byte (\samp{\x00}) or two-byte (\samp{\u0000}) codes.
}
\usage{
ASCIIfy(x, bytes = 2, fallback = "?")
}
\arguments{
\item{x}{a character vector, possibly containing non-ASCII
characters.}
\item{bytes}{either \code{1} or \code{2}, for single-byte
(\samp{\x00}) or two-byte (\samp{\u0000}) codes.}
\item{fallback}{an output character to use, when input characters
cannot be converted.}
}
\value{
A character vector like \code{x}, except non-ASCII characters have
been replaced with \samp{\x00} or \samp{\u0000} codes.
}
\author{Arni Magnusson.}
\note{
To render single backslashes, use these or similar techniques:
\verb{
write(ASCIIfy(x), "file.txt")
cat(paste(ASCIIfy(x), collapse="\n"), "\n", sep="")}
The resulting strings are plain ASCII and can be used in R functions
and datasets to improve package portability.
}
\seealso{
\code{\link[tools]{showNonASCII}} identifies non-ASCII characters in
a character vector.
}
\examples{
cities <- c("S\u00e3o Paulo", "Reykjav\u00edk")
print(cities)
ASCIIfy(cities, 1)
ASCIIfy(cities, 2)
athens <- "\u0391\u03b8\u03ae\u03bd\u03b1"
print(athens)
ASCIIfy(athens)
}
\keyword{}
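The note in the help page about rendering single backslashes can be illustrated with any string that holds a literal backslash. In this sketch the example string is typed by hand rather than produced by ASCIIfy().

```r
x <- "caf\\u00e9"   # nine characters: c a f \ u 0 0 e 9
nchar(x)            # 9
print(x)            # print() escapes the backslash, so it appears doubled
cat(x, "\n")        # cat() writes the string as stored: caf\u00e9
writeLines(x)       # same as stored, one element per line
```

The doubled backslash from print() is only a display convention. What gets written to a file by write(), cat() or writeLines() contains the single backslash needed in R source code.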
ASCIIfy() - a proposal for package:tools
5 messages · Arni Magnusson, Gregory R. Warnes, Duncan Murdoch
1 day later
Nobody else has replied to this, so I will. It's very unlikely that we would incorporate this function into base R.
For one thing, the tools package is intended to be tools used by R, not by users. R doesn't need this function, so it doesn't belong in tools. (Some other functions in tools like showNonASCII have come to be used by users, but their primary purpose is for R.)
Utility functions that are maintained by R Core and are useful to users belong in the utils package. But I wouldn't add ASCIIfy to that package either, because I don't want to impose its maintenance on R Core.
Utility functions that are maintained by others belong in contributed packages. So I'd suggest that you add this function to some package that you maintain (perhaps a new one, containing a collection of related utility functions), or search CRAN for an appropriate package with a maintainer who is willing to take this on.
Duncan Murdoch
On 15/04/2014 1:48 PM, Arni Magnusson wrote:
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Hi Arni,
I'll be glad to drop ASCIIfy into gtools. Let me know if this is OK.
-Greg
On Apr 17, 2014, at 9:46 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 17/04/2014 12:47 PM, Gregory R. Warnes wrote:
Hi Arni, I'll be glad to drop ASCIIfy into gtools. Let me know if this is OK.
Thanks, that sounds like a great solution if Arni doesn't want his own package. Duncan Murdoch
Thanks, Duncan, for considering ASCIIfy. I understand your reasoning.
This is a recurring pattern - I propose functions for core R, and Greg catches them from freefall :)
I'm delighted with ASCIIfy being hosted in gtools. The R and Rd files should be ready as is.
Cheers,
Arni