Rd files with unknown encoding? - R-help

Spencer Graves

Wed, Dec 12, 2007 8:07 PM

How can I identify the problem generating a warning in R CMD check 
for "Rd files with unknown encoding"? 

      Google identified an email from John Fox with a reply from Brian 
Ripley about this last 12 Jun 2007.  This suggests that I may have 
accidentally entered some possibly non-printing character into the 
offending Rd file.  The message tells me which file, but I don't know 
which lines in the file.  Is there some way of finding the offending 
character(s) without laboriously running R CMD check after deleting 
different portions of the file until I isolate the problem? 

      Thanks,
      Spencer Graves

Brian Ripley

Thu, Dec 13, 2007 2:16 AM

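On Wed, 12 Dec 2007, Spencer Graves wrote:

> How can I identify the problem generating a warning in R CMD check
> for "Rd files with unknown encoding"?
>
> Google identified an email from John Fox with a reply from Brian
> Ripley about this last 12 Jun 2007.

But not on this list:

https://stat.ethz.ch/pipermail/r-devel/2007-June/046055.html

R-devel would have been more appropriate for this too.

> This suggests that I may have accidentally entered some possibly
> non-printing character into the offending Rd file.  The message tells me
> which file, but I don't know which lines in the file.  Is there some way
> of finding the offending character(s) without laboriously running R CMD
> check after deleting different portions of the file until I isolate the
> problem?

I did say so in that thread:

https://stat.ethz.ch/pipermail/r-devel/2007-June/046061.html

You can do much the same in R via iconv("", "C", sub="byte"), provided you 
can read the file in (it may not be representable in your current 
locale, but you could run R in a Latin-1 locale, if your OS has one).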

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Spencer Graves

Sat, Dec 15, 2007 5:29 PM

Dear Prof. Ripley: 

      Thanks very much.  I did as you suggested, which I'll outline here 
to make it easier for anyone else who might have a similar problem: 

           * Read the offending *.Rd file into R using 'readLines'.

           * Applied 'iconv' to the resulting character vector, following
             the last example in the help file.  This translated each
             offending character into a multi-character sequence starting
             with '<'.

           * Used 'regexpr' to find all occurrences of '<'.

      The last step also picked up other uses of '<', but produced a
sufficiently short list that I was able to find the problems fairly
easily.
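
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the temporary file and its contents are made up for the demo, "latin1" is assumed as the source encoding, and the search uses a pattern for the '<xx>' byte escapes instead of a bare '<' (the variable names rd_file, x, y and hits are hypothetical).

```r
## Hypothetical stand-in for the offending Rd file: two lines, the second
## containing the Latin-1 byte 0xE9 (e-acute).
rd_file <- tempfile(fileext = ".Rd")
con <- file(rd_file, "wb")
writeBin(c(charToRaw("\\title{Plain ASCII title}\n"),
           charToRaw("\\author{Ren"), as.raw(0xE9), charToRaw("}\n")), con)
close(con)

## Step 1: read the file as a character vector, one element per line.
x <- readLines(rd_file, warn = FALSE)

## Step 2: convert to ASCII, writing each unconvertible byte as <xx>
## (assumes the bytes can be treated as Latin-1).
y <- iconv(x, "latin1", "ASCII", sub = "byte")

## Step 3: look for the <xx> escapes rather than for every '<'.
hits <- grep("<[0-9a-fA-F]{2}>", y)
print(data.frame(line = hits, text = y[hits]))
```

Searching for the two-hex-digit escape instead of a bare '<' should shorten the list of false matches, although a literal such as '<ab>' in the file would still be reported.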

      Thanks again.
      Spencer Graves   
p.s.  And in the future, I will refer 'Rd' questions to 'R-devel', per 
your suggestion.

Brian Ripley

Mon, Dec 17, 2007 2:08 AM

Here's a slightly cleaner version:

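showNonASCII <- function(x)
{
     ## TRUE for elements that cannot be converted to ASCII
     ind <- is.na(iconv(x, "latin1", "ASCII"))
     ## re-convert those elements, showing each non-ASCII byte as <xx>
     xxx <- iconv(x[ind], "latin1", "ASCII", sub="byte")
     ## print one "index: text" line per offending element
     if(any(ind)) cat(paste(which(ind), ": ", xxx, sep=""), sep="\n")
}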

used as
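
A minimal sketch of one way to call it, assuming a hypothetical file named "example.Rd" that is created in a temporary directory purely for the demo. The function is repeated, printing one line per offending element, so that the block runs on its own.

```r
## Repeated from the message so this block is self-contained; paste() is
## used so that each offending element prints on its own line.
showNonASCII <- function(x)
{
    ind <- is.na(iconv(x, "latin1", "ASCII"))
    xxx <- iconv(x[ind], "latin1", "ASCII", sub = "byte")
    if (any(ind)) cat(paste(which(ind), ": ", xxx, sep = ""), sep = "\n")
}

## "example.Rd" is a hypothetical file written only for this demo; its
## third line contains the Latin-1 byte 0xFC (u-umlaut).
rd_file <- file.path(tempdir(), "example.Rd")
con <- file(rd_file, "wb")
writeBin(c(charToRaw("\\name{example}\n\\title{Example}\n\\author{M"),
           as.raw(0xFC), charToRaw("ller}\n")), con)
close(con)

## Report the line number and byte-escaped text of each offending line.
showNonASCII(readLines(rd_file, warn = FALSE))
```

Here only the third line should be reported, with the offending byte shown as a '<xx>' escape. Current versions of R also ship functions named showNonASCII and showNonASCIIfile in the 'tools' package, which may be more convenient than a local copy.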
