Find out what "native.enc" corresponds to - R-help

Milan Bouchet-Valat

Sun, Aug 5, 2012 1:54 AM

Hi!

I'm using R2HTML in my RcmdrPlugin.temis package to output localized
strings to an HTML file. I therefore insert a simple header at the top of
the file to specify which encoding is used; if I don't, Web browsers
assume it is latin1, which is not always true.

My problem is that I could not find a way to detect which encoding is
used by R2HTML in the most general case. R2HTML simply calls cat() with
the file name, which means the text connection is opened using
file(encoding = getOption("encoding")). This is fine, except that when
getOption("encoding") is set to "native.enc", I'm not able to find out
the real encoding that was used for output.
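
To make the situation concrete, here is a minimal sketch of what happens when cat() is given a file name. The temporary file and the ASCII sample string are illustrative assumptions; the explicit file() call only spells out the default that cat() would use on its own.

```r
# "native.enc" is the usual default: it means "the encoding of the
# current locale", not the name of an actual character set.
enc <- getOption("encoding")
print(enc)

# cat(file = "name") opens the file itself via file(), whose default is
# encoding = getOption("encoding"). Writing the call out explicitly:
out <- tempfile(fileext = ".html")   # illustrative output path
con <- file(out, open = "w", encoding = enc)
cat("<p>Hello</p>\n", file = con)
close(con)

written <- readLines(out)
```

Nothing in this sequence reports which character set was actually written, which is the gap described above.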

Of course, ideally I would tell R2HTML to output everything as UTF-8,
and I would add this information to the header. But AFAICT this is not
possible in the current state of this package. So I would be very
grateful if somebody could provide me with a solution to resolve
"native.enc" to the encoding name.

Thanks for your help

Brian Ripley

Sun, Aug 5, 2012 2:04 AM

On 05/08/2012 09:54, Milan Bouchet-Valat wrote:

?options points you to ?connections, which does explain this.  Use
Sys.getlocale("LC_CTYPE") to see

'the internal encoding of the current locale'

(or at least, what the OS claims it to be: e.g. some lie about 'C' locales).

As for a name, iconv() knows this as "" (and some OSes do make it rather 
hard to find a name if it is not part of the locale name).
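
As a rough illustration of these pointers, the sketch below queries the locale and uses "" with iconv(). The call to l10n_info() is an addition not mentioned in this message; it is a base R function that reports some properties of the native encoding without naming it. The sample string is arbitrary.

```r
# Locale string as reported by the OS, for example "en_US.UTF-8" or
# "French_France.1252"; the format is platform-dependent and may not
# contain an encoding name at all.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# Properties of the native encoding (multibyte? UTF-8? Latin-1?)
info <- l10n_info()
print(info[["UTF-8"]])

# iconv() accepts "" to mean "the encoding of the current locale"
x <- "plain ASCII text"
utf8 <- iconv(x, from = "", to = "UTF-8")
```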

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Milan Bouchet-Valat

Sun, Aug 5, 2012 12:55 PM

On Sunday, 5 August 2012 at 10:04 +0100, Prof Brian Ripley wrote:

Thanks for the pointers, but the issue is that LC_CTYPE does not
provide a valid encoding name. Your reply did prompt me to read ?iconv
again, though, and I discovered the existence of localeToCharset(), which
seems to provide me with the encoding name I'm looking for.
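
A minimal sketch of how localeToCharset() could be used for this purpose follows. The helper name resolve_encoding() and the exact form of the meta tag are hypothetical, not taken from R2HTML or RcmdrPlugin.temis. Note that localeToCharset() can return several candidates, or NA when it cannot guess, and that the names it returns follow iconv conventions (for example "ISO8859-1"), which may need mapping before being used as an HTML charset.

```r
# Hypothetical helper: turn the value of the "encoding" option into a
# charset name, resolving "native.enc" through the current locale.
resolve_encoding <- function(enc = getOption("encoding")) {
  if (!identical(enc, "native.enc")) return(enc)
  # May return several candidates, or NA if the charset is unknown
  candidates <- utils::localeToCharset()
  candidates <- candidates[!is.na(candidates)]
  if (length(candidates) == 0L) return(NA_character_)
  candidates[[1L]]
}

charset <- resolve_encoding()

# Only emit a header when a name was found (assumed header format)
meta <- if (is.na(charset)) {
  ""
} else {
  sprintf('<meta http-equiv="Content-Type" content="text/html; charset=%s">',
          charset)
}
cat(meta, "\n")
```

Leaving the header out when no name is found keeps the sketch from declaring an encoding it cannot verify.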

I'm afraid I don't understand what you mean. Do you suggest I encode
data to/from the current encoding?


Regards