Hi!
I'm using R2HTML in my RcmdrPlugin.temis package to output localized
strings to an HTML file. Thus, I insert a simple header at the top of the
file to specify what encoding is used; if I don't do that, Web browsers
assume it is latin1, which is not always true.
My problem is, I could not find a way to detect what encoding is used by
R2HTML in the most general case. R2HTML simply calls cat() with the file
name, which means the text connection is opened using file(encoding =
getOption("encoding")). This is fine, except that when
getOption("encoding") is set to "native.enc", I'm not able to find out
the real encoding that was used for output.
Of course, ideally I would tell R2HTML to output everything as UTF-8,
and I would add this information to the header. But AFAICT this is not
possible in the current state of this package. So I would be very
grateful if somebody could provide me with a solution to resolve
"native.enc" to the encoding name.
Thanks for your help
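
For illustration, the "output everything as UTF-8" idea can be sketched without R2HTML by opening the connection explicitly, so the encoding is known before anything is written. This is only a sketch: the helper name write_html_utf8, the temporary file, and the exact meta tag are assumptions, not part of R2HTML or RcmdrPlugin.temis.

```r
# Hypothetical helper (not an R2HTML function): write HTML through a
# connection whose encoding is fixed to UTF-8, so the declared charset
# always matches what is actually written to disk.
write_html_utf8 <- function(body, path) {
  con <- file(path, open = "w", encoding = "UTF-8")
  on.exit(close(con))
  # Header declaring the encoding (assumed form of the "simple header").
  cat('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n',
      file = con)
  cat(body, "\n", file = con, sep = "")
  invisible(path)
}

out <- tempfile(fileext = ".html")
write_html_utf8("<p>caf\u00e9</p>", out)
lines <- readLines(out, encoding = "UTF-8")
```

Passing a connection instead of a file name is what avoids the dependence on getOption("encoding"); whether R2HTML's functions can be handed such a connection is not something this sketch assumes.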
Find out what "native.enc" corresponds to
3 messages · Brian Ripley, Milan Bouchet-Valat
On 05/08/2012 09:54, Milan Bouchet-Valat wrote:
> [...] So I would be very grateful if somebody could provide me with a
> solution to resolve "native.enc" to the encoding name.
?options points you to ?connections, which does explain this. See
Sys.getlocale("LC_CTYPE") to see
'the internal encoding of the current locale'
(or at least, what the OS claims it to be: e.g. some lie about 'C' locales).
As for a name, iconv() knows this as "" (and some OSes do make it rather
hard to find a name if it is not part of the locale name).
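
The calls mentioned above can be tried interactively as follows. This is a minimal sketch; the printed values depend on the platform and locale, so none are shown. l10n_info() is not mentioned in this thread and is included only as a related base R function.

```r
# Default value of the option that file() and cat() fall back on.
enc_option <- getOption("encoding")   # "native.enc" unless the user changed it

# Locale string for character handling. The encoding, if present at all,
# is typically the part after the dot (e.g. "en_US.UTF-8"); this is
# platform-dependent and may be missing entirely (e.g. "C").
ctype <- Sys.getlocale("LC_CTYPE")

# Related base R function: reports whether the native encoding
# is multibyte, UTF-8, or Latin-1.
info <- l10n_info()

# iconv() accepts "" as the name of the native encoding, so a string can be
# converted from the native encoding without knowing what it is called.
x <- "plain ASCII text"
y <- iconv(x, from = "", to = "UTF-8")
```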
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
On Sunday 05 August 2012 at 10:04 +0100, Prof Brian Ripley wrote:
> ?options points you to ?connections, which does explain this. See
> Sys.getlocale("LC_CTYPE") to see
> 'the internal encoding of the current locale'
> (or at least, what the OS claims it to be: e.g. some lie about 'C' locales).
Thanks for the pointers, but the issue is/was that LC_CTYPE does not provide a valid encoding name. But your reply prompted me to read ?iconv again, and I discovered the existence of localeToCharset(), which seems to provide me with the encoding name I'm looking for.
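
A minimal sketch of how localeToCharset() might be used to resolve "native.enc" for the header. The helper name resolve_native_enc and the "UTF-8" fallback are assumptions made for illustration; localeToCharset() only guesses from the locale name and can return NA, and the names it returns (e.g. "ISO8859-1") may need mapping to the labels Web browsers expect.

```r
# Hypothetical helper: turn getOption("encoding") into a concrete name.
# The fallback value is an assumption, used only when no guess is available.
resolve_native_enc <- function(fallback = "UTF-8") {
  enc <- getOption("encoding")
  if (!identical(enc, "native.enc")) return(enc)
  # localeToCharset() is in utils; by default it inspects
  # Sys.getlocale("LC_CTYPE") and returns candidate charset names.
  guess <- utils::localeToCharset()
  guess <- guess[!is.na(guess)]
  if (length(guess) > 0) guess[1] else fallback
}

charset <- resolve_native_enc()
# Assumed form of the header inserted at the top of the HTML file.
header <- sprintf(
  '<meta http-equiv="Content-Type" content="text/html; charset=%s">',
  charset
)
```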
> As for a name, iconv() knows this as "" (and some OSes do make it rather
> hard to find a name if it is not part of the locale name).
I'm afraid I don't understand what you mean. Do you suggest I encode data to/from the current encoding?
Regards