Hi,

I like to comment my programs, and often I do so in French. But accented vowels in an R editor window get screwed up in the R console, so that
# exemple à suivre
becomes
> # exemple àsuivre
(other French accents become just as ugly).

I suppose this is a font encoding error. Can it be fixed, or is there something in R itself which prevents it from even displaying such characters?

Sincerely,
Denis Chabot
font encoding issue
3 messages · Denis Chabot, Martin Maechler, Simon Urbanek
"Denis" == Denis Chabot <chabotd@globetrotter.net>
on Tue, 23 Nov 2004 21:17:58 +0100 writes:
Denis> Hi, I like to comment my programs, and often I do so
Denis> in French. But accented vowels in an R editor window
Denis> get screwed up in the R console, so that
Denis> # exemple à suivre
Denis> becomes
>> # exemple ?? suivre
(and this has been changed again by passing through the mail systems)
Denis> (other French accents become just as ugly).
Denis> I suppose this is a font encoding error. Can it be
Denis> fixed, or is there something in R itself which
Denis> prevents it from even displaying such characters?
No, it's not "R itself", since this works in quite a few other
circumstances in other R consoles: You should be able to use
accents even in strings and plot them, see
> example(text)
in R, and even in R object names:
E.g. (in a Linux console):
> Züri <- "exemple à suivre" # à suivre
> Züri
[1] "exemple à suivre"
>
---------
However, it is -- as you write -- an encoding issue,
and it probably also depends quite a bit on your
so-called "locale" settings. To learn more, e.g., see in R
> apropos("locale")
[1] "Sys.getlocale" "Sys.localeconv" "Sys.setlocale"
and now, e.g.,
> ?Sys.getlocale
Here (where I mainly work with Redhat Enterprise Linux) I've
explicitly turned off the Unicode locale (UTF-8 to be specific)
and reverted to "C" (or "POSIX"), by at least setting 'LANG=C'
or 'LANG=POSIX' instead of something like 'LANG=en_US.UTF-8'.
Look at what
> Sys.getenv("LANG")
tells you, and consider
> Sys.putenv(LANG = "C")
Martin Maechler, ETH Zurich
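The locale checks suggested in the message above can be sketched as follows for a current R installation. Two points are assumptions that go beyond the thread: Sys.setenv() is used because it later replaced Sys.putenv(), and Sys.setlocale() is added because changing LANG alone is not expected to switch the character-type locale of a session that is already running.

```r
# Inspect the locale the running R session is using.
current_ctype <- Sys.getlocale("LC_CTYPE")  # category that governs character handling
lang_env <- Sys.getenv("LANG")              # "" if the variable is not set
cat("LC_CTYPE:", current_ctype, "\n")
cat("LANG:    ", lang_env, "\n")

# Set LANG for this process and any child processes it starts.
# (Sys.setenv() is the current name for what the thread calls Sys.putenv().)
Sys.setenv(LANG = "C")

# Switch the character-type locale of the running session itself.
Sys.setlocale("LC_CTYPE", "C")
cat("LC_CTYPE is now:", Sys.getlocale("LC_CTYPE"), "\n")
```

To make the setting permanent, LANG would normally be set in the shell or in an R startup file before R is launched, rather than from inside the session.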
On Nov 23, 2004, at 9:17 PM, Denis Chabot wrote:
I suppose this is a font encoding error. Can it be fixed, or is there something in R itself which prevents it from even displaying such characters?
It's a bug and a feature of the R GUI ;). Internally, the R GUI uses UTF-8 encoding for text handling, including the editor. The idea was to have a localized GUI with support for any language, and UTF-8 is the natively supported format in Cocoa. To make the mess even bigger, there was a bug in the GUI that converted the UTF-8 to a vanilla C string at one point, thus resulting in the wrong behavior you spotted. I have now fixed that latter bug, so your comments should appear undistorted:
> # exemple à suivre
If this is all you need, get tonight's nightly build.

However, using UTF-8 in strings in R is not that easy. Even if all you want is to retain the UTF-8 contents (i.e. tell R not to worry about the encoding and just print back what it gets), the actual problem is that R escapes certain characters, regardless of the locale:
> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/C"
> "Müll"
[1] "M\303\274ll"
This means that the don't-worry concept doesn't work. The latest info on encodings and UTF-8 I could find was for 1.8.1, but I suspect that nothing has changed since: basically R has no UTF-8 support, and there will be none unless someone with enough time, energy and skill takes up the task.

The bottom line is that I'll try to fix the GUI so that it uses the locale-specific encoding in its internal representation and for all communication with R. The drawback will be that users on systems with different locales won't be able to use each other's files transparently. Still, this should fix things for users of simpler encodings (such as Latin1), but for more general support of UTF-8 or other multi-byte encodings we will have to wait until there is a global solution in R.

Cheers,
Simon
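The escaped output "M\303\274ll" quoted above is the octal spelling of the bytes C3 BC, which is how UTF-8 encodes ü (U+00FC). A minimal sketch of that byte-level view, and of what happens when the same bytes are read under a single-byte encoding, is given here. Latin-1 is only a stand-in: the thread does not say which encoding the GUI actually fell back to, and the variable names are illustrative.

```r
# "M\u00fcll" is the word "Müll"; the \u escape keeps this script pure ASCII.
word <- "M\u00fcll"

# The UTF-8 bytes of the string: ü takes two bytes, every other letter one.
bytes <- charToRaw(enc2utf8(word))
print(bytes)

# The same bytes in octal, the notation R used in the escaped console output.
octal <- format(as.octmode(as.integer(bytes)))
print(octal)

# Reinterpret the bytes as Latin-1 (assumed stand-in encoding): each byte now
# counts as one character, so the single letter ü becomes two unrelated ones.
misread <- iconv(rawToChar(bytes), from = "latin1", to = "UTF-8")
cat("code points in original:", length(utf8ToInt(word)), "\n")
cat("code points when misread:", length(utf8ToInt(misread)), "\n")
```

The extra code point in the misread string is the same kind of distortion as the accented comment in the original report: the bytes are intact, but they are decoded with the wrong encoding.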