I have been converting to utf8 from latin1, and this gives me
problems, some solved, but here is one unsolved: In my .Rd files, I
have included '\encoding{UTF-8}' at the top. Despite this, the HTML
help pages contain 'content="text/html; charset=iso-8859-1"', and my
name is mangled. What can I do about this?
I'm on Ubuntu (latest), R-2.3.1
Thanks,
Göran
UTF-8 and .Rd files
On Tue, 27 Jun 2006, Göran Broström wrote:
> ... the HTML help pages contain 'content="text/html; charset=iso-8859-1"', and my
> name is mangled. What can I do about this?
Reproducible example, please! (I've just tried this and it works for me.) As described in my talk at UseR 2006, you may well not want to do this if you intend to distribute the package. Your name contains characters that are not in the fonts used in UTF-8 in non-European locales, and Windows users do not have ready access to UTF-8 viewers (even if they know the files are UTF-8).
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
On 6/27/06, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> As described in my talk at UseR 2006, you may well not want to do this if you intend to distribute the package. Your name contains characters that are not in the fonts used in UTF-8 in non-European locales, and Windows users do not have ready access to UTF-8 viewers (even if they know the files are UTF-8).
Thanks for your answer! So this means that 'latin1' does not cause
problems for non-European locales and Windows users, I take it.
I really only need non-ascii to write the name of the author (me)
correctly. I tried LaTeX code ({\"o}), but that didn't work. Is there
a way around this?
Göran
Göran Broström
Göran Broström wrote:
> I really only need non-ascii to write the name of the author (me)
> correctly. I tried LaTeX code ({\"o}), but that didn't work. Is there
> a way around this?
The \"o character in my latin1 (ISO 8859-1) man page says it is 0xF6:
F6 - LATIN SMALL LETTER O WITH DIAERESIS
The capital version is:
D6 - LATIN CAPITAL LETTER O WITH DIAERESIS
In HTML I think you need to write &#xF6; or something for that character to appear?
HTH
HTL
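For reference, HTML can name a character by code point in hexadecimal (`&#xF6;`), in decimal (`&#246;`), or by entity name (`&ouml;`). A reference such as `&#F6;`, without the `x`, is not valid and stays as literal text. This is a small check using Python's standard `html` module. Python here is only a stand-in for whatever tool writes the HTML:

```python
import html

# Three equivalent HTML spellings of U+00F6 (ö)
refs = ["&#xF6;", "&#246;", "&ouml;"]
for ref in refs:
    print(ref, "->", html.unescape(ref))

# Without the "x", "F6" is not a decimal number, so nothing is decoded.
print(html.unescape("&#F6;"))  # &#F6;

# Replacing non-ASCII characters with numeric references keeps the page pure
# ASCII, so it reads the same under any ASCII-compatible declared charset
# (e.g. ISO-8859-1 or UTF-8).
ascii_name = "Göran Broström".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_name)  # G&#246;ran Brostr&#246;m
```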
Hello, Göran:
Have you considered the German solution: "Goeran"? (e.g., Wuertz for Würtz)?
Be thankful that you aren't Russian or Greek or Arabic or Chinese, etc., for which there may be no standard transliteration into the Latin alphabet.
Sorry I can't be more helpful.
Spencer Graves
p.s. When I'm with native Spanish speakers who don't know English, I pronounce my name very differently, like "Espencer Gra-ve", to match how they would pronounce my name when they see it written. Similarly, I once heard a French Canadian talk about his young son, Guillaume. If you ask him in English, "What's your name?" he replies, "Bill". If you ask the same question in French, he replies, "Guillaume".
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Spencer Graves wrote:
> Be thankful that you aren't Russian or Greek or Arabic or Chinese,
> etc., for which there may be no standard transliteration into the Latin
> alphabet.
Well, I have to live with that, being of one of the above-mentioned
categories. Where it is important to have my own name in native form
in documents, I keep around a png, an eps with postscript type 1
font embedded, and a pdf from the eps for the odd pdflatex occasions.
It is going to be very hack-ish, but I wonder if it is possible to
utilise the fact that latex comments (%) are not the same as html
comments (<!-- -->) and vice versa, to make things work.
I seem to recall something in the R-extension manual about being able
to do \alternatives{latex stuff}{ascii stuff} for alternatives
which are destined to appear in different converted output types.
(Prof Ripley at this point would probably tell me the exact page
number and references...)
Hin-Tak
We describe how to use \enc for possible transliterations for exactly this purpose in the `Writing R Extensions' manual. In answer to Göran's question, yes latin1 is safer than UTF-8 for HTML browsers but neither are guaranteed to contain a glyph for ö in a font used e.g. in a Russian locale.
-- Brian D. Ripley, ripley at stats.ox.ac.uk
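In the Rd source, this kind of markup is written as `\enc{Göran}{Goran}` in a file that declares `\encoding{UTF-8}`. The choice `\enc` makes can be modelled in Python as follows: keep the accented text when the output charset can represent it, and otherwise fall back to the ASCII transliteration. This is a hypothetical sketch of the idea, not R's implementation. The function name `render_enc` and the charsets tried are assumptions. KOI8-R is used as an example of a Russian-locale charset.

```python
def render_enc(text: str, ascii_alt: str, charset: str) -> str:
    """Hypothetical model of \\enc{text}{ascii_alt}: keep `text` if the target
    charset can encode it, otherwise use the ASCII transliteration."""
    try:
        text.encode(charset)
        return text
    except UnicodeEncodeError:
        return ascii_alt

for charset in ["utf-8", "latin-1", "koi8_r", "ascii"]:
    print(f"{charset:8} {render_enc('Göran', 'Goran', charset)}")
```

Being encodable is necessary but not sufficient. As noted above, a font used in the reader's locale may still lack a glyph for ö, so the browser may show a box, nothing at all, or the wrong character.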
On 6/27/06, Spencer Graves <spencer.graves at pdf.com> wrote:
> Have you considered the German solution: "Goeran"? (e.g., Wuertz
> for Würtz)?
Yes, but really not; I like your p.s. solution better!
> p.s. When I'm with native Spanish speakers who don't know English, I
> pronounce my name very differently, like "Espencer Gra-ve", [...]
Good idea! I call myself "George" in English, "Yuri" in Russian, "Goran" in the Balkans, etc. Seriously, I thought that unicode and utf8 would make problems like this disappear, but obviously we may have to wait another 30 years. Thanks for all the input. George
Göran Broström
"Göran Broström" <goran.brostrom at gmail.com> writes:
> Seriously, I thought that unicode and utf8 would make problems like this disappear, but obviously we may have to wait another 30 years.
Well, I do tend to think that we should just use utf, assuming that
people have the relevant glyphs. If they don't, then they might get
little hollow rectangles but so what? (This entails stamping out the
use of iso-8859-? which I think I have previously pointed out as the
historical mistake. Easier said than done, though, especially since
8859-1, er, -15 managed to get established as a de facto standard
in a couple of key places like HTTP and NNTP.)
Transliterations are really abominable and completely ambiguous, e.g.
oe means o-umlaut in Swedish and German, but o-slash in Danish and
Norwegian, and we already have at least two interpretations of "roer"
where oe represents two distinct vowels...
piotr
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
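The ambiguity can be illustrated in Python with a hypothetical transliteration table that deliberately mixes German and Scandinavian conventions. Several distinct letters collapse onto the same ASCII pair, so the original spelling cannot be recovered from the transliterated text:

```python
from collections import defaultdict

# Hypothetical table: German-style ö/ü/ä and Danish/Norwegian-style ø/æ/å.
TRANSLIT = {"ö": "oe", "ø": "oe", "ü": "ue", "ä": "ae", "æ": "ae", "å": "aa"}

def transliterate(s: str) -> str:
    return "".join(TRANSLIT.get(ch, ch) for ch in s)

print(transliterate("Göran Broström"))  # Goeran Brostroem
print(transliterate("Würtz"))           # Wuertz

# Invert the table: each ASCII pair may have come from several letters ...
sources = defaultdict(list)
for letter, ascii_pair in TRANSLIT.items():
    sources[ascii_pair].append(letter)
print(dict(sources))  # {'oe': ['ö', 'ø'], 'ue': ['ü'], 'ae': ['ä', 'æ'], 'aa': ['å']}

# ... or from a plain two-letter sequence, as in "roer".
print(transliterate("rör") == transliterate("rør") == "roer")  # True
```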
I've been following this thread hoping for the definitive answer...
Peter Dalgaard wrote:
> ....
> Well, I do tend to think that we should just use utf, assuming that people have the relevant glyphs. If they don't, then they might get little hollow rectangles but so what?
My problem is that I put an ? in a reference in an Rd file, and now my
builds fail on some of my systems. I can switch which systems work and
which are broken, but I can not get it to work on all systems. I have
spent way too much time trying to figure out what is wrong. So, wrt "so
what", I need to choose between checking my packages on all the
different systems I use, or having an ? in the Rd file. I think my
problem is more complicated than having the relevant glyphs. I suspect
it has to do with having the same locale on all systems doing NFS
mounts, or on my cvs server, or something strange like that.
Paul
====================================================================================
La version française suit le texte anglais.
------------------------------------------------------------------------------------
This email may contain privileged and/or confidential inform...{{dropped}}
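One way to narrow down a problem like this is to list exactly which lines of an Rd file contain non-ASCII bytes and to check whether the file declares its encoding. The Python sketch below uses a hypothetical helper, `check_rd_bytes`, which is not part of R's tools. It scans raw bytes, so it makes no assumption about the charset the file was saved in. It looks for the declaration with a simple regular expression, which assumes the `\encoding{...}` form:

```python
import re

ENCODING_RE = re.compile(rb"\\encoding\{([^}]*)\}")

def check_rd_bytes(raw: bytes):
    """Hypothetical helper: return (declared encoding or None,
    line numbers that contain non-ASCII bytes)."""
    m = ENCODING_RE.search(raw)
    declared = m.group(1).decode("ascii", errors="replace") if m else None
    lines = [n for n, line in enumerate(raw.splitlines(), start=1)
             if any(b > 127 for b in line)]
    return declared, lines

sample = "\\name{foo}\n\\author{\\enc{Göran Broström}{Goran Brostrom}}\n".encode("utf-8")
print(check_rd_bytes(sample))                           # (None, [2])
print(check_rd_bytes(b"\\encoding{UTF-8}\n" + sample))  # ('UTF-8', [3])
```

For a real file, `pathlib.Path(p).read_bytes()` gives the raw bytes. A file that reports non-ASCII lines but no declared encoding is a likely candidate for adding `\encoding` and `\enc`.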
Paul Gilbert <pgilbert at bank-banque-canada.ca> writes:
> My problem is that I put an ? in a reference in an Rd file, and now my builds fail on some of my systems. [...] I suspect it has to do with having the same locale on all systems doing NFS mounts, or on my cvs server, or something strange like that.
Just to clarify, one thing is what I feel should be the longer term strategy, another is what the R build tools can currently do... Did you follow the advice to declare your input encoding with \encoding and use \enc to provide a transliteration?
-- Peter Dalgaard, p.dalgaard at biostat.ku.dk
Hi, Paul:
Earlier in this thread, Göran Broström wrote, "I really only need non-ascii to write the name of the author (me) correctly." The standard advice I got from a similar thread some time ago is to use the 'vanilla' Latin alphabet for key words, file and function names, etc., and restrict the use of other characters to documentation where the consequences of problems are not so severe.
I, too, would like to see all the accents, Arabic script, Chinese characters, etc., that other people want to use. However, we must work with the world as it is, not as we would like it to be (while devoting some time where appropriate to making the world better, as everyone who contributes to the R Project does).
Best Wishes,
Spencer Graves
On Wed, 28 Jun 2006, Peter Dalgaard wrote:
> Well, I do tend to think that we should just use utf, assuming that people have the relevant glyphs. If they don't, then they might get little hollow rectangles but so what?
Unfortunately, they might get nothing visible at all, and they might also get something completely wrong (happens on my Windows' X11 server on my laptop). This is not an R problem but a question of the quality of implementation of UTF-8. (Given the lack of UTF-8 fonts, I don't see the latter changing any time soon.) My comments (at UseR and to Göran) are intended to make people aware just how badly things can go wrong: it is up to the users to decide if transliteration is worse than the chance of mangling.
> Did you follow the advice to declare your input encoding with \encoding and use \enc to provide a transliteration?
It is necessary to do so. I use a mixture of UTF-8 and latin1 locales on systems sharing a file system, and it all works for me: iconv does the charset translations transparently provided it knows what to do.
-- Brian D. Ripley, ripley at stats.ox.ac.uk
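The condition "provided it knows what to do" can be sketched with Python's codecs standing in for iconv. This is an assumption for illustration only; the sketch does not show what R calls internally. The hypothetical `to_locale` helper re-encodes bytes from the declared encoding into the reader's locale encoding. With a correct declaration the bytes convert cleanly. With a wrong one the conversion still "succeeds" but yields mojibake:

```python
def to_locale(raw: bytes, declared: str, locale_charset: str) -> bytes:
    """Hypothetical iconv-like step: re-encode bytes from the declared charset
    into the locale's charset. Raises UnicodeError if that is impossible."""
    return raw.decode(declared).encode(locale_charset)

rd_utf8 = "Göran".encode("utf-8")              # b'G\xc3\xb6ran'
print(to_locale(rd_utf8, "utf-8", "latin-1"))  # b'G\xf6ran' (latin1 locale)
print(to_locale(rd_utf8, "utf-8", "utf-8"))    # b'G\xc3\xb6ran' (UTF-8 locale)

# A wrong declaration converts without error but mangles the text:
print(to_locale(rd_utf8, "latin-1", "utf-8").decode("utf-8"))  # GÃ¶ran
```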
Prof Brian Ripley wrote:
> On Wed, 28 Jun 2006, Peter Dalgaard wrote:
>> Did you follow the advice to declare your input encoding with \encoding and use \enc to provide a transliteration?
It has been several months since I did this, but I thought I had followed all the instructions.
> It is necessary to do so.
Ok, I will try again sometime when I have a bit more time.
Thanks,
Paul
[Spencer Graves]
> [...] I, too, would like to see all the accents, Arabic script, Chinese characters, etc., that other people want to use. However, we must work with the world as it is, not as we would like it to be (while devoting some time where appropriate to making the world better, as everyone who contributes to the R Project does).
Granted and agreed. Yet, R already does a little more than a few other programming languages in this area, and this is particularly welcome! :-) One could hope and wish that R developers, within reasonable efforts, continue making R better and even make it take some lead in this area. Not going fanatic about it of course, but at least carefully avoiding any backward move in development, or changes that would be unfriendly to internationalisation of R.
François Pinard http://pinard.progiciels-bpi.ca