[R-pkg-devel] .Rd, LaTeX and Unicode

7 messages · Serguei Sokol, Hugh Parsonage, Martin Maechler +2 more

#
Hi,

I am preparing a package where I would like to use UTF-8 characters in .Rd 
files. When LaTeX comes into play, I get well-known errors, e.g.:
! Package inputenc Error: Unicode character ∂ (U+2202)
(inputenc)                not set up for use with LaTeX.

This is consistent with what is said on this page 
https://developer.r-project.org/Encodings_and_R.html :
"Since LaTeX cannot handle Unicode we would have to convert the encoding 
of latex help files or use Lambda (and tell it they were in UTF-8)."

But LaTeX can support UTF-8, as shown by this small example:

\documentclass{article}
\usepackage[mathletters]{ucs}
\usepackage[utf8x]{inputenc}

\begin{document}
The vorticity ω is defined as $ω = ∇ × u$.
\end{document}

I can compile it with my LaTeX installation without problems. Maybe you can too?
So my suggestion would be to place these two lines somewhere in the LaTeX 
header generated by the R documentation system:
\usepackage[mathletters]{ucs}
\usepackage[utf8x]{inputenc}

Note that it is "utf8x" and not just "utf8", which is crucial for this example.
I hope this would fix the Unicode errors from LaTeX.

Best,
Serguei.
#
utf8x is deprecated

https://tex.stackexchange.com/questions/13067/utf8x-vs-utf8-inputenc#13070



On Tue, 18 Jun 2019 at 7:52 pm, Serguei Sokol <serguei.sokol at gmail.com>
wrote:

  
  
#
> utf8x is deprecated
    > https://tex.stackexchange.com/questions/13067/utf8x-vs-utf8-inputenc#13070

Hmm... interestingly, I've tried quite a few variants based on the
above thread, which started in 2011 but was updated in April 2016:
   https://tex.stackexchange.com/a/203804/7228
from where it seems that

     \usepackage[T1]{fontenc}
     \usepackage[utf8]{inputenc}

should be sufficient.  Further, note that from
  https://tex.stackexchange.com/a/238135/7228 
the {ucs} package should no longer be needed since ca. 2013,
hence your \usepackage[mathletters]{ucs}  would not be needed either.

HOWEVER:  After losing at least half an hour now, trying many
variants I found that the only version that works correctly for
me (with a teTeX / TeXlive version of 2018) is the version
Serguei Sokol proposes (below), including the use of the 'utf8x'
option *and* the 'ucs' package ...

which is pretty surprising after having read the
tex.stackexchange threads ...

    > On Tue, 18 Jun 2019 at 7:52 pm, Serguei Sokol <serguei.sokol at gmail.com>
    > wrote:
    >> Hi,
    >> 
    >> I am preparing a package where I would like to use UTF characters in .Rd
    >> files. When the LaTeX comes to play, I got well known errors e.g.:
    >> ! Package inputenc Error: Unicode character ∂ (U+2202)
    >> (inputenc)                not set up for use with LaTeX.
    >> 
    >> It is coherent with what is said on this page
    >> https://developer.r-project.org/Encodings_and_R.html :
    >> "Since LaTeX cannot handle Unicode we would have to convert the encoding
    >> of latex help files or use Lambda (and tell it they were in UTF-8)."

That whole document has been very important and crucial, written
by Prof Brian Ripley, who worked a *LOT* to bring Unicode to R,
-- but it was written in 2004-2005 and indeed, I think it is
probably fair to say that the above sentence no longer applies
to current LaTeX engines (including "simple" pdflatex)... though really,
I'm not the expert here, but I think it's a good point in time
to reconsider how much UTF8 should be allowed/supported in *.Rd files.

One problem: This is (slightly) the wrong mailing list; this would have
been a perfect topic for 'R-devel' (for discussing new
features etc. for R) instead....
( but we'd rather keep it here for now.)

Martin Maechler
ETH Zurich and R Core Team



    >> But LaTeX can support UTF8 as shown with this small example:

 \documentclass{article}
 \usepackage[mathletters]{ucs}
 \usepackage[utf8x]{inputenc}
 
 \begin{document}
 The vorticity ω is defined as $ω = ∇ × u$.
 \end{document}

    >> I can compile it with my LaTeX without problem. May be you too?
    >> So my suggestion would be to place these two lines somewhere in LaTeX
    >> header generated by R doc system:
    >> \usepackage[mathletters]{ucs}
    >> \usepackage[utf8x]{inputenc}
    >> 
    >> Note "utf8x" and not just "utf8" which is crucial for this example.
    >> With a hope that it would fix unicode errors from LaTeX.
    >> 
    >> Best,
    >> Serguei.
#
Since April 2018 'utf8' is the default input encoding in LaTeX, see
http://anorien.csc.warwick.ac.uk/mirrors/CTAN/macros/latex/doc/ltnews.pdf and they added some symbols in December.


Georgi Boshnakov
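
As an illustration of why plain 'utf8' can still fail on math symbols: the 'utf8' option of inputenc only knows the code points that have been declared, and an undeclared one gives the "not set up for use with LaTeX" error quoted earlier in the thread. Here is a minimal sketch, assuming pdflatex, that declares the needed characters by hand with \DeclareUnicodeCharacter. The particular mappings are an illustrative assumption, not something proposed in this thread or done by R's Rd2pdf:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

% Assumption for illustration: map each needed code point by hand.
% The first argument is the code point in uppercase hexadecimal.
% \ensuremath makes each symbol usable in both text and math mode.
\DeclareUnicodeCharacter{03C9}{\ensuremath{\omega}}   % ω
\DeclareUnicodeCharacter{2207}{\ensuremath{\nabla}}   % ∇
\DeclareUnicodeCharacter{00D7}{\ensuremath{\times}}   % × (overrides the default text mapping)
\DeclareUnicodeCharacter{2202}{\ensuremath{\partial}} % ∂

\begin{document}
The vorticity ω is defined as $ω = ∇ × u$.
\end{document}
```

This keeps the non-deprecated 'utf8' option, at the cost of one declaration per symbol used in the documentation.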






#
On 18/06/2019 17:10, Georgi Boshnakov wrote:
Interesting ... but still not sufficient. I have a fairly recent latex 
system:
$ latex --version
pdfTeX 3.14159265-2.6-1.40.19 (TeX Live 2018/Mageia)

but unfortunately utf8 alone (and even including
\usepackage[mathletters]{ucs}) cannot compile utf8 math expressions.

I have also tried a full-scale test on a tex file obtained with
$ R CMD Rd2pdf --no-clean pkgname
where I replaced

\usepackage[utf8]{inputenc}

by

\usepackage[mathletters]{ucs}
\usepackage[utf8x]{inputenc}

and it did not compile either. In addition, I had to replace every 
occurrence of

\inputencoding{utf8}

by

\inputencoding{utf8x}

after which pdflatex worked like a charm.

Serguei.
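
Put together, the substitutions described above amount to the following edits in the generated .tex file. This is only a sketch of the manual workaround reported in this message for TeX Live 2018. It is not what Rd2pdf emits, and the comments are annotations added for illustration:

```latex
% Preamble: instead of
%   \usepackage[utf8]{inputenc}
% load ucs, then inputenc with the extended option.
\usepackage[mathletters]{ucs}
\usepackage[utf8x]{inputenc}

% Body: instead of each occurrence of
%   \inputencoding{utf8}
% switch to the matching extended encoding.
\inputencoding{utf8x}
```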
#
Sorry, I failed to clarify that the link to ltnews.pdf was the point of my message. In some ways it is definitive from the LaTeX team. 

My understanding is that option 'mathletters' is not the default in ucs, since it produces math Greek and Hebrew letters also in text mode. 

Georgi Boshnakov


#
I may well be toadally out to luntch here (if so please excuse the added 
noise) but perhaps the following may be of some help or relevance.

In a package that I wrote, I wanted to include the ß symbol in a *.Rd 
file.  After a lot of thrashing about (and seeking advice from younger 
and wiser heads) I used this strategy:

* I put "\encoding{UTF-8}" at the very start of the *.Rd file
   (before the \name{ } command)

* I made use of the ß symbol via the syntax:

     \enc{Weiß}{Weiss}

Again I apologise if this is completely off the point.

cheers,

Rolf
On 19/06/19 8:50 PM, Serguei Sokol wrote: