Input encoding problem when using sweave with xetex

16 messages · Erich Studerus, Duncan Murdoch, Richard M. Heiberger

#
On 12/05/2010 8:37 AM, Erich Studerus wrote:
You need to use iconv() to change an encoding.  What you did just 
changes the declared encoding, but doesn't actually change any bits.  So 
you'd probably get what you want with

x <- iconv(x, "", "UTF-8")
x

(though you may need to declare the input encoding; it is likely CP1252 
on Windows).
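
As an illustration of converting bytes rather than relabelling them, here is a minimal sketch outside R, assuming a POSIX shell with the iconv and od utilities available. The sample word "café" is made up; in CP1252 its last letter is the single byte 0xE9, which becomes the two bytes 0xC3 0xA9 in UTF-8.

```shell
# Hypothetical sample: "café" as CP1252 bytes (é is the single byte 0xE9, octal 351).
cp1252_hex=$(printf 'caf\351' | od -An -tx1 | tr -d ' \n')

# Convert the bytes themselves from CP1252 to UTF-8 with the iconv utility.
utf8_hex=$(printf 'caf\351' | iconv -f CP1252 -t UTF-8 | od -An -tx1 | tr -d ' \n')

# Show both byte sequences as hex so the difference is visible.
echo "CP1252: $cp1252_hex"
echo "UTF-8:  $utf8_hex"
```

The two byte sequences differ, which is what an actual conversion looks like; merely re-declaring the encoding would leave the bytes unchanged.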
#
Thanks. Since the encoding of x is unknown (Encoding(x) gives "unknown"), I
tried

iconv(x, "", "UTF-8") 

Unfortunately, accented letters are still not printed in the final PDF
output.

Regards,
Erich



#
On 12/05/2010 9:48 AM, Erich Studerus wrote:
I think I gave you incomplete advice.

The line above will convert the native encoding to UTF-8.  That's 
probably fine, but it's not actually helpful.

The problem is that when R outputs a vector, it will convert it back to 
the native encoding, unless you take action to stop that.  If you don't 
mind changing your document for Windows, you can put

\usepackage[cp1252]{inputenc}

into the preamble, and use the Windows native CP1252 encoding 
throughout.  If you want something that will work in UTF-8 on Windows, 
you need to say

options(encoding="UTF-8")

*before* running Sweave.  (If you're running Sweave from the command 
line using "R CMD Sweave" then I don't know if you can specify the 
output encoding; it won't help to do it in the document code chunks).  
You also need to put the line

\usepackage[utf8]{inputenc}

into the document preamble, but it sounds as though Lyx has already done 
that for you.

Duncan Murdoch
#
Putting \usepackage[cp1252]{inputenc} into my preamble is not an option,
because XeTeX, unlike LaTeX, needs UTF-8 as its input encoding. My goal is also
to have a LyX document that can be compiled on both Mac and Windows.

I usually compile my Lyx-Sweave documents by one click of a button from
within Lyx. R code chunks are therefore executed by calling R from the
command line. If anybody knows how to run R with options(encoding="UTF-8")
from the command line under windows, that would be helpful.

The command that calls R during compilation is contained in this file:
http://cran.r-project.org/contrib/extra/lyx/preferences

Regards,
Erich


#
On 12/05/2010 11:36 AM, Erich Studerus wrote:
You can do it with a little work.  If you look at the 
rhome/bin/Sweave.sh file, you'll see that

 R CMD Sweave file.Rnw

just executes something like

echo "library(\"utils\"); Sweave(\"file.Rnw\")" | R --no-restore --slave

What you want is to execute

echo "library(\"utils\"); options(encoding=\"UTF-8\"); Sweave(\"file.Rnw\")" | R --no-restore --slave

You could edit the rhome/bin/Sweave.sh file appropriately if you always 
want Sweave to use UTF-8, or you could edit your Lyx preference file to 
put in a line like this instead of what it had.
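
A quick way to check the quoting before touching Sweave.sh is to build the R command in a shell variable and print it. This is only a sketch: file.Rnw is a placeholder name, and the commented-out last line assumes R is on the PATH.

```shell
# Placeholder input file name; substitute the real .Rnw file.
rnw="file.Rnw"

# Inner double quotes are escaped so they survive inside the outer double-quoted string.
r_cmd="library(\"utils\"); options(encoding=\"UTF-8\"); Sweave(\"$rnw\")"

# Print the command to inspect the quoting before piping it into R.
echo "$r_cmd"

# To actually run it (requires R on the PATH):
#   echo "$r_cmd" | R --no-restore --slave
```

If the printed line shows plain double quotes around utils, UTF-8 and the file name, the escaping is right.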

Duncan Murdoch
#
Thank you! I edited the Sweave.sh file and it now works for reading data 
stored as R data files, but the read.xls function from the gdata package 
no longer works.

options('encoding'='UTF-8')
require(gdata)
read.xls("http://www.schwerhoerigkeit.pop.ch/hoergeraete_test.xls", 
stringsAsFactors = FALSE)[2,2]

This gives NA as output and a warning that there was an invalid entry for 
the connection. Is there a way to use read.xls with UTF-8 encoding?





On Wed, 12 May 2010 12:45:51 -0400
Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
#
On 12/05/2010 2:29 PM, Erich Studerus wrote:
I don't know, you'd have to ask its authors.  But you can probably set 
options(encoding="") before calling it.  (I'm assuming the read.xls() is 
in a code chunk in your Sweave file.  I would guess that you don't need 
to set the encoding back to UTF-8 afterwards, but you might:  it all 
depends on when Sweave() opens its output file.)

Duncan Murdoch
#
Thanks again. Putting options(encoding="") into the R code chunk before 
calling the read.xls function did indeed do the trick. Now I have one last 
question: how can I edit the LyX preference file to call R with 
options(encoding='UTF-8')? I prefer to edit the LyX preference file rather 
than the Sweave.sh file, because I want to run R with UTF-8 only when I use 
XeTeX. By editing the LyX preference file, I assume it would be possible to 
define different R encoding setups for different converters.

Erich


On Wed, 12 May 2010 14:43:52 -0400
Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
#
On 12/05/2010 3:55 PM, Erich Studerus wrote:
One way would be to change

R CMD Sweave $$i

in

\converter "literate" "latex"    "R CMD Sweave $$i"  ""

to

echo "library(\"utils\"); options(encoding=\"UTF-8\"); Sweave(\"$$i\")" | R --no-restore --slave

but you may run into trouble because there's an extra set of "" to go around there, and I don't know how 
Lyx handles escaping all those quotes.  So probably an easier way to go would be to take the Sweave.sh file
that you've already edited, and copy it into UTF8Sweave.sh in the same directory.  Then you can change the Lyx 
preference line to

\converter "literate" "latex"    "R CMD UTF8Sweave $$i"  ""

and things should work.  (Then you can restore the original Sweave.sh file for cases where you don't
want UTF-8.)

Duncan Murdoch
#
You were right that I would run into trouble with all these quotes. However, 
the second method doesn't work either. Somehow, the UTF8Sweave.sh file is 
not found during the compilation of my document. It says "can't open perl 
script .../bin/UTF8Sweave.sh: No such file or directory". I double-checked 
that a file with this name is present in the same folder as the original 
Sweave.sh file and that "Sweave" is changed to "UTF8Sweave" in the LyX 
preference file. Any idea what could be wrong?

Erich


On Wed, 12 May 2010 16:20:03 -0400
Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
#
On 12/05/2010 5:12 PM, Erich Studerus wrote:
Sorry, I forgot that if you make up your own command to be run by R CMD 
then you need to give the full path to it, and you can't suppress the 
.sh suffix.  So your command should be

R CMD /path/to/UTF8Sweave.sh $$i


(where /path/to/ is the actual path to where you decide to keep this 
file, which needn't be in rhome/bin).

Duncan Murdoch
#
Thanks again. It finally worked after copying the Sweave.sh file to a path 
that contains no spaces. Is there a special way to handle Windows path 
names that contain spaces?

Erich


On Wed, 12 May 2010 18:04:44 -0400
Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
#
On 12/05/2010 7:07 PM, Erich Studerus wrote:
Putting the path in double quotes might work, but I've no idea how to 
tell lyx to do that.
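
For what double quotes do in a POSIX-style shell, here is a small sketch. The path is hypothetical and count_args is a helper invented for the illustration; whether LyX passes its converter command through such a shell on Windows is not something this thread establishes.

```shell
# Helper for illustration only: report how many arguments were received.
count_args() { echo "$#"; }

# Hypothetical script location containing a space.
script="/path/with spaces/UTF8Sweave.sh"

# Unquoted, the shell splits the path at the space into two arguments.
unquoted=$(count_args $script)

# Quoted, the path is passed as a single argument.
quoted=$(count_args "$script")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

With the quotes in place, the script path reaches the called program as one argument instead of two.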

Duncan Murdoch
#
With the kind help of Duncan Murdoch, I finally managed to correctly compile 
LyX-Sweave documents with XeTeX under Windows. In case someone else is 
struggling with a similar problem in the future, here is a small summary of 
what worked for me. I assume that you have already set up Lyx to work with 
Sweave and XeTeX according to the following instructions:
http://cran.r-project.org/contrib/extra/lyx/INSTALL
http://wiki.lyx.org/LyX/XeTeX

My original problem was that accented letters that were read from 
Excel files or other external sources from within R during the LyX-Sweave 
document compilation were not correctly encoded in UTF-8. However, UTF-8 is 
needed for typesetting the final document with XeTeX.

Because the R code chunks contained in the LyX-Sweave document were executed by 
running R from the command line, it was necessary to start R with UTF-8 
encoding from the command line. I managed to do this by editing the 
Sweave.sh file, which can be found in the rhome/bin/ directory. Within Sweave.sh, 
I changed the command

echo "library(\"utils\"); Sweave(\"file.Rnw\")" | R --no-restore --slave

into

echo "library(\"utils\"); options(encoding=\"UTF-8\"); Sweave(\"file.Rnw\")" | R --no-restore --slave

Because I wanted to run R with UTF-8 encoding only when I compile my 
documents with XeTeX and not with LaTeX, I saved the Sweave.sh file with the 
new name UTF8Sweave.sh in the same directory and changed the definition for 
the conversion of Lyx-Sweave documents to PDF (xelatex). I did this by 
defining a new file format in LyX under
Tools->Preferences->File handling->File formats. The file format that I 
defined had the same settings as LaTeX (pdflatex), but I saved it with the 
new name LaTeX (pdflatexUTF8). I then defined a new converter under 
Tools->Preferences->File handling->Converters with the following settings:
From format: Sweave
To format: LaTeX (pdflatexUTF8)
Converter: R CMD /path/to/UTF8Sweave.sh $$i

Because the full path to UTF8Sweave.sh file contained blanks, I had to use 
the 8.3 filename which I determined by using the MSDOS cmd window. See this 
post for more information:
http://article.gmane.org/gmane.comp.lang.r.general/190040

Finally, I changed the converter LaTeX (pdflatex) -> PDF (xelatex), which I 
had already defined for the original LyX-XeTeX installation (see 
http://wiki.lyx.org/LyX/XeTeX), to LaTeX (pdflatexUTF8) -> PDF (xelatex).

I hope this helps.

Regards,
Erich