Sweave encoding problem

7 messages · Gerrit Voigt, Rau, Roland, Wacek Kusnierczyk +1 more

#
Hello,
Sweave seems to have trouble processing German letters (umlauts) in R.
For example, my noweb R input looks like this:
<<>>=
Oberflächenfehler = c(4, 11, 6, 2, 7, 9)
@
If I send it through Sweave, I get the following error message.

error:  chunk 1
Error in parse(text = chunk) : unexpected input in "Oberfl??"
extra: Warning message:
In readLines(f[1]) :
   underfull last line in "C:\...."

(My R is in German, so I had to translate the error message myself.)

I have the impression that this is an encoding issue in Sweave, since
the same input typed directly into R works just fine. The encoding I use in
my noweb document is UTF-8.

Thanks in advance
Gerrit
#
Hi Gerrit,
I don't think it has anything to do with German letters.
I saved the following text in a file 'sweavy.Snw':
\documentclass{article}

\begin{document}
Hello World!

<<>>=
1+1
@ 

<<>>=
Oberflächenfehler = c(4, 11, 6, 2, 7, 9)
@
\end{document}

This is what happened in R:
Writing to file sweavy.tex
Processing code chunks ...
 1 : echo term verbatim
 2 : echo term verbatim

You can now run LaTeX on 'sweavy.tex'
R version 2.7.0 (2008-04-22) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

The DVI also looked fine after running "latex sweavy.tex".
To make sure, I ran the following in my editor (GNU Emacs 22.1.50.1)
C-x RET f utf-8
to set the buffer's file coding system (set-buffer-file-coding-system) to utf-8.
It still works fine.

Maybe this helps you further to track down the reason for the problem?!?

Best,
Roland

2 days later
#
Gerrit Voigt wrote:
This sounds like you have discovered homeopathic properties in Sweave!  
It will be serious if input files remember errors even after they have 
been removed.

But I think it's more likely that the files just look the same in your 
editor, but are actually different in some way you don't see.  Candidates:
 - the encoding:  maybe your editor is recognizing the encoding, and 
automatically displaying similar content from different input.
 - non-printing characters:  maybe your editor is skipping some.

I'd suggest doing a binary compare on the two files to see what the 
differences are.  I think you are on Windows (but I may be misreading 
the quotes below); I recommend Beyond Compare (a shareware compare 
utility).  It has a hex viewer plug-in that could show you a detailed 
comparison.  I imagine diff on Unix has something similar.
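
A byte-level comparison of this kind can also be done from within R itself. The following is only a minimal sketch: the function name compare_bytes and the two temporary files are made up for illustration, and the demo assumes that one copy of the line was saved as UTF-8 and the other as Latin-1.

```r
# Compare two files byte by byte and report where they first differ.
# compare_bytes() is a hypothetical helper, not part of R or Sweave.
compare_bytes <- function(path_a, path_b) {
  a <- readBin(path_a, what = "raw", n = file.info(path_a)$size)
  b <- readBin(path_b, what = "raw", n = file.info(path_b)$size)
  n <- min(length(a), length(b))
  mismatch <- which(a[seq_len(n)] != b[seq_len(n)])
  first <- if (length(mismatch) > 0) {
    mismatch[1]
  } else if (length(a) != length(b)) {
    n + 1L            # one file is a strict prefix of the other
  } else {
    NA_integer_       # no difference at all
  }
  list(identical = identical(a, b),
       sizes = c(a = length(a), b = length(b)),
       first_difference = first)
}

# Demo files: the same line with "ä" stored as UTF-8 (bytes c3 a4)
# and as Latin-1 (byte e4). Both files look alike in many editors.
utf8_file <- tempfile(fileext = ".Snw")
latin1_file <- tempfile(fileext = ".Snw")
writeBin(c(charToRaw("Oberfl"), as.raw(c(0xc3, 0xa4)),
           charToRaw("chenfehler\n")), utf8_file)
writeBin(c(charToRaw("Oberfl"), as.raw(0xe4),
           charToRaw("chenfehler\n")), latin1_file)

result <- compare_bytes(utf8_file, latin1_file)
print(result)
```

The reported position of the first differing byte tells you where to look in a hex viewer; a size difference with an otherwise similar file is a typical sign of a different encoding.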

Duncan Murdoch
#
Duncan Murdoch wrote:
diff -s

vQ
#
The two documents actually were different, which I didn't notice
yesterday: one had a different encoding. Thanks for your help, Duncan.
Unfortunately, the other problem still exists. My R or Sweave does not seem
to be able to work with UTF-8 encoding. Everything works fine with
Latin-1, though. I could check my assumption if there were a way
to switch R from Latin-1 to UTF-8. Does anybody have an idea how that
might work?

Gerrit Voigt

Duncan Murdoch wrote:
#
Gerrit Voigt wrote:
Connections and functions that read from them generally have an 
"encoding" argument; I think you need to have that set to "UTF-8" or 
"latin1" as appropriate.  However, Sweave() doesn't offer an option to 
pass that arg down to the readLines() call that actually reads the 
file.  I believe options(encoding="UTF-8") or options(encoding="latin1") 
will set the default if you run it before calling Sweave. 
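
As a rough illustration of the two routes described above, the sketch below reads a small UTF-8 file once with an explicit encoding argument on the connection and once via the session-wide default. The temporary file and variable names are assumptions for the demo, and the Sweave() call is only shown as a comment because whether it picks up the option in a given R version is not verified here.

```r
# Create a small UTF-8 encoded input file ("ä" stored as bytes c3 a4).
path <- tempfile(fileext = ".Snw")
writeBin(c(charToRaw("Oberfl"), as.raw(c(0xc3, 0xa4)),
           charToRaw("chenfehler = 1\n")), path)

# Route 1: declare the encoding on one connection only.
con <- file(path, open = "r", encoding = "UTF-8")
lines_explicit <- readLines(con)
close(con)

# Route 2: change the default that file() uses when no encoding is given.
# options() returns the previous setting, so it can be restored afterwards.
old <- options(encoding = "UTF-8")
lines_default <- readLines(path)
# Sweave("sweavy.Snw")   # would go here, while the option is in effect
options(old)

print(identical(lines_explicit, lines_default))
```

Restoring the old value right away keeps the changed default from affecting other connections opened later in the same session.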

You will probably find it frustrating to keep switching that option; I'd 
recommend storing files in the native encoding for your system, which R 
will default to using.  (This doesn't work if you share the same file on 
multiple systems, of course.)
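
If a file has already been saved as UTF-8, one way to bring it into the native encoding is to re-encode it once with iconv(). This is a sketch under the assumption that the native encoding is Latin-1, as the earlier messages suggest; the file names are placeholders created only for the demo.

```r
# Demo input: a UTF-8 encoded line ("ä" stored as bytes c3 a4).
utf8_path <- tempfile(fileext = ".Snw")
latin1_path <- tempfile(fileext = ".Snw")
writeBin(c(charToRaw("Oberfl"), as.raw(c(0xc3, 0xa4)),
           charToRaw("chenfehler = c(4, 11)\n")), utf8_path)

# Read the lines as they are and mark them as UTF-8 (no conversion yet).
utf8_lines <- readLines(utf8_path, encoding = "UTF-8")

# Convert to Latin-1; iconv() returns NA for lines it cannot convert.
latin1_lines <- iconv(utf8_lines, from = "UTF-8", to = "latin1")
stopifnot(!anyNA(latin1_lines))

# Write the converted bytes unchanged, without another re-encoding step.
writeLines(latin1_lines, latin1_path, useBytes = TRUE)

# Inspect the result: "ä" should now be the single byte e4.
bytes <- readBin(latin1_path, what = "raw", n = file.info(latin1_path)$size)
print(bytes)
```

Characters that have no Latin-1 representation would come back as NA, which is why the sketch stops before writing in that case.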

Duncan Murdoch