
issue with encoding in R-2.8.1 invalid multibyte character

8 messages · Wijffels, Jan, Peter Dalgaard, Brian Ripley

#
Well, we don't see what you see, but if ? was hex a7, the message is 
entirely correct.  If you want to enter that, use "\xa7".
On Tue, 30 Dec 2008, Wijffels, Jan wrote:

            

  
    
#
Prof Brian Ripley wrote:
We see different things. I see a section sign (double s) symbol. From 
the symptoms, I would suspect that the terminal is set to latin-1 or -15 
(both have the section sign at 0xa7) even though the system (and thus R) 
is utf-8.

(Incidentally, said 0xa7 is known as the "paragraph" symbol in Danish 
legal texts, whereas the paragraph symbol at 0xb6 is largely unknown.)
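
The byte-level difference behind this can be checked from a shell. This is only a minimal sketch, assuming a POSIX shell with `printf`, `od`, `tr` and an `iconv` that knows ISO-8859-1 and UTF-8; the variable names are made up for illustration. It shows that the section sign is the single byte a7 in Latin-1 and the two bytes c2 a7 in UTF-8, and that a lone a7 byte is rejected as UTF-8, which is presumably what R's "invalid multibyte character" message is complaining about.

```shell
# The section sign as a single Latin-1 byte (octal 247 = hex a7).
latin1_bytes=$(printf '\247' | od -An -tx1 | tr -d ' \n')

# The same character after conversion to UTF-8: two bytes.
utf8_bytes=$(printf '\247' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "Latin-1: $latin1_bytes"
echo "UTF-8:   $utf8_bytes"

# A lone 0xa7 byte is a UTF-8 continuation byte with no lead byte,
# so a strict UTF-8 decoder refuses it and iconv exits non-zero.
if printf '\247' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
  lone_byte_status=valid
else
  lone_byte_status=invalid
fi
echo "lone 0xa7 read as UTF-8: $lone_byte_status"
```

A terminal set to Latin-1 sends the one-byte form, so a UTF-8 session on the other end receives exactly that lone byte.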
#
On Wed, 31 Dec 2008, Peter Dalgaard wrote:

            
Right, and my point is that we do not know what he actually sees.
I thought of that, but if the system is in UTF-8, so would its keyboard 
be.  Perhaps this is a remote session from a Windows system to a UTF-8 
one? (In which case set the remote locale appropriately.)

The issue seemed to be about entering Latin characters (-1 or -9, I think: 
latin-9 is ISO 8859-15), and that is what I tried to answer.
#
Yes, it was the section sign (double s) symbol that I was trying to
print connecting from a Windows machine with Latin1 encoding to a UTF-8
Linux machine.
I changed the translation behaviour in my PuTTY SSH session from Latin1
to UTF-8 and now interactive R programming works.
My scripts, which I run with Rscript my_script.r, contain quite a few
Latin-1 characters. These ran fine in R 2.7.0 but no longer do in R 2.8.1.
I presume this is because of a change listed for 2.7.1: 'The parser
sometimes accepted invalid quoted strings in a UTF-8 locale'.
So I need to convert the scripts I develop in Latin1 on Windows to UTF-8
before I upload them to our server.
Thanks for the help.

#
On Wed, 31 Dec 2008, Wijffels, Jan wrote:

            
Or, as I suggested below, run the R session on the server in Latin1.

% LC_ALL=nl_BE  R

(guessing, or use en_US) should do it.

  
    
#
On Wed, 31 Dec 2008, Wijffels, Jan wrote:

            
Even better :), thanks

  
    
#
Wijffels, Jan wrote:
...
Otherwise, depending on your workflow, you might find that "iconv" is 
your friend. (Notice that the above will give you output in Latin1 too, 
which may be exactly what you need, but then again maybe not.)
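
For the conversion route, a command-line `iconv` can re-encode a script before upload. The following is a rough sketch, not a tested workflow: it assumes the scripts really are ISO-8859-1 (Windows editors often use CP1252 instead, which would need a different `-f` value), and the file names and scratch directory are hypothetical.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical Latin-1 script: the string holds the single byte 0xa7 (octal 247).
printf 'x <- "\247 1"\n' > "$workdir/my_script_latin1.r"

# Re-encode to UTF-8 before uploading to the UTF-8 server.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/my_script_latin1.r" > "$workdir/my_script_utf8.r"

# The converted file is one byte longer, because 0xa7 became c2 a7.
size_before=$(wc -c < "$workdir/my_script_latin1.r" | tr -d ' ')
size_after=$(wc -c < "$workdir/my_script_utf8.r" | tr -d ' ')
echo "bytes before: $size_before, after: $size_after"
```

Writing to a new file rather than converting in place keeps the Latin-1 original around in case the source encoding was guessed wrongly.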