
substitute accents

3 messages · Manuel Gutierrez, Brian Ripley

#
I have an OpenOffice spreadsheet with a column of
character strings, some of which contain accented
characters.
To read it into R, I saved it as a CSV file using the
Western Europe (ISO-8859-1) character set (the
default; I have tried other character sets, but that
doesn't help).
R reads it fine with:

CharMatrix <- read.csv("test.csv", header = FALSE, sep = ",", as.is = TRUE)
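A minimal sketch of declaring the file's encoding at read time. It assumes a later version of R than the 2.0.0 in this thread, which predates R's encoding marks. In later versions, `read.csv()` passes `encoding` on to `scan()`, which marks non-ASCII strings as Latin-1 without converting their bytes. The temporary file and its one-line content are hypothetical stand-ins for test.csv.

```r
# Assumes a later R than 2.0.0 (encoding marks did not exist yet).
# Hypothetical stand-in for test.csv: one line holding "hóla" in
# ISO-8859-1, where the single byte 0xF3 is 'ó'.
tmp <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0x68, 0xf3, 0x6c, 0x61, 0x0a)), tmp)

# encoding = "latin1" declares what the bytes are; the strings are
# marked as Latin-1 rather than re-encoded.
CharMatrix <- read.csv(tmp, header = FALSE, as.is = TRUE,
                       encoding = "latin1")

nchar(CharMatrix[1, 1])     # 4 characters: h, ó, l, a
Encoding(CharMatrix[1, 1])  # inspect the declared encoding mark
```

If you would rather convert the text to the session's own encoding on input, `fileEncoding = "latin1"` re-encodes the file as it is read instead of marking the strings.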
Say I want to replace the accented 'o' (ó) in the
first cell with a plain 'o'.
I've tried:

gsub('ó', 'o', CharMatrix[1,1])

but it doesn't make any substitution.
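A sketch of why the encoding of the pattern matters, assuming a later R with `\u` escapes and `Encoding()` (neither exists in 2.0.0). The variable names `x` and `utf8_o` are hypothetical. A pattern written as a `\u` escape carries its own UTF-8 encoding, so `gsub()` can reconcile it with Latin-1 data. A byte-for-byte match of the UTF-8 bytes for 'ó' against Latin-1 bytes finds nothing.

```r
# Hypothetical test value: "hóla" as ISO-8859-1 bytes, declared Latin-1.
x <- rawToChar(as.raw(c(0x68, 0xf3, 0x6c, 0x61)))
Encoding(x) <- "latin1"

# The pattern is written as a Unicode escape (stored as UTF-8), so
# gsub() translates both sides to a common encoding before matching.
gsub("\u00f3", "o", x)   # "hola"

# Byte-level match: 'ó' is C3 B3 in UTF-8 but the single byte F3 in
# Latin-1, so the UTF-8 bytes never occur in the Latin-1 string.
utf8_o <- rawToChar(as.raw(c(0xc3, 0xb3)))
grepl(utf8_o, x, fixed = TRUE, useBytes = TRUE)   # FALSE
```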

Trying to find a solution, I typed the character string
directly into R and did the substitution:

CharMatrix[1,1] <- "hóla"
gsub('ó', 'o', CharMatrix[1,1])

and it works. I think the difference is that when I
now print the contents of CharMatrix, I get a \201
before the ó, which I didn't get with the string
imported from the OpenOffice CSV file.
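One way to compare the typed string with the imported one is to look at the bytes each actually holds, rather than at how the terminal displays them. The sketch below uses `charToRaw()` on two hypothetical versions of the same word. It assumes a later R with `\u` escapes and `Encoding()`.

```r
# Two hypothetical versions of the same word.
latin1_str <- rawToChar(as.raw(c(0x68, 0xf3, 0x6c, 0x61)))
Encoding(latin1_str) <- "latin1"   # as read from an ISO-8859-1 file
utf8_str <- "h\u00f3la"            # \u escapes are stored as UTF-8

charToRaw(latin1_str)   # 68 f3 6c 61     (one byte for 'ó')
charToRaw(utf8_str)     # 68 c3 b3 6c 61  (two bytes for 'ó')
Encoding(latin1_str)    # "latin1"
Encoding(utf8_str)      # "UTF-8"
```

Calling `charToRaw(CharMatrix[1,1])` on the imported cell and on the typed string shows directly whether they differ at the byte level.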
I'm sure the problem lies in my understanding of how
accented characters can be specified. Can anyone
point me to solutions or references?
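For the general task of removing accents, here is a sketch of a hypothetical helper, `strip_accents()`. It maps a fixed set of accented vowels to plain ones with `chartr()` and assumes a later R with `\u` escapes. The `from` and `to` strings must contain the same number of characters, so extend both together for other letters. `iconv(x, to = "ASCII//TRANSLIT")` is an alternative, but transliteration support and its exact output depend on the platform's iconv.

```r
# Hypothetical helper: replace common accented vowels with plain ones.
# Accented letters are written as \u escapes so the script stays ASCII
# and does not depend on the encoding it is saved in.
strip_accents <- function(x) {
  from <- "\u00e1\u00e9\u00ed\u00f3\u00fa\u00c1\u00c9\u00cd\u00d3\u00da"
  to   <- "aeiouAEIOU"
  chartr(from, to, x)
}

strip_accents(c("h\u00f3la", "caf\u00e9"))   # "hola" "cafe"
```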
Thanks,
M

         _                
platform i686-pc-linux-gnu
arch     i686             
os       linux-gnu        
system   i686, linux-gnu  
status                    
major    2                
minor    0.0              
year     2004             
month    10               
day      04               
language R   
		
______________________________________________
#
Can you please tell us what locale you are working in?

This looks as if the problem might be the use of a UTF-8 locale, which R 
does not currently support and which some Linux distros have made their 
default.  However, R does issue a warning -- so did you get one?
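The locale can also be checked from inside R. The sketch below assumes a later R, where `l10n_info()` is available. It reports whether R treats the current locale as UTF-8 or Latin-1.

```r
# Character-type locale of the running R session.
Sys.getlocale("LC_CTYPE")

# How R classifies that locale.
info <- l10n_info()
info[["UTF-8"]]    # TRUE in a UTF-8 locale such as en_GB.UTF-8
info[["Latin-1"]]  # TRUE in an ISO-8859-1 locale such as en_GB
```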
On Thu, 25 Nov 2004, Manuel Gutierrez wrote:

#
$ locale
LANG=en_GB
LC_CTYPE="en_GB"
LC_NUMERIC="en_GB"
LC_TIME="en_GB"
LC_COLLATE="en_GB"
LC_MONETARY="en_GB"
LC_MESSAGES="en_GB"
LC_PAPER="en_GB"
LC_NAME="en_GB"
LC_ADDRESS="en_GB"
LC_TELEPHONE="en_GB"
LC_MEASUREMENT="en_GB"
LC_IDENTIFICATION="en_GB"
LC_ALL=

  
$ locale charmap
ISO-8859-1

I have tried changing the locale, with no difference.
Is this setting fine?
And no, I didn't get any warning message.
My system is Debian sid under KDE 3.3.
Thanks,
M

 --- Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote: