I have an openoffice spreadsheet with a column of
character strings.
Some of them contain accents.
I want to read it in R so I have saved it as a csv
file using Western Europe (ISO-8859-1) character set
(the default, I've tried other sets but it doesn't
help).
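Before changing anything in R, it can help to confirm which encoding the saved file actually uses. Below is a minimal sketch in current R (`validUTF8()` is not in the 2.0.0 release shown below). It assumes the file contains "hóla"; a temporary file with hand-written Latin-1 bytes stands in for `test.csv`:

```r
# Hypothetical stand-in for test.csv: "hóla" saved as Latin-1 bytes
path <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0x68, 0xf3, 0x6c, 0x61, 0x0a)), path)

# Read the file back as raw bytes, without any decoding
bytes <- readBin(path, what = "raw", n = file.info(path)$size)

# A lone 0xf3 byte is not valid UTF-8, so this is FALSE for a Latin-1
# file containing accents; TRUE would suggest UTF-8 (or plain ASCII)
is_utf8 <- validUTF8(rawToChar(bytes))
print(is_utf8)
```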
R reads it fine with
CharMatrix <- read.csv("test.csv", header = FALSE, sep = ",", as.is = TRUE)
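In current versions of R, `read.csv()` accepts a `fileEncoding` argument that declares the file's encoding so its text is re-encoded as it is read. Below is a minimal sketch, assuming the file really is ISO-8859-1 (Latin-1) and that R runs in a UTF-8 or Latin-1 locale. The temporary file is a hypothetical stand-in for `test.csv`:

```r
# Hypothetical stand-in for test.csv: one cell, "hóla" as Latin-1 bytes
path <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0x68, 0xf3, 0x6c, 0x61, 0x0a)), path)

# fileEncoding tells R what the file is in, so the text is
# re-encoded into the session's own encoding as it is read
CharMatrix <- read.csv(path, header = FALSE, as.is = TRUE,
                       fileEncoding = "latin1")

# The pattern now matches, because data and pattern agree
print(gsub("\u00f3", "o", CharMatrix[1, 1]))
```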
Say I want to replace the 'o' with an accent in the
first cell.
I've tried:
gsub('ó','o', CharMatrix[1,1])
But it doesn't make any substitution.
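One way to keep the pattern independent of how the script or terminal encodes "ó" is to write it as a Unicode escape. Below is a minimal sketch, assuming a current R that supports `\u` escapes in string literals:

```r
# A string containing o-acute, written with a Unicode escape so the
# script does not depend on the encoding it is saved in
x <- "h\u00f3la"

# Pattern written the same way; R handles the encoding of both
y <- gsub("\u00f3", "o", x)
print(y)
```

Here `y` is `"hola"`.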
Trying to find a solution, I typed the character string
into R directly and did the substitution:
CharMatrix[1,1] <- "hóla"
gsub('ó','o', CharMatrix[1,1])
And it works. I think the difference is that when I
now print the contents of CharMatrix I get a \201
before the ó, which I didn't get with the
OpenOffice-imported csv file.
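Whether `gsub()` matches depends on the data and the pattern using the same bytes for "ó". Inspecting the raw bytes shows how the same character differs between encodings. Below is a minimal sketch in current R, using `charToRaw()` and `iconv()`:

```r
x <- "h\u00f3la"   # o-acute via Unicode escape (stored as UTF-8)

# The same four characters in two encodings
utf8_bytes   <- charToRaw(enc2utf8(x))
latin1_bytes <- charToRaw(iconv(x, from = "UTF-8", to = "latin1"))

print(utf8_bytes)    # 68 c3 b3 6c 61  (o-acute takes two bytes)
print(latin1_bytes)  # 68 f3 6c 61     (o-acute takes one byte)
```

A pattern typed in one encoding will not match data stored in the other, even though both print as "hóla".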
I'm sure it is a problem with my understanding of how
accents can be specified. Can someone give me any
solutions / references?
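To remove accents in general, rather than one character at a time, `chartr()` maps each character in its first argument to the character at the same position in its second. Below is a minimal sketch. `strip_accents` is a hypothetical helper, and its character list covers only common lower-case Spanish letters. `iconv(x, to = "ASCII//TRANSLIT")` is an alternative, but its output is platform-dependent.

```r
# Hypothetical helper: map common Spanish accented lower-case letters
# to unaccented forms; both strings must have the same number of characters
strip_accents <- function(x) {
  chartr("\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1",
         "aeiouun", x)
}

s <- c("h\u00f3la", "ni\u00f1o", "ping\u00fcino")
out <- strip_accents(s)  # c("hola", "nino", "pinguino")
print(out)
```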
Thanks,
M
_
platform i686-pc-linux-gnu
arch i686
os linux-gnu
system i686, linux-gnu
status
major 2
minor 0.0
year 2004
month 10
day 04
language R
______________________________________________
substitute accents
3 messages · Manuel Gutierrez, Brian Ripley
Can you please tell us what locale you are working in? This looks as if the problem might be the use of a UTF-8 locale, which R does not currently support and which some Linux distros have made their default. However, R does issue a warning -- so did you get one?
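The shell `locale` command answers this question, and the locale can also be checked from inside a running R session. Below is a minimal sketch using `Sys.getlocale()` and `l10n_info()`, assuming a current R:

```r
# Which character set is this R session using?
print(Sys.getlocale("LC_CTYPE"))

info <- l10n_info()
print(info[["UTF-8"]])    # TRUE in a UTF-8 locale
print(info[["Latin-1"]])  # TRUE in a Latin-1 (ISO-8859-1) locale
```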
On Thu, 25 Nov 2004, Manuel Gutierrez wrote:
______________________________________________ ______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
$ locale
LANG=en_GB
LC_CTYPE="en_GB"
LC_NUMERIC="en_GB"
LC_TIME="en_GB"
LC_COLLATE="en_GB"
LC_MONETARY="en_GB"
LC_MESSAGES="en_GB"
LC_PAPER="en_GB"
LC_NAME="en_GB"
LC_ADDRESS="en_GB"
LC_TELEPHONE="en_GB"
LC_MEASUREMENT="en_GB"
LC_IDENTIFICATION="en_GB"
LC_ALL=
$ locale charmap
ISO-8859-1

I have tried changing the locales with no difference. Is this fine?
And no, I didn't get any warning message.
My system is Debian sid under KDE 3.3.
Thanks,
M

--- Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote: