Best and thank you for your help,
Richard
On 11 Dec 2012, at 12:11, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
On Tuesday 11 December 2012 at 01:10 +0100, Richard Zijdeman wrote:
Dear all,
I have imported a dataset from Stata using the foreign package. The
original data contain accented French characters such as è.
After importing, string variables containing names of French
departments have changed. E.g. Ardèche became Ard\x8fche. I would like
to ask how I could plot these changed strings, since now the strings
with special characters fail to be printed in the plot (either using
plot() or ggplot2).
I have googled for solutions, but actually find it hard to determine
whether I should change my R setup or should read in the data in a
different way. Since I work on a Mac, I changed my locale according to
the R for Mac OS X FAQ, chapter 9. Below is some info on my setup, and
code and output showing what works for me and what does not. Thank you in
advance for your comments.
Accented characters should work fine on a machine using a UTF-8
locale like yours. I think the problem is that the imported data uses
ISO-8859-15 or UTF-16, not UTF-8.
I have no idea whether .dta files specify an encoding or not, but I
think you can convert them in R by calling
iconv(department, "ISO-8859-15", "UTF-8")
or
iconv(department, "UTF-16", "UTF-8")
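As a minimal sketch of the first call, assuming the column really is ISO-8859-15 (which the thread does not confirm), the conversion and a check of the resulting bytes could look like this. The vector below is a hypothetical stand-in for the imported data, with "è" written as the single byte E8.

```r
# Hypothetical stand-in for the imported column: "è" stored as the single
# byte E8, as it would be in ISO-8859-15 (an assumption, not the real data).
department <- c("Nord", "Paris", "Ard\xE8che")

# Re-encode to UTF-8. iconv() returns NA for elements it cannot convert,
# so a wrong guess about the source encoding tends to be visible.
department_utf8 <- iconv(department, from = "ISO-8859-15", to = "UTF-8")

# Inspect the bytes of the third element: "è" should now be c3 a8.
charToRaw(department_utf8[3])

# Declared encoding of each element; pure-ASCII strings stay "unknown".
Encoding(department_utf8)
```

If the result contains NA or still prints with escapes, the source encoding guess is probably wrong and another one should be tried.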
Best,
Richard
#--------------
rm(list=ls())
sessionInfo()
# R version 2.15.2 (2012-10-26)
# Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
#
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# creating variables
department <- c("Nord","Paris","Ard\x8fche")
\x8f does not correspond to "è" AFAIK. In ISO-8859-1, ISO-8859-15 and
UTF-16, it's \xE8 ("\uE8" in R).
In UTF-8, it's C3 A8, "\303\250" in R.
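The byte values quoted above can be checked from R itself. This is only a sketch; the variable names are made up for illustration.

```r
# Build "è" from its Unicode code point U+00E8; the result is UTF-8.
e_grave <- intToUtf8(0xE8)

# In UTF-8 the character takes two bytes, c3 a8 (octal \303 \250).
utf8_bytes <- charToRaw(e_grave)
utf8_bytes

# In ISO-8859-15 it is the single byte e8.
latin9_bytes <- charToRaw(iconv(e_grave, from = "UTF-8", to = "ISO-8859-15"))
latin9_bytes

# The octal escapes quoted above spell out the same two UTF-8 bytes.
identical(charToRaw("\303\250"), utf8_bytes)
```

Neither result contains the byte 8f, which is consistent with the remark that \x8f is not "è" in these encodings.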
department2 <- c("Nord", "Paris", "Ardèche")
n <- c(2,4,1)
# creating dataframes
df <- data.frame(department,n)
df2 <- data.frame(department2,n)
department
# [1] "Nord" "Paris" "Ard\x8fche"
department2
# [1] "Nord" "Paris" "Ardèche"
plot(df) # fails to show the text "Ardèche"
plot(df2) # shows text "Ardèche"
# EOF