Hi,
I ran into this issue previously and managed to solve it, but I've
forgotten how and am getting frustrated...
I have a data frame (see below) with Scandinavian characters in R
(2.7.1) running on a Windows XP computer. I save the data frame in an
RData file on a USB stick, and load() it in R (2.8.0) running on OS X
10.5. Now the name of the data frame and all factor labels with
Scandinavian characters are scrambled. How do I make R on OS X read my
data frame?
From what I've managed to find in the list archives and the FAQ, I
should either
1) run
Sys.setlocale("LC_ALL","en_US.UTF-8") ### Doesn't change anything
or
2) run
defaults write org.R-project.R force.LANG en_US.UTF-8
in the terminal, which doesn't help either.
I must admit that I couldn't quite follow what documentation I found
on locales, so I might have messed up somewhere along the line.
Many thanks in advance for your help!
Regards,
Gustaf
--------
Länkarta <-
structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L,
7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L,
14L, 12L), .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G",
"H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"
), class = "factor"), Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L,
8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L,
19L, 11L), .Label = c("Blekinge län", "Dalarnas län", "Gotlands län",
"Gävleborgs län", "Hallands län", "Jämtlands län", "Jönköpings län",
"Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län",
"Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län",
"Västerbottens län", "Västernorrlands län", "Västmanlands län",
"Västra Götalands län", "Örebro län", "Östergötlands län"), class =
"factor")), .Names = c("LANKOD",
"Län"), class = "data.frame", row.names = c("0", "1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20"))
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik
Hi,
On my system (see below), it works fine (inputting the code below at
the R prompt). Make sure that the input file is encoded as UTF-8.
Rgds,
Ivan
> sessionInfo()
R version 2.8.1 Patched (2009-01-14 r47602)
i386-apple-darwin9.6.0
locale:
en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
> structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L,
7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L,
14L, 12L), .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G",
"H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"
), class = "factor"), Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L,
8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L,
19L, 11L), .Label = c("Blekinge län", "Dalarnas län", "Gotlands län",
"Gävleborgs län", "Hallands län", "Jämtlands län", "Jönköpings län",
"Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län",
"Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län",
"Västerbottens län", "Västernorrlands län", "Västmanlands län",
"Västra Götalands län", "Örebro län", "Östergötlands län"), class =
"factor")), .Names = c("LANKOD",
"Län"), class = "data.frame", row.names = c("0", "1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20"))
   LANKOD                  Län
0       K         Blekinge län
1       X       Gävleborgs län
2       I         Gotlands län
3       N         Hallands län
4       Z        Jämtlands län
5       F       Jönköpings län
6       H           Kalmar län
7       W         Dalarnas län
8       G       Kronobergs län
9      BD      Norrbottens län
10      T           Örebro län
11      E    Östergötlands län
12      D    Södermanlands län
13      C          Uppsala län
14      S        Värmlands län
15     AC    Västerbottens län
16      Y  Västernorrlands län
17      U     Västmanlands län
18     AB       Stockholms län
19      O Västra Götalands län
20      M            Skåne län
> Länkarta <- structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L,
7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L,
14L, 12L), .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G",
"H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"
), class = "factor"), Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L,
8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L,
19L, 11L), .Label = c("Blekinge län", "Dalarnas län", "Gotlands län",
"Gävleborgs län", "Hallands län", "Jämtlands län", "Jönköpings län",
"Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län",
"Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län",
"Västerbottens län", "Västernorrlands län", "Västmanlands län",
"Västra Götalands län", "Örebro län", "Östergötlands län"), class =
"factor")), .Names = c("LANKOD",
"Län"), class = "data.frame", row.names = c("0", "1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20"))
> ls()
[1] "Länkarta"
>
It displays sensibly (at least I think so, not being a reader of any
Scandinavian language) on my Mac (10.5.6).
> Länkarta <-
+ structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L,
+ 7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L,
+ 14L, 12L), .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G",
+ "H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"
+ ), class = "factor"), Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L,
+ 8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L,
+ 19L, 11L), .Label = c("Blekinge län", "Dalarnas län", "Gotlands län",
+ "Gävleborgs län", "Hallands län", "Jämtlands län", "Jönköpings län",
+ "Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län",
+ "Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län",
+ "Västerbottens län", "Västernorrlands län", "Västmanlands län",
+ "Västra Götalands län", "Örebro län", "Östergötlands län"), class =
+ "factor")), .Names = c("LANKOD",
+ "Län"), class = "data.frame", row.names = c("0", "1", "2", "3",
+ "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
+ "16", "17", "18", "19", "20"))
> Länkarta
   LANKOD                  Län
0       K         Blekinge län
1       X       Gävleborgs län
2       I         Gotlands län
3       N         Hallands län
4       Z        Jämtlands län
5       F       Jönköpings län
6       H           Kalmar län
7       W         Dalarnas län
8       G       Kronobergs län
9      BD      Norrbottens län
10      T           Örebro län
11      E    Östergötlands län
12      D    Södermanlands län
13      C          Uppsala län
14      S        Värmlands län
15     AC    Västerbottens län
16      Y  Västernorrlands län
17      U     Västmanlands län
18     AB       Stockholms län
19      O Västra Götalands län
20      M            Skåne län
>
> sessionInfo()
R version 2.8.1 Patched (2009-01-07 r47515)
i386-apple-darwin9.6.0
locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
(An etiquette note: It is considered impolite to cross post to both
the r-help and r-sig-mac lists.)
David Winsemius
You need to use CP1252 not UTF-8 to read the data. It tells you how
to do so on the help page ... under 'encoding'. So something like
A <- read.table(con <- file("myfile", encoding="CP1252"));close(con)
Please don't cross-post ... I am being brief because you did.
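A minimal sketch of the approach described above, assuming an R session
running in a UTF-8 locale (as on the Macs in this thread). The temporary
file, its contents, and the column names are made up for illustration;
the only part taken from the advice is declaring encoding = "CP1252" on
the file() connection.

```r
# Build a tiny tab-separated file whose bytes are CP1252, roughly what R
# on Swedish Windows would write. 0xE5 is "å" and 0xE4 is "ä" in CP1252.
tmp <- tempfile(fileext = ".txt")
bytes <- c(charToRaw("LANKOD\tLAN\nM\tSk"), as.raw(0xE5),
           charToRaw("ne l"), as.raw(0xE4), charToRaw("n\n"))
writeBin(bytes, tmp)

# Declare the file's encoding on the connection, so that R re-encodes the
# text to the session's native encoding while reading. The connection is
# opened explicitly here, so it is also closed explicitly afterwards.
con <- file(tmp, open = "rt", encoding = "CP1252")
A <- read.table(con, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
close(con)

print(A)
unlink(tmp)
```

In a session whose native encoding cannot represent these characters
(for example the C locale), the re-encoding step may warn or lose
characters, so this is only a sketch for the UTF-8 case.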
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
On Fri, 16 Jan 2009, David Winsemius wrote:
> It displays sensibly (at least I think so, not being a reader of any
> Scandinavian language) on my Mac (10.5.6).
I think that is because your email client re-encoded it (as did mine),
always a hazard of email. It was marked as iso-8859-1. Email, unlike
text files, can have the encoding marked.
> (An etiquette note: It is considered impolite to cross post to both the
> r-help and r-sig-mac lists.)
Not just impolite, but inconsiderate of the time and resources of others:
you are asked not to do so on the mailing lists' top page.
Brian D. Ripley, ripley at stats.ox.ac.uk
Reading the help page for Sys.getlocale/Sys.setlocale:
"Attempts to change the character set (by Sys.setlocale("LC_CTYPE", ),
if that implies a different character set) during a session may not
work and are likely to lead to some confusion.
Value
A character string of length one describing the locale in use (after
setting for Sys.setlocale), or an empty character string if the
current locale settings are invalid or NULL if locale information is
unavailable.
For category = "LC_ALL" the details of the string are system-specific:
it might be a single locale name or a set of locale names separated by
"/" (Solaris, Mac OS X) or ";" (Windows, Linux). For portability, it is
best to query categories individually: it is not necessarily the case
that the result of foo <- Sys.getlocale() can be used in
Sys.setlocale("LC_ALL", locale = foo)."
I interpret that as saying that if you use "LC_ALL", then you need to
pass a character string to Sys.setlocale() that is constructed
properly for a Mac, and that it might contain "/"s. You also need to do
it at the beginning of a session, and it will be ignored (as you say,
"not do anything") if it is not precisely correct. This is what
Sys.getlocale() returns on mine:
"en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8"
Hope this helps;
David Winsemius
On Fri, Jan 16, 2009 at 2:53 PM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
> On Fri, 16 Jan 2009, David Winsemius wrote:
>> It displays sensibly (at least I think so, not being a reader of any
>> Scandinavian language) on my Mac (10.5.6).
> I think that is because your email client re-encoded it (as did mine),
> always a hazard of email. It was marked as iso-8859-1. Email, unlike text
> files, can have the encoding marked.
Thank you for your help, and I apologise for cross-posting previously.
I've previously figured out how to solve this issue when using read.table(),
but in this case I was using save() and load() on the data frame,
embedding it in a workspace. Is there a way to tell load() that the
workspace to be loaded was created with a specific encoding?
Regards,
Gustaf
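load() takes no argument for declaring an encoding, so one possible
workaround (a sketch only, not something confirmed in this thread) is to
re-encode the strings after loading with iconv(). The helper name
fix_encoding and the simulated data frame below are hypothetical; the
bytes are built by hand to mimic CP1252 text that arrived unmarked.

```r
# Helper to build a string from raw byte values (illustration only).
raw_str <- function(...) rawToChar(as.raw(c(...)))

# Simulate what might arrive after load(): a column name and a factor
# level holding CP1252 bytes with no encoding mark.
lan   <- raw_str(0x4C, 0xE4, 0x6E)                          # "Län"
skane <- raw_str(0x53, 0x6B, 0xE5, 0x6E, 0x65, 0x20,
                 0x6C, 0xE4, 0x6E)                          # "Skåne län"
df <- data.frame(LANKOD = factor("M"))
df$x <- structure(1L, levels = skane, class = "factor")
names(df)[2] <- lan

# Hypothetical helper: convert column names, factor levels and character
# columns from the assumed source encoding to UTF-8.
fix_encoding <- function(d, from = "CP1252") {
  names(d) <- iconv(names(d), from = from, to = "UTF-8")
  for (j in seq_along(d)) {
    if (is.factor(d[[j]])) {
      levels(d[[j]]) <- iconv(levels(d[[j]]), from = from, to = "UTF-8")
    } else if (is.character(d[[j]])) {
      d[[j]] <- iconv(d[[j]], from = from, to = "UTF-8")
    }
  }
  d
}

fixed <- fix_encoding(df)
print(fixed)
```

The object name itself would need the same treatment; renaming it by
hand after load() is probably the simplest route.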
On Fri, 16 Jan 2009, David Winsemius wrote:
> I interpret that as saying that if you use "LC_ALL", then you need to pass a
> character string to Sys.setlocale() that is constructed properly for a Mac
> and that it might have "/"'s.
Actually, it says the opposite: the output you get is not necessarily
valid input.
> And you need to do it at the beginning of a
> session. And that it will be ignored, as you say "not do anything" if not
> precisely correct. This is what Sys.getlocale returns on mine:
> "en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8"
However, to set it, just en_US works (Mac locales are by default in
UTF-8). In Swedish, you can have:
tystie% locale -a | grep SE
sv_SE
sv_SE.ISO8859-1
sv_SE.ISO8859-15
sv_SE.UTF-8
and setting one of the middle two would have worked.
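The points above can be sketched as follows: query the categories one
at a time rather than reusing the LC_ALL string, then try to switch
only LC_CTYPE. The locale name is taken from the `locale -a` listing
above and may not exist on other systems, in which case Sys.setlocale()
returns an empty string with a warning. As the help page warns, changing
the character set in mid-session may not work reliably, so treat this as
an experiment rather than a recipe.

```r
# Query each category individually instead of parsing the "LC_ALL" string.
cats <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME")
current <- vapply(cats, Sys.getlocale, character(1))
print(current)

# Try to switch only LC_CTYPE to a Latin-1 Swedish locale.
old_ctype <- current[["LC_CTYPE"]]
res <- suppressWarnings(Sys.setlocale("LC_CTYPE", "sv_SE.ISO8859-1"))
if (identical(res, "")) {
  message("sv_SE.ISO8859-1 is not available on this system")
} else {
  message("LC_CTYPE is now: ", res)
}

# Put the original setting back.
invisible(Sys.setlocale("LC_CTYPE", old_ctype))
```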
Annoyingly, Mac OS does not tell you which is which in the locale
settings list, so it is basically useless. I believe they are in
alphabetical order (in the C locale), since the Mac has only 6
categories.
Brian D. Ripley, ripley at stats.ox.ac.uk
On Fri, 16 Jan 2009, David Winsemius wrote:
> Realizing that it might not work in all situations, would it give (possibly)
> useful results to assign the incorrect encoding found in Gustaf's email,
> which nonetheless was interpreted sensibly, "iso-8859-1", to the encoding
> string?
Yes: CP1252 is a superset of ISO-8859-1. I knew because only one
encoding can be used for Swedish on Windows (unlike most other OSes
where there are three -- see my second posting).
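A small sketch of why either name works for Swedish text: the bytes for
the Swedish letters decode to the same characters under both encodings.
The two encodings differ only in the byte range 0x80 to 0x9F, where
CP1252 assigns printable characters (0x80 is the euro sign) and
ISO-8859-1 has control codes. The variable names are illustrative only.

```r
# The bytes 0xE5, 0xE4, 0xF6 are "å", "ä", "ö" in both encodings.
swedish <- rawToChar(as.raw(c(0xE5, 0xE4, 0xF6)))
from_cp1252 <- iconv(swedish, from = "CP1252", to = "UTF-8")
from_latin1 <- iconv(swedish, from = "ISO-8859-1", to = "UTF-8")

# Compare the decoded Unicode code points (229, 228, 246 for å, ä, ö).
same <- identical(utf8ToInt(from_cp1252), utf8ToInt(from_latin1))
print(same)

# Byte 0x80 is one place where the two encodings disagree.
b80 <- rawToChar(as.raw(0x80))
print(utf8ToInt(iconv(b80, from = "CP1252", to = "UTF-8")))
print(utf8ToInt(iconv(b80, from = "ISO-8859-1", to = "UTF-8")))
```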
Brian D. Ripley, ripley at stats.ox.ac.uk