Difference in sort order Linux/Windows (R 2.11.0)
carslaw wrote:
Dear R users, I'm a bit perplexed by the effect sort has here, as it differs between Windows and Linux. It makes my factor levels and subsequent plots different on the two systems.
You are using different collation orders. On Linux, your sessionInfo shows en_GB.utf8 while Windows shows English_United Kingdom.1252, so you should be prepared for differences.

That said, it certainly looks as though the string comparison is wrong on Linux. Using Ted Harding's examples, I get these results:

> "AB CD" > "ABCD"
[1] FALSE
> "AB CD" > "ABCD "
[1] FALSE

on Windows in the English_Canada.1252 locale and on Linux in the C locale. However, when I use the locale that's default on our system, en_US.UTF-8, I get

> "AB CD" > "ABCD"
[1] TRUE
> "AB CD" > "ABCD "
[1] FALSE

as Ted did, and that certainly looks wrong.

Duncan Murdoch
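To check whether collation explains a difference like this, one can inspect LC_COLLATE and temporarily switch it to the C locale. The following is only a minimal sketch, assuming it is acceptable to change the collation locale for part of the session. The variable names (old_collate, cmp1, cmp2) are illustrative, and the two comparisons are the ones from Ted Harding's examples quoted above.

```r
# Remember the current collation locale so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")

# Switch to the C locale, which compares strings byte by byte.
Sys.setlocale("LC_COLLATE", "C")

# In the C locale a space (0x20) sorts before any letter,
# so neither comparison is TRUE.
cmp1 <- "AB CD" > "ABCD"
cmp2 <- "AB CD" > "ABCD "
print(c(cmp1, cmp2))

# Restore the original collation setting.
Sys.setlocale("LC_COLLATE", old_collate)
```

Setting LC_COLLATE to "C" at the start of a script on both machines should make sort() behave the same way on each, at the cost of losing language-aware ordering.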
Given:
types <- c("PC-D-Euro-0", "PC-D-Euro-1", "PC-D-Euro-2", "PC-D-Euro-3",
"PC-D-Euro-4", "PC-D-Euro-5", "PC-D-Euro-6", "LCV-D-Euro-0",
"LCV-D-Euro-1", "LCV-D-Euro-2", "LCV-D-Euro-3", "LCV-D-Euro-4",
"LCV-D-Euro-5", "LCV-D-Euro-6", "HGV-D-Euro-0", "HGV-D-Euro-I",
"HGV-D-Euro-II", "HGV-D-Euro-III", "HGV-D-Euro-IV EGR", "HGV-D-Euro-IV SCR",
"HGV-D-Euro-IV SCRb", "HGV-D-Euro-V EGR", "HGV-D-Euro-V SCR",
"HGV-D-Euro-V SCRb", "HGV-D-Euro-VI", "HGV-D-Euro-VIb")
On Linux, sort gives:
sort(types)
[1] "HGV-D-Euro-0" "HGV-D-Euro-I" "HGV-D-Euro-II"
[4] "HGV-D-Euro-III" "HGV-D-Euro-IV EGR" "HGV-D-Euro-IV SCR"
[7] "HGV-D-Euro-IV SCRb" "HGV-D-Euro-V EGR" "HGV-D-Euro-VI"
[10] "HGV-D-Euro-VIb" "HGV-D-Euro-V SCR" "HGV-D-Euro-V SCRb"
[13] "LCV-D-Euro-0" "LCV-D-Euro-1" "LCV-D-Euro-2"
[16] "LCV-D-Euro-3" "LCV-D-Euro-4" "LCV-D-Euro-5"
[19] "LCV-D-Euro-6" "PC-D-Euro-0" "PC-D-Euro-1"
[22] "PC-D-Euro-2" "PC-D-Euro-3" "PC-D-Euro-4"
[25] "PC-D-Euro-5" "PC-D-Euro-6"
And on Windows:
sort(types)
[1] "HGV-D-Euro-0" "HGV-D-Euro-I" "HGV-D-Euro-II"
[4] "HGV-D-Euro-III" "HGV-D-Euro-IV EGR" "HGV-D-Euro-IV SCR"
[7] "HGV-D-Euro-IV SCRb" "HGV-D-Euro-V EGR" "HGV-D-Euro-V SCR"
[10] "HGV-D-Euro-V SCRb" "HGV-D-Euro-VI" "HGV-D-Euro-VIb"
[13] "LCV-D-Euro-0" "LCV-D-Euro-1" "LCV-D-Euro-2"
[16] "LCV-D-Euro-3" "LCV-D-Euro-4" "LCV-D-Euro-5"
[19] "LCV-D-Euro-6" "PC-D-Euro-0" "PC-D-Euro-1"
[22] "PC-D-Euro-2" "PC-D-Euro-3" "PC-D-Euro-4"
[25] "PC-D-Euro-5" "PC-D-Euro-6"
Session info for both systems is below. The order I actually want is the Windows one, but looking at it, the Linux order is perhaps more intuitive. However, the problem is that the order is inconsistent between the two systems. Any suggestions?
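Two possible ways to get the same order on both systems are sketched below, using only the entries of types whose order differs, deliberately scrambled. This is a hedged illustration rather than a definitive fix: method = "radix" is documented to use C-locale ordering for character vectors, but it needs R 3.3.0 or later, so it would not work in the R 2.11.0 sessions shown here. The names sorted_c and f are illustrative.

```r
# Subset of 'types' from the post (the entries that sort differently),
# given here in scrambled order.
types <- c("HGV-D-Euro-VI", "HGV-D-Euro-V SCR", "HGV-D-Euro-VIb",
           "HGV-D-Euro-V EGR", "HGV-D-Euro-V SCRb")

# Option 1: sort with C-locale collation regardless of the session locale.
# Assumption: R >= 3.3.0, where method = "radix" supports character vectors.
sorted_c <- sort(types, method = "radix")
print(sorted_c)

# Option 2: set the factor levels explicitly instead of relying on the
# default, which sorts the unique values in the current locale.
f <- factor(types, levels = sorted_c)
print(levels(f))
```

Passing levels explicitly means the plots depend only on the vector you supply, not on the collation rules of the machine running the code.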
sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-pc-linux-gnu
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8
[7] LC_PAPER=en_GB.utf8 LC_NAME=en_GB.utf8
[9] LC_ADDRESS=en_GB.utf8 LC_TELEPHONE=en_GB.utf8
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=en_GB.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rkward_0.5.3
loaded via a namespace (and not attached):
[1] tools_2.11.0
sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-pc-mingw32
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base

Dr David Carslaw
King's College London
Environmental Research Group
Franklin Wilkins Building
150 Stamford Street
London SE1 9NH