difference in sort order linux/Windows (R.2.11.0)
It would seem that there is indeed a locale effect. Revisiting the
examples I used on Linux in a previous post, at which time I was
using the default "LC_COLLATE=en_GB.UTF-8", I changed this to "C".
The results for both "C" and "en_GB.UTF-8" are shown below (the
latter copied from my previous post):
Sys.setlocale("LC_COLLATE", "C")
# [1] "C"
sort(c("AB CD","ABCD"))
# [1] "AB CD" "ABCD" ## (C)
# [1] "ABCD" "AB CD" ## (en_GB.UTF-8)
sort(c("AB CD","ABCD "))
# [1] "AB CD" "ABCD " ## (C)
# [1] "AB CD" "ABCD " ## (en_GB.UTF-8)
So the "C" ordering comes out as one would expect in either case,
while the "en_GB.UTF-8" ordering does not in the first case (where
the two strings are of different lengths).
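One way to see why the "C" result is the expected one: in the "C"
locale, strings are compared by the numeric values of their bytes, and
a space (32) is lower than any letter. The sketch below is an
illustration only. The variable names are made up for the example, and
it does not reproduce what sort() does internally. It uses charToRaw()
from base R to list the byte values and find the first position at
which the two strings differ.

```r
# Illustrative sketch: compare two strings by byte value, which is the
# ordering the "C" collation uses. Variable names are hypothetical.
a <- "AB CD"
b <- "ABCD"

bytes_a <- as.integer(charToRaw(a))  # 65 66 32 67 68
bytes_b <- as.integer(charToRaw(b))  # 65 66 67 68

# First position at which the two byte sequences differ
n <- min(length(bytes_a), length(bytes_b))
first_diff <- which(bytes_a[seq_len(n)] != bytes_b[seq_len(n)])[1]

first_diff                                 # 3
bytes_a[first_diff] < bytes_b[first_diff]  # TRUE: space (32) < "C" (67)
```

This accounts only for the "C" collation. It does not expose whatever
keys a locale such as "en_GB.UTF-8" applies to the same strings.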
Is there any way to extract the numerical encoding of a character
string (according to the collating locale encoding) to which the
comparison in the sort() algorithm is applied?
Ted.
On 28-May-10 11:07:57, Joris Meys wrote:
Pretty obvious: You use different locales (collate). What happens if
you use the same on both machines?

Cheers
Joris

On Fri, May 28, 2010 at 10:17 AM, carslaw <david.carslaw at kcl.ac.uk> wrote:
Dear R users,

I'm a bit perplexed with the effect sort has here, as it is
different on ... the linux order is perhaps more intuitive. However,
the problem is the order is inconsistent between the two systems.

Any suggestions?

sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_GB.utf8        LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8         LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8     LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=en_GB.utf8        LC_NAME=en_GB.utf8
 [9] LC_ADDRESS=en_GB.utf8      LC_TELEPHONE=en_GB.utf8
[11] LC_MEASUREMENT=en_GB.utf8  LC_IDENTIFICATION=en_GB.utf8
...
sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-pc-mingw32

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
...
Dr David Carslaw
-- Joris Meys
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-May-10                                       Time: 12:49:19
------------------------------ XFMail ------------------------------