difference in sort order linux/Windows (R.2.11.0)

It would seem that there is indeed a locale effect. I revisited the
examples I ran on Linux in a previous post, where I was using the
default "LC_COLLATE=en_GB.UTF-8", and this time set LC_COLLATE to "C".
The results for both "C" and "en_GB.UTF-8" are shown below (the
latter copied from my previous post):

  Sys.setlocale("LC_COLLATE", "C")
  # [1] "C"
  sort(c("AB CD","ABCD"))
  # [1] "AB CD" "ABCD"       ## (C)
  # [1] "ABCD"  "AB CD"      ## (en_GB.UTF-8)
  sort(c("AB CD","ABCD "))
  # [1] "AB CD" "ABCD "      ## (C)
  # [1] "AB CD" "ABCD "      ## (en_GB.UTF-8)

So the "C" ordering comes out as one would expect in both cases,
whereas the "en_GB.UTF-8" ordering does not in the first case (where
the two strings differ in length).
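
To make the "C" result concrete: as far as I understand it, in the
"C" locale strings are compared by the numeric codes of their
characters, so the outcome can be checked by hand. A minimal sketch,
using base R's utf8ToInt() (my choice for illustration, not something
taken from sort() itself; the names a, b and i are made up for the
example):

```r
# Character codes of the two strings from the first example.
# Assumption: in the "C" locale, comparison follows these values.
a <- utf8ToInt("AB CD")   # 65 66 32 67 68
b <- utf8ToInt("ABCD")    # 65 66 67 68

# First position at which the two strings differ
i <- which(a[seq_along(b)] != b)[1]
i                         # 3

# Codes at that position: space (32) versus "C" (67)
c(a[i], b[i])             # 32 67
```

Since 32 < 67, "AB CD" precedes "ABCD" under "C", which matches the
output above. This says nothing about what "en_GB.UTF-8" does
internally; it only accounts for the "C" case.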

Is there any way to extract the numerical encoding of a character
string (according to the collating locale encoding) to which the
comparison in the sort() algorithm is applied?

Ted.
On 28-May-10 11:07:57, Joris Meys wrote:
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-May-10                                       Time: 12:49:19
------------------------------ XFMail ------------------------------