
difference in sort order linux/Windows (R.2.11.0)

On Fri, 28 May 2010 01:17:49 -0700 (PDT)
carslaw <david.carslaw at kcl.ac.uk> wrote:

            
This is a lexical sort. Depending on the locale, the
items may not sort in ASCII order. For example, a
European Latin locale may place some letters (and
upper vs. lower case) differently than ASCII does.
You have to check what is actually being sorted
(e.g., look at the strings as raw UTF-8 bytes).
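A minimal sketch of the difference, assuming a made-up
vector x (not the original poster's data). It prints the
collation locale in effect, sorts once with it, and then
sorts again after temporarily switching LC_COLLATE to "C",
which compares strings byte by byte:

```r
# Hypothetical example data, not taken from the original post.
x <- c("banana", "Apple", "cherry", "apple", "Banana")

# Collation locale currently in effect; sort() uses it by default.
print(Sys.getlocale("LC_COLLATE"))

# Locale-dependent order: may differ between Linux and Windows.
print(sort(x))

# Byte (ASCII) order: switch collation to "C", sort, then restore.
old <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")
c_sorted <- sort(x)
Sys.setlocale("LC_COLLATE", old)

print(c_sorted)
# "Apple" "Banana" "apple" "banana" "cherry"
```

If the two machines agree on c_sorted but disagree on
plain sort(x), the difference comes from the locale and
not from the data.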

You might also find that input generated on Windows
has non-breaking spaces in it from the generating
program (e.g., Excel). In Latin-1 these are \xA0 (160d)
instead of the \x20 (32d) used for ASCII spaces.
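One way to look for such bytes from within R, as a rough
sketch: the vector x below is invented for illustration,
and the check only looks for the single byte a0. In UTF-8
data a non-breaking space is the two bytes c2 a0, and a0
can also occur inside other multi-byte characters, so a
hit means "inspect this element", not proof.

```r
# Hypothetical input: the second element contains the byte a0
# where an ordinary space (byte 20) is expected.
x <- c("New York", "New\xa0York")

# For each element, TRUE if any of its raw bytes equals 0xa0.
has_a0 <- vapply(
  x,
  function(s) any(charToRaw(s) == as.raw(0xa0)),
  logical(1),
  USE.NAMES = FALSE
)

print(has_a0)
# FALSE TRUE
```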

Suggestion: Validate the data with something like
"od -cx" on Linux so you know what you are sorting.
Then dump it out as hex in R [sorry, I have no idea
how to do that] and see if what you are sorting
matches. After that, validate the locale settings
on both sides. If all of those turn up the same
raw data and the same locale, then you've found a
bug in R, or at least need to read some fine print
in the lexical sort docs.
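For the hex dump step in R, base R's charToRaw() returns
the bytes of a string, which can be compared against the
od output. A small sketch, with an invented vector x
standing in for the real data:

```r
# Hypothetical example strings; replace with the vector being sorted.
x <- c("abc", "a b")

# One string of space-separated hex bytes per element.
hex <- vapply(
  x,
  function(s) paste(as.character(charToRaw(s)), collapse = " "),
  character(1),
  USE.NAMES = FALSE
)

print(hex)
# "61 62 63" "61 20 62"

# Full locale string, to compare between the two machines.
print(Sys.getlocale())
```

Running the same lines on both machines and comparing the
output side by side covers the data check and the locale
check in one go.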