a question of alphabetical order
Tricky question, this ordering issue :-( Thank you so much for the detailed explanation.

So, must I conclude that I will have to live with this ASCII order while working in Mac OS X 10.5.2 until the Mac people fix this bug?

You mentioned es_ES.ISO8859-15 on the Mac. Will it do the trick? Yes, as far as I understand. But as I am using R.app, the locale is set by the system preferences. Truly, I am in a bit of a mess with this issue. Could I force es_ES.ISO8859-15 as the locale on the Mac?

Sorry if I add another question here... why does Excel order the list correctly? I guess it doesn't rely on the Mac settings.

As an R newbie I must admit that this behaviour, and others, are really hard to deal with. But I have seen, and even done, such an amount of wonderful things with R that it is worth all the effort.

Thanks for your help.

All the best,
Ricardo
Prof Brian Ripley wrote:
This is a known Mac OS X bug, nothing to do with R which uses the
system functions (strcoll/wcscoll) for such things.
If you look at the help for sort, it refers you to ?Comparison. Which
says
Comparison of strings in character vectors is lexicographic within
the strings using the collating sequence of the locale in use: see
'locales'. The collating sequence of locales such as 'en_US' is
normally different from 'C' (which should use ASCII) and can be
surprising. Beware of making _any_ assumptions about the
collation order: e.g. in Estonian 'Z' comes between 'S' and 'T',
and collation is not necessarily character-by-character - in
Danish 'aa' sorts as a single letter, after 'z'. Some platforms
may not respect the locale and always sort in ASCII. (String
comparison is always for the part of the string up to the first
nul if there are embedded nuls.)
Mac OS X (more specifically, 10.5.2 on i386) is one of those
disrespectful platforms.
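For comparison, this is what the 'C' collation mentioned in the help text looks like. A minimal sketch: the vector v is made up for illustration, while Sys.getlocale() and Sys.setlocale() are base R functions.

```r
# Illustrative vector, not taken from the thread
v <- c("b", "A", "a", "B")

# Remember the current collation locale so it can be restored afterwards
old <- Sys.getlocale("LC_COLLATE")

# Under "C" collation strings compare by character code, so every
# upper-case ASCII letter sorts before every lower-case one
Sys.setlocale("LC_COLLATE", "C")
sorted_c <- sort(v)
print(sorted_c)

# Restore the previous collation and sort again; this second result
# depends on the platform and on the locale in use
invisible(Sys.setlocale("LC_COLLATE", old))
print(sort(v))
```

If the second result matches the first even though the session locale is something like es_ES or en_US, the platform is probably ignoring the locale for collation.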
x <- intToUtf8(c(32:127, 160:255), multiple=TRUE)
order(x)
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
[109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
[127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
[145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
[163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
[181] 181 182 183 184 185 186 187 188 189 190 191 192

which is quite different from Linux or Solaris. This may not come out, but paste(sort(x), collapse="") includes aA???????????????bBcC??dDeE???????? on Linux in es_ES.utf8. Platforms are a lot worse at sorting in UTF-8 than in 8-bit encodings. Mac OS X has es_ES.ISO8859-15, and that does do a reasonable job, including a???????.
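On forcing the locale from inside R, here is a minimal sketch. It assumes the locale name es_ES.ISO8859-15 exists on the machine, as it does on Mac OS X according to the above; other systems may spell it differently or lack it. Sys.setlocale() returns an empty string, with a warning, when the request cannot be honoured, so the sketch checks for that. The word list is made up for illustration, and this has not been checked inside R.app.

```r
# Remember the current collation locale so it can be restored afterwards
old <- Sys.getlocale("LC_COLLATE")

# Ask for the 8-bit Spanish locale; res is "" if it is not available
res <- suppressWarnings(Sys.setlocale("LC_COLLATE", "es_ES.ISO8859-15"))
if (identical(res, "")) {
  message("es_ES.ISO8859-15 is not available here; collation unchanged")
}

# Illustrative Spanish words; accented letters are written as \u escapes
# so the script does not depend on the encoding of the source file
words <- c("zorro", "\u00e1rbol", "ancla", "\u00f1and\u00fa",
           "nube", "\u00e9poca", "eco")
sorted <- sort(words)
print(sorted)

# Restore the original collation
invisible(Sys.setlocale("LC_COLLATE", old))
```

Where the locale is honoured, the accented words should land next to their unaccented neighbours rather than after "zorro". The exact order is platform-dependent, so no expected output is shown.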
Ricardo Rodríguez
Your XEN ICT Team