Hi all, I am having difficulty understanding how R sorts strings:
If I do
R) sort(c("X.","X0B"))
[1] "X." "X0B"
As I understand lexicographic order, I can append whatever I like to the end
of both strings and the order will remain the same, but:
R) sort(c("X.Z","X0B.Z"))
[1] "X0B.Z" "X.Z"
Can somebody give me a trick to make the order lexicographic?
--
View this message in context: http://r.789695.n4.nabble.com/Sorting-strings-tp4403696p4403696.html
Sent from the R help mailing list archive at Nabble.com.
Sorting strings
16 messages · statquant2, Keith Jewell, Enrico Schumann +5 more
Hi.
This need not be true for strings of different lengths. For example, the sorted pair
ab abc
becomes, after appending z to each string,
abcz abz
so the string built from "abc" now comes first.
Petr Savicky.
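To illustrate the point about strings of different lengths, here is a minimal sketch (not part of the original message). Appending the same suffix can reverse the order even under plain byte-wise comparison. It assumes R 3.3.0 or later, where sort() accepts method = "radix" and collates character vectors in the C locale regardless of the session locale.

```r
# Assumption: R >= 3.3.0, where method = "radix" sorts character vectors
# in C-locale (byte) order regardless of the session locale.
short <- c("ab", "abc")
before <- sort(short, method = "radix")
after <- sort(paste0(short, "z"), method = "radix")
print(before)  # "ab" comes first
print(after)   # "abcz" comes first, although it was built from "abc"
```

The reversal happens because the comparison reaches the third character: "c" in "abcz" sorts before "z" in "abz".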
That's not the explanation in this case. The OP isn't telling us everything. I get [R version 2.14.1 Platform: i386-pc-mingw32/i386 (32-bit)]:
> sort(c("X.","X0B"))
[1] "X." "X0B"
> sort(c("X.Z","X0B.Z"))
[1] "X.Z" "X0B.Z"
KJ
Ok, so it changed between 2.12.2 and 2.14.1? Can somebody tell me how to modify my sort, or whatever else, to get the same result that I would get in 2.14.1? Cheers
See ?Comparison, which holds some warnings about what to expect when sorting strings.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Enrico Schumann Lucerne, Switzerland http://nmof.net/
I did, but this does not answer my question... Does anybody know how to tweak the behaviour of sort, or how else to do this?
I don't *think* it's version-specific, but rather it depends on your (still unstated) locale, as the documentation goes to great lengths to point out. Change that and you might see different behaviors.
Michael
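A quick way to report the locale in question, as a small sketch using base R only. The output depends on the machine, so none is shown here.

```r
# Report the collation locale of the current session.
collate <- Sys.getlocale("LC_COLLATE")
cat("LC_COLLATE:", collate, "\n")

# sort() on character vectors follows that collation, so the result
# below may differ between machines.
x <- c("X.Z", "X0B.Z")
sorted <- sort(x)
print(sorted)
```

Posting both lines of output alongside R.version makes results from different machines comparable.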
Hello,
statquant2 wrote:
Ok, so it changed between 2.12.2 and 2.14.1? Can somebody tell me how to modify my sort, or whatever else, to get the same result that I would get in 2.14.1? Cheers
I don't know about 2.12.2 but for 2.12.0 I get:
> R.version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 2
minor 12.0
year 2010
month 10
day 15
svn rev 53317
language R
version.string R version 2.12.0 (2010-10-15)
> sort(c("X.","X0B"))
[1] "X." "X0B"
> sort(c("X.Z","X0B.Z"))
[1] "X.Z" "X0B.Z"
And the same for 2.14.1:
> R.version
_
platform i386-pc-mingw32
[... deleted...]
version.string R version 2.14.1 (2011-12-22)
> sort(c("X.","X0B"))
[1] "X." "X0B"
> sort(c("X.Z","X0B.Z"))
[1] "X.Z" "X0B.Z"
Could it be OS related?
Rui Barradas.
OK, here is what I have:
R) str(R.Version())
List of 13
$ platform : chr "x86_64-unknown-linux-gnu"
$ arch : chr "x86_64"
$ os : chr "linux-gnu"
$ system : chr "x86_64, linux-gnu"
$ status : chr ""
$ major : chr "2"
$ minor : chr "12.2"
$ year : chr "2011"
$ month : chr "02"
$ day : chr "25"
$ svn rev : chr "54585"
$ language : chr "R"
$ version.string: chr "R version 2.12.2 (2011-02-25)"
R) sort(c("X.","X0B"))
[1] "X." "X0B"
R) sort(c("X.Z","X0B.Z"))
[1] "X0B.Z" "X.Z"
I am using Red Hat Linux:
$ uname -a
Linux 2.6.18-238.9.1.el5 #1 SMP Fri Mar 18 12:42:39 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
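One possible explanation for the Linux result, offered as an assumption rather than something confirmed in this thread: collation in glibc locales such as en_US typically ignores punctuation on the first comparison pass, so "X.Z" is compared roughly as "XZ" and "X0B.Z" as "X0BZ". The sketch below imitates that with a crude sort key (punctuation stripped, then byte order). It is not glibc's actual algorithm, punct_blind_sort is a made-up name, and method = "radix" needs R 3.3.0 or later.

```r
# Crude imitation of punctuation-ignoring collation (an assumption, not
# glibc's real algorithm): strip punctuation, then order keys bytewise.
# Assumption: R >= 3.3.0 for method = "radix".
punct_blind_sort <- function(x) {   # hypothetical helper name
  key <- gsub("[[:punct:]]", "", x)
  x[order(key, method = "radix")]
}

r1 <- punct_blind_sort(c("X.", "X0B"))
r2 <- punct_blind_sort(c("X.Z", "X0B.Z"))
print(r1)  # "X." first
print(r2)  # "X0B.Z" first
```

With these inputs the helper gives the same two orderings as the Linux session above, which is at least consistent with the locale being the cause.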
On Mon, Feb 20, 2012 at 05:55:30AM -0800, statquant2 wrote:
I did, but this does not answer my question... Does anybody know how to tweak the behaviour of sort, or how else to do this?
Hi.
Try this
Sys.setlocale("LC_COLLATE", "C")
This comes from ?locale, which reads:
Sys.setlocale("LC_COLLATE", "C") # turn off locale-specific sorting,
# usually
See also ?sort
The sort order for character vectors will depend on the collating
sequence of the locale in use: see 'Comparison'.
?Comparison
Comparison of strings in character vectors is lexicographic within
the strings using the collating sequence of the locale in use: see
'locales'. The collating sequence of locales such as 'en_US' is
normally different from 'C' (which should use ASCII) and can be
surprising. Beware of making _any_ assumptions about the
collation order: ...
Hope this helps.
Petr Savicky.
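For the record, a short sketch of what this setting does with the strings from the original post. In the C locale strings are compared byte by byte, and "." (code 46) sorts before "0" (code 48), so "X.Z" comes before "X0B.Z". Saving and restoring the old setting is my addition, not part of the suggestion above.

```r
old <- Sys.getlocale("LC_COLLATE")              # remember the current collation
invisible(Sys.setlocale("LC_COLLATE", "C"))     # switch to byte-order collation

codes <- c(utf8ToInt("."), utf8ToInt("0"))
print(codes)                                    # 46 48

res <- sort(c("X.Z", "X0B.Z"))
print(res)                                      # "X.Z" first

invisible(Sys.setlocale("LC_COLLATE", old))     # restore the previous collation
```

Note that Sys.setlocale() changes the setting for the whole session until it is changed back.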
NICE DUUUUDE It solves my problem! Awesome stuff
It seems OS-dependent. I got different results when trying it on Windows XP
and Red Hat Linux.
> R.version
_
platform x86_64-unknown-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 2
minor 9.1
year 2009
month 06
day 26
svn rev 48839
language R
version.string R version 2.9.1 (2009-06-26)
> sort(c("X.","X0B"))
[1] "X." "X0B"
> sort(c("X.Z","X0B.Z"))
[1] "X.Z" "X0B.Z"
> R.version
_
platform x86_64-unknown-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 2
minor 9.1
year 2009
month 06
day 26
svn rev 48839
language R
version.string R version 2.9.1 (2009-06-26)
> sort(c("X.","X0B"))
[1] "X." "X0B"
> sort(c("X.Z","X0B.Z"))
[1] "X0B.Z" "X.Z"
On Mon, Feb 20, 2012 at 04:56:21PM +0100, Petr Savicky wrote:
Try this
Sys.setlocale("LC_COLLATE", "C")
This comes from ?locale and reads there
Sys.setlocale("LC_COLLATE", "C") # turn off locale-specific sorting,
# usually
This is not in ?locale, but in ?locales. The line is in the examples section at the end.
To see the current settings, try also
Sys.getlocale()
LC_CTYPE can also be relevant:
Sys.setlocale("LC_CTYPE", "C")
Hope this helps.
Petr Savicky.
Sorry, I just made a mistake. This is the result from Windows XP.
> sort(c("X.","X0B"))
[1] "X." "X0B"
> sort(c("X.Z","X0B.Z"))
[1] "X.Z" "X0B.Z"
> R.version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 2
minor 13.0
year 2011
month 04
day 13
svn rev 55427
language R
version.string R version 2.13.0 (2011-04-13)
On 2012-2-20 23:15, Rui Barradas wrote:
Could it be OS related?
Yes, it seems so. I tried it on my local Windows XP machine and on a Red Hat
Linux server, and got different results. I hope it will be fixed in future
versions. Meanwhile, we should stay alert and check that the results are
consistent when moving our code from one platform to another.
> sort(c("X.","X0B"))
[1] "X." "X0B"
> sort(c("X.Z","X0B.Z"))
[1] "X0B.Z" "X.Z"
> R.version
_
platform x86_64-unknown-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 2
minor 9.1
year 2009
month 06
day 26
svn rev 48839
language R
version.string R version 2.9.1 (2009-06-26)
> sort(c("X.","X0B"))
[1] "X." "X0B"
> sort(c("X.Z","X0B.Z"))
[1] "X.Z" "X0B.Z"
> R.version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 2
minor 13.0
year 2011
month 04
day 13
svn rev 55427
language R
version.string R version 2.13.0 (2011-04-13)
I've been following this thread with interest. I had begun composing
a reply on similar lines to Petr's above, but put it on one side
while waiting to see how the thread would evolve.
In view of the tangle of mixed experiences reported by different
users, I now wonder whether we should have something like "lc_collate"
as a specific parameter for sort(), e.g. so that one can set, for a
particular sorting operation,
sort(c("X.","X0B"),lc_collate="C")
without affecting the system "LC_COLLATE" setting (i.e. the change
takes effect only within the execution of that sort() command).
Ted.
-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 20-Feb-2012 Time: 17:16:47
This message was sent by XFMail
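A per-call collation of the kind proposed above can be approximated with a small wrapper. The sketch below is hypothetical: sort_collate and its lc_collate argument are names made up for illustration and are not part of base R. It changes LC_COLLATE only for the duration of the call and restores it on exit, even if sort() fails.

```r
# Hypothetical helper, not part of base R: sort under a temporary
# LC_COLLATE setting and restore the previous one afterwards.
sort_collate <- function(x, lc_collate = "C", ...) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", lc_collate)
  sort(x, ...)
}

res <- sort_collate(c("X.Z", "X0B.Z"))
print(res)  # "X.Z" first
```

Using on.exit() rather than a trailing Sys.setlocale() call means the session collation is put back even when the sort stops with an error.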