Hey,

I found this - for me - quite surprising and puzzling behaviour of split().

    split(1:11, as.character(1:11))
    split(1:11, 1:11)

When splitting by numerics everything works as expected (sorting of input == sorting of output), but when using a character vector everything gets re-sorted alphabetically.

Although there are some references in the help files to what happens when using split, I did not find any note on this - for me - rather unexpected behaviour.

I would like it best if the sorting of split results stayed the same no matter the input (sorting of input == sorting of output).

If that is not possible, a note of caution in the help pages and maybe an example might be valuable.

Best, Peter
split() - unexpected sorting of results
6 messages · Iñaki Ucar, Peter Meissner, Hervé Pagès +1 more
Hi Peter,

2017-10-20 21:33 GMT+02:00 Peter Meissner <retep.meissner at gmail.com>:
> Hey, I found this - for me - quite surprising and puzzling behaviour of split().
>
>     split(1:11, as.character(1:11))
>     split(1:11, 1:11)
>
> When splitting by numerics everything works as expected (sorting of input == sorting of output), but when using a character vector everything gets re-sorted alphabetically. Although there are some references in the help files to what happens when using split, I did not find any note on this - for me - rather unexpected behaviour.
As the documentation states,

    f: a 'factor' in the sense that 'as.factor(f)' defines the
       grouping, or a list of such factors in which case their
       interaction is used for the grouping.

And, in fact,

    as.factor(1:11)
    [1] 1 2 3 4 5 6 7 8 9 10 11
    Levels: 1 2 3 4 5 6 7 8 9 10 11

    as.factor(as.character(1:11))
    [1] 1 2 3 4 5 6 7 8 9 10 11
    Levels: 1 10 11 2 3 4 5 6 7 8 9

Regards,
Iñaki
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
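The explanation above can be checked directly. The following is a minimal sketch in base R (the variable names `by_int` and `by_chr` are only illustrative) showing that the groups returned by split() are named and ordered by the levels of `as.factor(f)`, while each value still ends up under its own label.

```r
x <- 1:11
by_int <- split(x, 1:11)                # integer grouping vector
by_chr <- split(x, as.character(1:11))  # character grouping vector

# The order of the groups follows the factor levels, not the input order
identical(names(by_int), levels(as.factor(1:11)))
identical(names(by_chr), levels(as.factor(as.character(1:11))))

# Only the order of the groups differs: every value sits under its own label
all(vapply(names(by_chr),
           function(k) by_chr[[k]] == as.integer(k),
           logical(1)))
```

All three expressions should evaluate to TRUE; nothing is lost or mislabelled, only the order of the list elements changes.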
Thanks for the explanation. Still, I think this is surprising behaviour which might be handled better.

Best, Peter

On 20.10.2017 at 9:49 PM, "Iñaki Úcar" <i.ucar86 at gmail.com> wrote:
> [...]
Hi,
On 10/20/2017 12:53 PM, Peter Meissner wrote:
> Thanks for the explanation. Still, I think this is surprising behaviour which might be handled better.
Maybe a little surprising, but no more than:

    > x <- sample(11L)
    > sort(x)
    [1] 1 2 3 4 5 6 7 8 9 10 11
    > sort(as.character(x))
    [1] "1" "10" "11" "2" "3" "4" "5" "6" "7" "8" "9"

The fact that sort(), as.factor(), split() and many other things behave consistently with respect to the underlying order of character vectors avoids other even bigger surprises.

Also note that the underlying order of character vectors actually depends on your locale. One way to guarantee consistent results across platforms/locales is by explicitly specifying the levels when making a factor, e.g.

    f <- factor(x, levels=unique(x))
    split(1:11, f)

This is particularly sensible when writing unit tests.

Cheers,
H.
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
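As a small illustration of the explicit-levels suggestion, here is a sketch in base R with a made-up grouping vector `g` (not taken from the thread). It assumes that "input order" means the order in which each label first appears.

```r
g <- c("b", "a", "b", "c", "a")  # hypothetical grouping labels

# Default behaviour: levels are the sorted unique labels
names(split(seq_along(g), g))

# Explicit levels in order of first appearance, so no sorting is involved
f <- factor(g, levels = unique(g))
res <- split(seq_along(g), f)
names(res)
res[["b"]]  # positions in g that carry the label "b"
```

Because the levels are given explicitly, the order of the result no longer depends on the locale-specific collation of character vectors, which is why this form is convenient in unit tests.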
Hello,

To solve the problem of sorting numbers that have been converted to character, there is the package stringr, with the functions str_sort and str_order.

    library(stringr)

    set.seed(2447)
    x <- sample(11L)

    sort(as.character(x))
    [1] "1" "10" "11" "2" "3" "4" "5" "6" "7" "8" "9"

    str_sort(as.character(x), numeric = TRUE)
    [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"

    str_order(as.character(x), numeric = TRUE)
    #[1] 1 4 11 8 6 5 3 10 9 7 2

    i <- str_order(as.character(x), numeric = TRUE)
    as.character(x)[i]
    #[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"

Unfortunately this does not solve the OP's question: factor(), as.factor(), split() and others use the base R sorter, and this can only be changed by changing their sources.

Hope this helps,

Rui Barradas

On 21-10-2017 at 00:32, Hervé Pagès wrote:
> [...]
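The default ordering used by factor() and split() is fixed, but a numerically ordered result can still be obtained by passing the levels explicitly, as suggested earlier in the thread. The following is a minimal sketch in base R; it assumes every label can be parsed as a number, and the names `u` and `lev` are only illustrative. With stringr, `str_sort(unique(g), numeric = TRUE)` could take the place of `lev`.

```r
set.seed(2447)
g <- as.character(sample(11L))  # numeric labels stored as character

u <- unique(g)
# Order the distinct labels by numeric value (assumes all labels are numbers)
lev <- u[order(as.numeric(u))]

res <- split(seq_along(g), factor(g, levels = lev))
names(res)
```

The names of `res` then run from "1" to "11" in numeric order, independent of how the character labels would be collated.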
1 day later
Thank you all for your input - most appreciated.

Best, Peter

On 21.10.2017 at 07:35, "Rui Barradas" <ruipbarradas at sapo.pt> wrote:
> [...]