Dear all,

My git2rdata package relies on a stable sorting. I've noticed that some characters get a different position under R-devel on Windows 10. This is why the unit tests of my package fail only in this combination (https://cran.r-project.org/web/checks/check_results_git2rdata.html).

Below is a minimal example to illustrate the problem.

Best regards,

Thierry

data <- readLines(
  "https://raw.githubusercontent.com/ropensci/git2rdata/master/tests/testthat/test_b_special.R",
  encoding = "UTF-8", n = 15
)
eval(parse(text = paste(tail(data, -3), collapse = "")))
ds$a <- enc2utf8(ds$a)
print(ds$a) # input
Sys.setlocale(locale = "C")
print(sort(ds$a)) # sorted
print(order(ds$a)) # order
print(sessionInfo())

# input

## Win 10 R 4.0.2
[1] "a" "a b" "a\tb" "a\tb\tc" "\ta" "a\t" "a\nb"
[8] "a\nb\nc" "\na" "a\n" "a\"b" "a\"b\"c" "\"b" "a\""
[15] "\"b\"" "a'b" "a'b'c" "'b" "a'" "'b'" "a b c"
[22] "\"NA\"" "'NA'" NA "?" "&" "?" "?"
[29] "?" "\200" "|" "#" "@" "$"

## Win 10 R devel
[1] "a" "a b" "a\tb" "a\tb\tc" "\ta" "a\t" "a\nb"
[8] "a\nb\nc" "\na" "a\n" "a\"b" "a\"b\"c" "\"b" "a\""
[15] "\"b\"" "a'b" "a'b'c" "'b" "a'" "'b'" "a b c"
[22] "\"NA\"" "'NA'" NA "?" "&" "?" "?"
[29] "?" "\200" "|" "#" "@" "$"

## Ubuntu 18.04 R 4.0.3
[1] "a" "a b" "a\tb" "a\tb\tc" "\ta" "a\t" "a\nb"
[8] "a\nb\nc" "\na" "a\n" "a\"b" "a\"b\"c" "\"b" "a\""
[15] "\"b\"" "a'b" "a'b'c" "'b" "a'" "'b'" "a b c"
[22] "\"NA\"" "'NA'" NA "?" "&" "?" "?"
[29] "?" "?" "|" "#" "@" "$"

# sorted

## Win 10 R 4.0.2
[1] "\ta" "\na" "\"NA\"" "\"b" "\"b\"" "#" "$"
[8] "&" "'NA'" "'b" "'b'" "<U+00B5>" "<U+00E0>" "<U+00E7>"
[15] "<U+00E9>" "<U+20AC>" "@" "a" "a\t" "a\tb" "a\tb\tc"
[22] "a\n" "a\nb" "a\nb\nc" "a b" "a b c" "a\"" "a\"b"
[29] "a\"b\"c" "a'" "a'b" "a'b'c" "|"

## Win 10 R devel
[1] "\ta" "\na" "\"NA\"" "\"b" "\"b\"" "#" "$"
[8] "&" "'NA'" "'b" "'b'" "@" "a" "a\t"
[15] "a\tb" "a\tb\tc" "a\n" "a\nb" "a\nb\nc" "a b" "a b c"
[22] "a\"" "a\"b" "a\"b\"c" "a'" "a'b" "a'b'c" "|"
[29] "\200" "\265" "\340" "\347" "\351"

## Ubuntu 18.04 R 4.0.3
[1] "\ta" "\na" "\"NA\"" "\"b" "\"b\"" "#" "$"
[8] "&" "'NA'" "'b" "'b'" "<U+00B5>" "<U+00E0>" "<U+00E7>"
[15] "<U+00E9>" "<U+20AC>" "@" "a" "a\t" "a\tb" "a\tb\tc"
[22] "a\n" "a\nb" "a\nb\nc" "a b" "a b c" "a\"" "a\"b"
[29] "a\"b\"c" "a'" "a'b" "a'b'c" "|"

# order

## Win 10 R 4.0.2
[1] 5 9 22 13 15 32 34 26 23 18 20 28 27 29 25 30 33 1 6 3 4 10 7 8 2
[26] 21 14 11 12 19 16 17 31 24

## Win 10 R devel
[1] 5 9 22 13 15 32 34 26 23 18 20 33 1 6 3 4 10 7 8 2 21 14 11 12 19
[26] 16 17 31 30 28 27 29 25 24

## Ubuntu 18.04 R 4.0.3
[1] 5 9 22 13 15 32 34 26 23 18 20 28 27 29 25 30 33 1 6 3 4 10 7 8 2
[26] 21 14 11 12 19 16 17 31 24

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] C
system code page: 1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.0.2 fortunes_1.5-4

R Under development (unstable) (2021-01-13 r79826)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.1.0

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=C LC_NUMERIC=C
[3] LC_TIME=C LC_COLLATE=C
[5] LC_MONETARY=C LC_MESSAGES=nl_BE.UTF-8
[7] LC_PAPER=nl_BE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_BE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.0.3 fortunes_1.5-4

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
sorting bug in R-devel?
4 messages · Thierry Onkelinx, Peter Dalgaard, Bob Rudis
Not sure what happened between 4.0.2 and -devel, but you are using C collation, which assumes 7-bit single-byte characters, to sort multi-byte 8-bit encoded characters, which looks a bit risky. -pd
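To make the point about multi-byte characters concrete, here is a minimal sketch that inspects how a few non-ASCII characters are stored in UTF-8. The vector `x` and the labels are made up for illustration and are not taken from the package's test file.

```r
# Illustrative strings, written with \u escapes so the script itself stays ASCII.
x <- enc2utf8(c("a", "\u00b5", "\u00e9", "\u20ac"))
labels <- c("a", "micro", "e-acute", "euro")

# Raw bytes of each string in its UTF-8 encoding.
bytes <- lapply(x, charToRaw)
names(bytes) <- labels
print(bytes)

# Number of bytes per string; ASCII characters need one byte, the others more.
n_bytes <- nchar(x, type = "bytes")
print(n_bytes)

# TRUE where a string contains at least one byte outside the 7-bit ASCII range.
has_high_byte <- vapply(bytes, function(b) any(as.integer(b) > 127L), logical(1))
print(has_high_byte)
```

Any string flagged by `has_high_byte` is one whose position under C collation depends on how the bytes (or an escaped representation of them) are compared.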
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Dear Peter,

Thanks for the feedback on the locale. Is there a better alternative for the C locale? One that yields a consistent and stable sorting independent of the R version and OS.

Best regards,

Thierry
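One option that may be worth testing for this question is the `method = "radix"` argument of `sort()` and `order()`. The R documentation for `sort` states that with this method collation always follows the C locale, and that only UTF-8 (including ASCII) and Latin-1 strings are supported. The sketch below assumes every string is converted to UTF-8 first; the vector `x` is a made-up subset for illustration, not the package's actual code.

```r
# Hypothetical subset of strings; \u escapes keep the script itself ASCII.
x <- enc2utf8(c("a", "@", "|", "\u00e9", "\u00b5", "\u20ac"))

# With method = "radix" the strings are compared in C-locale order,
# so the result should not depend on Sys.getlocale("LC_COLLATE").
sorted <- sort(x, method = "radix")
ord <- order(x, method = "radix")

print(sorted)
print(ord)
```

Whether this gives identical results on every platform and R version of interest would still need to be checked, for example in the package's own unit tests.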
base::icuSetCollate might be what you need. There are some decent examples in the manual page on it.
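A minimal sketch of how `icuSetCollate()` could be tried is shown below. It assumes an R build with ICU support, which is checked with `capabilities("ICU")`. The locale `"en_US"` is an arbitrary choice for illustration, and the vector `x` is made up. Whether the resulting order is identical across platforms and ICU versions is not something this sketch establishes.

```r
# Hypothetical strings; \u escapes keep the script itself ASCII.
x <- enc2utf8(c("a", "@", "|", "\u00e9", "\u00b5", "\u20ac"))

if (capabilities("ICU")) {
  # Ask ICU to collate with one fixed locale ("en_US" is only an example).
  icuSetCollate(locale = "en_US")
  print(icuGetCollate())
} else {
  # Without ICU support, icuSetCollate() does nothing apart from a warning.
  message("This R build has no ICU support; using the default collation.")
}

# sort() now uses whichever collation is active in this session.
sorted <- sort(x)
print(sorted)
```

Note that `icuSetCollate()` changes the collation for the rest of the R session, so a package would probably want to limit its use to the code that needs it.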