This may be a doc error or a coding bug.
The help page for sample says:
"Non-integer positive numerical values of n or x will be truncated to
the next smallest integer, which has to be no larger than
.Machine$integer.max."
This is not true:
> table(sample(2.5, 1000000, replace = TRUE))
1 2 3
399933 399716 200351
We shouldn't have those 3's if truncation of x had occurred.
Duncan Murdoch
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS:
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK:
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1
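For comparison, here is a minimal sketch of what the quoted help text would imply if x were truncated before sampling. The explicit `trunc()` call and the variable names are added for illustration only and are not part of the original report.

```r
# What the documented behaviour would look like: truncate x first, then sample.
x <- 2.5
n_doc <- trunc(x)                # 2, per "truncated to the next smallest integer"
set.seed(1)                      # arbitrary seed, for reproducibility only
draws <- sample.int(n_doc, 1e5, replace = TRUE)
table(draws)                     # only the values 1 and 2 can occur
```

Under this reading, a 3 can never appear, which is the discrepancy the report points out.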
A different error in sample()
8 messages · Duncan Murdoch, Joris Meys, Dario Strbenac +3 more
I believe the word "truncated" is causing the confusion. 3 is "the next smallest integer" following 2.5, but that is not the truncation done by trunc(). Rewording to "rounding to the next smallest integer" would get rid of that confusion, imho.
Cheers
Joris
On Wed, Sep 19, 2018 at 7:57 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> The help page for sample says: "Non-integer positive numerical values of n or x will be truncated to the next smallest integer, which has to be no larger than .Machine$integer.max."
Joris Meys
Statistical consultant
Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)
Good day,
The use of "rounding" also doesn't make sense. If the number is halfway between two integers, it is rounded to the nearest even integer.
> round(2.5)
[1] 2
--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
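Since the discussion turns on what "truncated" and "rounded" mean, a short side-by-side of the relevant base R functions may help. The test values below are chosen only for illustration.

```r
# Four base R ways of turning a non-integer into a whole number.
x <- c(2.5, 3.5, -2.5)
conversions <- rbind(
  trunc   = trunc(x),    # drops the fractional part (towards zero)
  floor   = floor(x),    # largest integer not greater than x
  ceiling = ceiling(x),  # smallest integer not less than x
  round   = round(x)     # halves go to the even neighbour
)
colnames(conversions) <- as.character(x)
conversions
```

For 2.5, only `ceiling()` gives 3; `trunc()`, `floor()` and `round()` all give 2, so none of the usual readings of "truncated" or "rounded" accounts for a 3 in the output of sample().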
Besides wording of the documentation re truncating vs rounding, there
is something peculiar going on with the fractional part of n:
> table(sample.int(2.5, 1e6, replace = TRUE))
1 2 3
399051 401035 199914
> table(sample.int(3, 1e6, replace = TRUE))
1 2 3
332956 332561 334483
> table(sample.int(2.01, 1e6, replace = TRUE))
1 2 3
497173 497866 4961
> sessionInfo()
R Under development (unstable) (2018-09-17 r75319)
Platform: x86_64-apple-darwin17.7.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Users/whuber/R/lib/libRblas.dylib
LAPACK: /Users/whuber/R/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] fortunes_1.5-4
loaded via a namespace (and not attached):
[1] compiler_3.6.0 tools_3.6.0
With thanks in advance-
Wolfgang
-------
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany
wolfgang.huber at embl.de
http://www.huber.embl.de
My book with Susan Holmes: http://www.huber.embl.de/msmb
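The three tables above can be compared against one simple pattern. This is offered as a hypothesis rather than an established fact: values 1 and 2 each occur with probability 1/n, and value 3 with probability (n - 2)/n. The counts are copied from the tables; the variable names are illustrative.

```r
# Counts copied from the three tables above (values 1, 2, 3).
observed <- rbind(
  "2.5"  = c(399051, 401035, 199914),
  "3"    = c(332956, 332561, 334483),
  "2.01" = c(497173, 497866, 4961)
)
n <- as.numeric(rownames(observed))

# Hypothesis (an assumption): P(1) = P(2) = 1/n and P(3) = (n - 2)/n.
expected <- cbind(1 / n, 1 / n, (n - 2) / n)

obs_prop <- observed / rowSums(observed)   # each row divided by its own total
round(obs_prop - expected, 4)              # differences are below 0.002 in every cell
```

If the pattern holds, the fractional part of n is not discarded but sets how often the extra value 3 is drawn.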
FWIW, I suspect this is related to the function R_unif_index that was introduced in src/main/RNG.c around revision 72356, or the way this function is used in do_sample in src/main/random.c.
On 20.9.18 08:19, Wolfgang Huber wrote:
> Besides wording of the documentation re truncating vs rounding, there is something peculiar going on with the fractional part of n:
With thanks in advance-
Wolfgang
Wolfgang Huber on Thu, 20 Sep 2018 08:47:47 +0200 writes:
> FWIW, I suspect this is related to the function
> R_unif_index that was introduced in src/main/RNG.c around
> revision 72356, or the way this function is used in
> do_sample in src/main/random.c.
Yes, it is just the use of 'dn' instead of 'n'
- a one letter thinko I'd say.
But *no*, it's much older than revision 72356; e.g., it's already in
R version 3.0.0 (2013-04-03) -- "Masked Marvel"
but not yet in
R version 2.15.3 (2013-03-01) -- "Security Blanket"
----
Here, I clearly think we see a regression bug, and hopefully not
one that should trigger often in practice...
and -- without any statistics about the consequences out in
package space --
I do think we should fix this in code and let the documentation
become "great again" ;-)
Martin
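To illustrate how using 'dn' (the untruncated double) in place of 'n' (the truncated integer) could produce the tables seen earlier in the thread, here is a simplified model in R. It assumes the index is formed as floor(dn * U) + 1 with U uniform on (0, 1). That formula and the function names `draw_with_dn` and `draw_with_n` are assumptions made for this sketch, not the C code in src/main/random.c.

```r
# Simplified R model of the indexing step; NOT the C code in src/main/random.c.
draw_with_dn <- function(dn, k) floor(dn * runif(k)) + 1         # uses dn as-is
draw_with_n  <- function(dn, k) floor(trunc(dn) * runif(k)) + 1  # truncates first

set.seed(42)   # arbitrary seed, for reproducibility only
k <- 1e6
buggy <- draw_with_dn(2.5, k)
fixed <- draw_with_n(2.5, k)

# Proportions of the values 1, 2, 3 under each variant.
prop_buggy <- as.vector(table(factor(buggy, levels = 1:3))) / k
prop_fixed <- as.vector(table(factor(fixed, levels = 1:3))) / k
rbind(buggy = prop_buggy, fixed = prop_fixed)
```

In this model the untruncated variant yields proportions near 0.4, 0.4 and 0.2, close to those reported for 2.5, while the truncating variant never produces a 3.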
Hi,
I have not checked the source code, but I think it is because of banker's rounding.
https://en.wikipedia.org/wiki/Rounding#Round_half_to_even
Best regards,
Kim
-----Original Message-----
From: R-devel <r-devel-bounces at r-project.org> On Behalf Of Dario Strbenac
Sent: 20 September 2018 03:00
To: r-devel <r-devel at r-project.org>
Subject: Re: [Rd] A different error in sample()
> The use of "rounding" also doesn't make sense. If the number is halfway between two integers, it is rounded to the nearest even integer.
Martin Maechler on Thu, 20 Sep 2018 09:20:46 +0200 writes:
Wolfgang Huber on Thu, 20 Sep 2018 08:47:47 +0200 writes:
>> FWIW, I suspect this is related to the function
>> R_unif_index that was introduced in src/main/RNG.c around
>> revision 72356, or the way this function is used in
>> do_sample in src/main/random.c.
> Yes, it is just the use of 'dn' instead of 'n'
> - a one letter thinko I'd say.
> But *no*, it's much older than revision 72356; e.g., it's already in
> R version 3.0.0 (2013-04-03) -- "Masked Marvel"
> but not yet in
> R version 2.15.3 (2013-03-01) -- "Security Blanket"
> ----
> Here, I clearly think we see a regression bug, and hopefully not
> one that should trigger often in practice...
> and -- without any statistics about the consequences out in
> package space --
> I do think we should fix this in code and let the documentation
> become "great again" ;-)
We have agreed that this is simply a regression and should be
fixed without a change to the documentation.
Consequently, ~ 5 minutes ago
$ svn log -v -c75338
------------------------------------------------------------------------
r75338 | maechler | 2018-09-20 17:38:46 +0200 (Thu, 20 Sep 2018) | 1 line
Changed paths:
M /trunk/doc/NEWS.Rd
M /trunk/src/main/random.c
M /trunk/tests/reg-tests-1d.R
revert sample.int(<non-integer>, k, replace=TRUE) to sane pre_R-3.0.0 behaviour
------------------------------------------------------------------------
This is now back to "correct" behaviour in "R-devel (>= 75338)"
(and, as Duncan Murdoch also said by choosing this thread's
Subject, this is really a different issue than the "Bias in R's....")
Martin
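A check in the spirit of this fix could look like the sketch below. The helper `respects_truncation` and the two reference samplers are hypothetical names introduced for illustration; this is not the test that was added to tests/reg-tests-1d.R.

```r
# Hypothetical helper (not the test added to tests/reg-tests-1d.R):
# TRUE if a sampler never returns a value above trunc(n).
respects_truncation <- function(n, k = 1e5, sampler = sample.int) {
  draws <- sampler(n, k, replace = TRUE)
  max(draws) <= trunc(n)
}

# Reference samplers with known behaviour, used to exercise the helper.
truncating_sampler <- function(n, k, replace) sample.int(trunc(n), k, replace = replace)
dn_sampler         <- function(n, k, replace) floor(n * runif(k)) + 1  # models the 'dn' slip

set.seed(7)    # arbitrary seed, for reproducibility only
c(truncating = respects_truncation(2.5, sampler = truncating_sampler),
  dn_model   = respects_truncation(2.5, sampler = dn_sampler))
```

Going by the message above, calling `respects_truncation(2.5)` with the default sampler would be expected to return TRUE on R-devel builds at or after r75338, and FALSE on the affected releases.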