
A different error in sample()

8 messages · Duncan Murdoch, Joris Meys, Dario Strbenac +3 more

#
This may be a doc error or a coding bug.

The help page for sample says:

"Non-integer positive numerical values of n or x will be truncated to 
the next smallest integer, which has to be no larger than 
.Machine$integer.max."

This is not true:

 > table(sample(2.5, 1000000, replace = TRUE))

      1      2      3
399933 399716 200351

We shouldn't have those 3's if truncation of x had occurred.

Duncan Murdoch

 > sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1
#
I believe the word "truncated" is causing the confusion. 3 is "the next
smallest integer" following 2.5, but that is not the truncation done by
trunc(). Rewording it to "rounded to the next smallest integer" would get
rid of that confusion, imho.

Cheers
Joris

On Wed, Sep 19, 2018 at 7:57 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
#
Good day,

The use of "rounding" also doesn't make sense. If the number is halfway between two integers, it is rounded to the nearest even integer, so 2.5 becomes 2:
[1] 2
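The competing readings of the help page ("truncated", "next smallest integer", "rounded") can be pinned down in a few lines. This is a small C sketch (C because the code discussed further down lives in R's C sources); the function names are hypothetical, not part of R, and it only handles non-negative sizes that fit in a long:

```c
/* Three ways to turn a non-negative, non-integer size such as 2.5 into an
   integer. Assumes x >= 0 (sample sizes are positive) and x fits in a long. */

/* Truncation, as the help page describes: drop the fractional part. */
long truncate_size(double x)
{
    return (long) x;               /* C casts truncate toward zero */
}

/* "Next integer" read as rounding up (a ceiling, since x >= 0). */
long round_up_size(double x)
{
    long f = (long) x;
    return (x > (double) f) ? f + 1 : f;
}

/* Round half to even ("banker's rounding"): exact ties go to the even
   neighbour, which is why 2.5 becomes 2 rather than 3. */
long round_half_even_size(double x)
{
    long f = (long) x;
    double frac = x - (double) f;  /* exact for values like 2.5 */
    if (frac > 0.5) return f + 1;
    if (frac < 0.5) return f;
    return (f % 2 == 0) ? f : f + 1;
}
```

With x = 2.5 these give 2, 3 and 2 respectively. Only the round-up reading allows a 3 at all, and even that would make 3 appear about a third of the time, not the roughly 20% seen in the table at the top of the thread.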

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
#
Besides wording of the documentation re truncating vs rounding, there 
is something peculiar going on with the fractional part of n:

 > table(sample.int(2.5, 1e6, replace = TRUE))

      1      2      3
399051 401035 199914

 > table(sample.int(3, 1e6, replace = TRUE))

      1      2      3
332956 332561 334483

 > table(sample.int(2.01, 1e6, replace = TRUE))

      1      2      3
497173 497866   4961
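The counts fit a simple model. Assume (this is a worked check, not taken from R's sources) that each draw is formed as floor(d * U) + 1 from the untruncated size d and a uniform U on [0, 1), instead of from the truncated integer. Then value k is hit when U falls in an interval of length 1/d, except for the extra value floor(d) + 1, which only gets the leftover fractional part:

```latex
X = \lfloor d\,U \rfloor + 1, \quad U \sim \mathrm{Uniform}[0, 1),
\qquad
P(X = k) =
\begin{cases}
\dfrac{1}{d}, & k = 1, \dots, \lfloor d \rfloor, \\[6pt]
\dfrac{d - \lfloor d \rfloor}{d}, & k = \lfloor d \rfloor + 1.
\end{cases}
```

For d = 2.5 this gives 0.4, 0.4 and 0.2; for d = 2.01 it gives about 0.4975, 0.4975 and 0.004975, i.e. roughly 497,500, 497,500 and 4,975 out of 10^6. For an integer d such as 3 the extra value has probability 0, giving 1/3 each. All three agree with the tables above.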

 > sessionInfo()
R Under development (unstable) (2018-09-17 r75319)
Platform: x86_64-apple-darwin17.7.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Users/whuber/R/lib/libRblas.dylib
LAPACK: /Users/whuber/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] fortunes_1.5-4

loaded via a namespace (and not attached):
[1] compiler_3.6.0 tools_3.6.0


20.9.18 03:00, Dario Strbenac scripsit:
#
FWIW, I suspect this is related to the function R_unif_index that was 
introduced in src/main/RNG.c around revision 72356, or the way this 
function is used in do_sample in src/main/random.c.

20.9.18 08:19, Wolfgang Huber scripsit:
#
> FWIW, I suspect this is related to the function
> R_unif_index that was introduced in src/main/RNG.c around
> revision 72356, or the way this function is used in
> do_sample in src/main/random.c.

Yes, it is just the use of 'dn' instead of 'n'
- a one letter thinko I'd say.
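The 'dn' versus 'n' difference can be illustrated with a simplified C sketch. It is not the code in src/main/random.c: uniform01() is a hypothetical stand-in for R's unif_rand(), built on the C library's rand(), and the two draw functions only mimic scaling by the untruncated double versus by the truncated integer:

```c
#include <stdlib.h>

/* Hypothetical stand-in for R's unif_rand(): a uniform draw on [0, 1).
   Uses the C library rand(), which is adequate only for illustration. */
static double uniform01(void)
{
    return rand() / ((double) RAND_MAX + 1.0);
}

/* Buggy pattern: scale by the untruncated double dn. For dn = 2.5 the
   value 3 is returned whenever the uniform draw is >= 0.8. */
int draw_with_dn(double dn)
{
    return (int) (dn * uniform01()) + 1;
}

/* Documented pattern: truncate first, then scale by the integer n,
   so only 1..n can ever be returned. */
int draw_with_n(double dn)
{
    int n = (int) dn;  /* truncation toward zero, as the help page says */
    return (int) (n * uniform01()) + 1;
}
```

With dn = 2.5, draw_with_dn() returns 3 about 20% of the time, while draw_with_n() can only return 1 or 2, matching the help page.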

But *no*, it's much older than revision 72356; e.g., it's already in

    R version 3.0.0 (2013-04-03) -- "Masked Marvel"

but not yet in

    R version 2.15.3 (2013-03-01) -- "Security Blanket"

----

Here, I clearly think we see a regression bug, and hopefully not
one that should trigger often in practice... and -- without any
statistics about the consequences out in package space -- I do
think we should fix this in code and let the documentation become
"great again" ;-)

Martin





#
Hi,

I have not checked the source code, but I think it is because of banker's rounding.

https://en.wikipedia.org/wiki/Rounding#Round_half_to_even

Best regards,
Kim

#
    > Here, I clearly think we see a regression bug, and hopefully not
    > one that should trigger often in practice...
    > and  -- without any statistics about the consequences out in
    > package space --
    > I do think we should fix this in code and let the documentation
    > become "great again" ;-)

We have agreed that this is simply a regression and should be
fixed without a change to the documentation.

Consequently, ~5 minutes ago I committed:

$ svn log -v -c75338
------------------------------------------------------------------------
r75338 | maechler | 2018-09-20 17:38:46 +0200 (Thu, 20 Sep 2018) | 1 line
Changed paths:
   M /trunk/doc/NEWS.Rd
   M /trunk/src/main/random.c
   M /trunk/tests/reg-tests-1d.R

revert sample.int(<non-integer>, k, replace=TRUE) to sane pre_R-3.0.0 behaviour
------------------------------------------------------------------------

This is now back to "correct" behaviour in "R-devel (>= 75338)".

(And, as Duncan Murdoch also said by choosing this thread's
Subject, this is really a different issue from the "Bias in R's...." thread.)

Martin