
Bias in R's random integers?

On 9/20/18 5:15 PM, Duncan Murdoch wrote:
Right, the RNGs in R produce no more than 32 bits, so the conversion to a
double can be reverted. If we ignore for the moment those RNGs that produce
fewer than 32 bits, then the attached file contains a sample
implementation (without long vectors, weighted sampling, or hashing). It
uses Rcpp for convenience, but I have tried to keep the amount of C++ low.
Interesting results:

The results for "simple" sampling are the same.
[1] 6 6 2 5 4 4 5 1 4 5
[1] 46 72 92 25 45 90 98 11 44 51
[1] 6 6 2 5 4 4 5 1 4 5
[1] 46 72 92 25 45 90 98 11 44 51


But there is no bias with the alternative method:
0      1
467768 532232
0      1
500586 499414


The differences are also visible when sampling only a few values from
'm' possible values:
[1] 1571624817 1609883303  491583978 1426698159 1102510407  891800051
[1]  491583978 1426698159 1102510407  891800051 1265449090  231355453


When sampling from 'm' values, performance is not as good, since we often
have to draw a second random number:
+             new  = sample_int(m, 1000000, replace = TRUE),
+             check = FALSE)
# A tibble: 2 x 14
  expression     min    mean  median   max `itr/sec` mem_alloc  n_gc n_itr
  <chr>      <bch:t> <bch:t> <bch:t> <bch>     <dbl> <bch:byt> <dbl> <int>
1 orig        8.15ms  8.67ms  8.43ms  10ms     115.     3.82MB     4    52
2 new        25.21ms 25.58ms 25.45ms  27ms      39.1    3.82MB     2    18
# ... with 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#   time <list>, gc <list>


When sampling from fewer values, the difference is much less pronounced:
+             new  = sample_int(6, 1000000, replace = TRUE),
+             check = FALSE)
# A tibble: 2 x 14
  expression     min    mean  median     max `itr/sec` mem_alloc  n_gc n_itr
  <chr>      <bch:t> <bch:t> <bch:t> <bch:t>     <dbl> <bch:byt> <dbl> <int>
1 orig        8.14ms  8.44ms  8.29ms  9.58ms     118.     3.82MB     4    54
2 new        11.13ms 11.66ms 11.23ms 12.98ms      85.8    3.82MB     3    39
# ... with 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#   time <list>, gc <list>
Indeed. Adding/subtracting numbers < 10 to/from 'm' gives "interesting"
curves.
I have the impression that Lemire's method gives the same results unless
it is correcting for the bias that exists in the current method. If that
is really the case, then the disruption should be rather minor. The
ability to fall back to the old behavior would still be useful, though.

cheerio
ralf
