Skip to content
Prev 55923 / 63421 Next

Bias in R's random integers?

On 9/21/18 3:15 PM, Ralf Stubner wrote:
I just realized that the mailing list had stripped the attachment. I am
therefore including it at the end for reference. Meanwhile I have also
cast this into a package (feedback welcome):

	https://www.daqana.org/dqsample/

cheerio
ralf

// [[Rcpp::plugins(cpp11)]]
#include <cstdint>
#include <Rcpp.h>

/* Daniel Lemire
   Fast Random Integer Generation in an Interval
   https://arxiv.org/abs/1805.10941
 */
uint32_t  nearlydivisionless32(uint32_t s, uint32_t (*random32)(void)) {
    uint32_t x = random32();
    uint64_t m = (uint64_t) x * (uint64_t) s;
    uint32_t l = (uint32_t) m;
    if (l < s) {
        uint32_t t = -s % s;
        while (l < t) {
            x = random32();
            m = (uint64_t) x * (uint64_t) s;
            l = (uint32_t) m;
        }
    }
    return m >> 32;
}

uint32_t random32() {
    return R::runif(0, 1) * 4294967296; /* 2^32 */
}

// [[Rcpp::export]]
Rcpp::IntegerVector sample_int(int n, int size, bool replace = false) {
    Rcpp::IntegerVector result(Rcpp::no_init(size));
    if (replace) {
        for (int i = 0; i < size; ++i)
            result[i] = nearlydivisionless32(n, random32) + 1;
    } else {
        Rcpp::IntegerVector tmp(Rcpp::no_init(n));
        for (int i = 0; i < n; ++i)
            tmp[i] = i;
        for (int i = 0; i < size; ++i) {
            int j = nearlydivisionless32(n, random32);
            result[i] = tmp[j] + 1;
            tmp[j] = tmp[--n];
        }
    }
    return result;
}

/*** R
set.seed(42)
sample.int(6, 10, replace = TRUE)
sample.int(100, 10)
set.seed(42)
sample_int(6, 10, replace = TRUE)
sample_int(100, 10)

m <- ceiling((2/5)*2^32)

set.seed(42)
x <- sample.int(m, 1000000, replace = TRUE)
table(x %% 2)

set.seed(42)
y <- sample_int(m, 1000000, replace = TRUE)
table(y %% 2)

set.seed(42)
sample.int(m, 6, replace = TRUE)

set.seed(42)
sample_int(m, 6, replace = TRUE)

bench::mark(orig = sample.int(m, 1000000, replace = TRUE),
            new  = sample_int(m, 1000000, replace = TRUE),
            check = FALSE)
bench::mark(orig = sample.int(6, 1000000, replace = TRUE),
            new  = sample_int(6, 1000000, replace = TRUE),
            check = FALSE)
 */

Thread (35 messages)

Carl Boettiger Bias in R's random integers? Sep 18 Duncan Murdoch Bias in R's random integers? Sep 19 Iñaki Ucar Bias in R's random integers? Sep 19 David Hugh-Jones Bias in R's random integers? Sep 19 Ben Bolker Bias in R's random integers? Sep 19 Duncan Murdoch Bias in R's random integers? Sep 19 Philip B. Stark Bias in R's random integers? Sep 19 Duncan Murdoch Bias in R's random integers? Sep 19 Philip B. Stark Bias in R's random integers? Sep 19 Duncan Murdoch Bias in R's random integers? Sep 19 Philip B. Stark Bias in R's random integers? Sep 19 Duncan Murdoch Bias in R's random integers? Sep 19 Philip B. Stark Bias in R's random integers? Sep 19 Philip B. Stark Bias in R's random integers? Sep 19 Duncan Murdoch Bias in R's random integers? Sep 19 David Hugh-Jones Bias in R's random integers? Sep 19 Duncan Murdoch Bias in R's random integers? Sep 19 Ben Bolker Bias in R's random integers? Sep 19 Carl Boettiger Bias in R's random integers? Sep 19 Ralf Stubner Bias in R's random integers? Sep 20 Duncan Murdoch Bias in R's random integers? Sep 20 Paul Gilbert Bias in R's random integers? Sep 20 Gabriel Becker Bias in R's random integers? Sep 20 Hervé Pagès Bias in R's random integers? Sep 20 Steve Grubb Bias in R's random integers? Sep 20 Philip B. Stark Bias in R's random integers? Sep 20 Ralf Stubner Bias in R's random integers? Sep 21 Steve Grubb Bias in R's random integers? Sep 21 Dirk Eddelbuettel Bias in R's random integers? Sep 21 Dirk Eddelbuettel Bias in R's random integers? Sep 21 Luke Tierney Bias in R's random integers? Sep 21 Ralf Stubner Bias in R's random integers? Sep 21 Steve Grubb Bias in R's random integers? Sep 21 Steve Grubb Bias in R's random integers? Sep 21 Ralf Stubner Bias in R's random integers? Sep 27