Bias in R's random integers?

Thu, Sep 27, 2018 12:58 PM

On 9/21/18 3:15 PM, Ralf Stubner wrote:

I just realized that the mailing list had stripped the attachment. I am
therefore including it at the end for reference. Meanwhile I have also
cast this into a package (feedback welcome):

	https://www.daqana.org/dqsample/

cheerio
ralf

// [[Rcpp::plugins(cpp11)]]
#include <cstdint>
#include <Rcpp.h>

/* Daniel Lemire
   Fast Random Integer Generation in an Interval
   https://arxiv.org/abs/1805.10941
 */
uint32_t  nearlydivisionless32(uint32_t s, uint32_t (*random32)(void)) {
    uint32_t x = random32();
    uint64_t m = (uint64_t) x * (uint64_t) s;
    uint32_t l = (uint32_t) m;
    if (l < s) {
        uint32_t t = -s % s;
        while (l < t) {
            x = random32();
            m = (uint64_t) x * (uint64_t) s;
            l = (uint32_t) m;
        }
    }
    return m >> 32;
}

uint32_t random32() {
    return R::runif(0, 1) * 4294967296; /* 2^32 */
}

// [[Rcpp::export]]
Rcpp::IntegerVector sample_int(int n, int size, bool replace = false) {
    Rcpp::IntegerVector result(Rcpp::no_init(size));
    if (replace) {
        for (int i = 0; i < size; ++i)
            result[i] = nearlydivisionless32(n, random32) + 1;
    } else {
        Rcpp::IntegerVector tmp(Rcpp::no_init(n));
        for (int i = 0; i < n; ++i)
            tmp[i] = i;
        for (int i = 0; i < size; ++i) {
            int j = nearlydivisionless32(n, random32);
            result[i] = tmp[j] + 1;
            tmp[j] = tmp[--n];
        }
    }
    return result;
}

/*** R
set.seed(42)
sample.int(6, 10, replace = TRUE)
sample.int(100, 10)
set.seed(42)
sample_int(6, 10, replace = TRUE)
sample_int(100, 10)

m <- ceiling((2/5)*2^32)

set.seed(42)
x <- sample.int(m, 1000000, replace = TRUE)
table(x %% 2)

set.seed(42)
y <- sample_int(m, 1000000, replace = TRUE)
table(y %% 2)

set.seed(42)
sample.int(m, 6, replace = TRUE)

set.seed(42)
sample_int(m, 6, replace = TRUE)

bench::mark(orig = sample.int(m, 1000000, replace = TRUE),
            new  = sample_int(m, 1000000, replace = TRUE),
            check = FALSE)
bench::mark(orig = sample.int(6, 1000000, replace = TRUE),
            new  = sample_int(6, 1000000, replace = TRUE),
            check = FALSE)
 */

Ralf Stubner
Senior Software Engineer / Trainer

daqana GmbH
Dortustra?e 48
14467 Potsdam

T: +49 331 23 61 93 11
F: +49 331 23 61 93 90
M: +49 162 20 91 196
Mail: ralf.stubner at daqana.com

Sitz: Potsdam
Register: AG Potsdam HRB 27966 P
Ust.-IdNr.: DE300072622
Gesch?ftsf?hrer: Prof. Dr. Dr. Karl-Kuno Kunze


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20180927/2648621a/attachment.sig>

Bias in R's random integers?

Thread (35 messages)