Bias in R's random integers?
On 9/21/18 3:15 PM, Ralf Stubner wrote:
Right, the RNGs in R produce no more than 32bits, so the conversion to a double can be reverted. If we ignore those RNGs that produce less than 32bits for the moment, then the attached file contains a sample implementation (without long vectors, weighted sampling or hashing). It uses Rcpp for convenience, but I have tried to keep the C++ low.
I just realized that the mailing list had stripped the attachment. I am therefore including it at the end for reference. Meanwhile I have also cast this into a package (feedback welcome): https://www.daqana.org/dqsample/ cheerio ralf // [[Rcpp::plugins(cpp11)]] #include <cstdint> #include <Rcpp.h> /* Daniel Lemire Fast Random Integer Generation in an Interval https://arxiv.org/abs/1805.10941 */ uint32_t nearlydivisionless32(uint32_t s, uint32_t (*random32)(void)) { uint32_t x = random32(); uint64_t m = (uint64_t) x * (uint64_t) s; uint32_t l = (uint32_t) m; if (l < s) { uint32_t t = -s % s; while (l < t) { x = random32(); m = (uint64_t) x * (uint64_t) s; l = (uint32_t) m; } } return m >> 32; } uint32_t random32() { return R::runif(0, 1) * 4294967296; /* 2^32 */ } // [[Rcpp::export]] Rcpp::IntegerVector sample_int(int n, int size, bool replace = false) { Rcpp::IntegerVector result(Rcpp::no_init(size)); if (replace) { for (int i = 0; i < size; ++i) result[i] = nearlydivisionless32(n, random32) + 1; } else { Rcpp::IntegerVector tmp(Rcpp::no_init(n)); for (int i = 0; i < n; ++i) tmp[i] = i; for (int i = 0; i < size; ++i) { int j = nearlydivisionless32(n, random32); result[i] = tmp[j] + 1; tmp[j] = tmp[--n]; } } return result; } /*** R set.seed(42) sample.int(6, 10, replace = TRUE) sample.int(100, 10) set.seed(42) sample_int(6, 10, replace = TRUE) sample_int(100, 10) m <- ceiling((2/5)*2^32) set.seed(42) x <- sample.int(m, 1000000, replace = TRUE) table(x %% 2) set.seed(42) y <- sample_int(m, 1000000, replace = TRUE) table(y %% 2) set.seed(42) sample.int(m, 6, replace = TRUE) set.seed(42) sample_int(m, 6, replace = TRUE) bench::mark(orig = sample.int(m, 1000000, replace = TRUE), new = sample_int(m, 1000000, replace = TRUE), check = FALSE) bench::mark(orig = sample.int(6, 1000000, replace = TRUE), new = sample_int(6, 1000000, replace = TRUE), check = FALSE) */
Ralf Stubner Senior Software Engineer / Trainer daqana GmbH Dortustra?e 48 14467 Potsdam T: +49 331 23 61 93 11 F: +49 331 23 61 93 90 M: +49 162 20 91 196 Mail: ralf.stubner at daqana.com Sitz: Potsdam Register: AG Potsdam HRB 27966 P Ust.-IdNr.: DE300072622 Gesch?ftsf?hrer: Prof. Dr. Dr. Karl-Kuno Kunze -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20180927/2648621a/attachment.sig>