Bias in R's random integers?
On 9/21/18 6:38 PM, Tierney, Luke wrote:
Not sure what should happen theoretically for the code in vseq.c, but
I see the same pattern with the R generators I tried (default,
Super-Duper, and L'Ecuyer) and with bash $RANDOM using
N <- 10000
X1 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'", intern = TRUE)))
X2 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'", intern = TRUE)))
X <- X1 + 2 ^ 15 * (X2 > 2^14)
and with numbers from random.org
library(random)
X <- randomNumbers(N, 0, 2^16-1, col = 1)
So I'm not convinced there is an issue.
There is an issue, but it is in vseq.c. The plot I found striking was this:

http://people.redhat.com/sgrubb/files/r-random.jpg

It shows a scatter plot that is bounded by a rectangle whose upper right and lower left corners are empty. Roughly speaking, X and Y correspond to *consecutive differences* between random draws. It is obvious that differences between random draws are bounded by the range of the RNG, and that there cannot be two *differences in a row* that are both close to the maximum (or both close to the minimum). Hence the expected shape for such a scatter plot is a rectangle with two forbidden corners. Within the allowed region, there should be no structure whatsoever (given enough draws). And that is what was striking about the above picture: it showed clear vertical bands which should not be there. MT does fail some statistical tests, but it cannot be brought down that easily.

Interestingly, I first used Dirk's C++ function for convenience, and that did *not* show these bands. But when I compiled vseq.c I could reproduce this. To cut this short: there is an error in vseq.c where the numbers are read in:

    tmp = strtoul(buf, NULL, 16);

The third argument to strtoul is the base in which the numbers should be interpreted. However, R has written the numbers in base 10. Those can be interpreted as base 16, but they will mean something different. Once one changes the above line to

    tmp = strtoul(buf, NULL, 10);

the bands do disappear.

cheerio
ralf
Ralf Stubner
Senior Software Engineer / Trainer

daqana GmbH
Dortustraße 48
14467 Potsdam

T: +49 331 23 61 93 11
F: +49 331 23 61 93 90
M: +49 162 20 91 196
Mail: ralf.stubner at daqana.com

Registered office: Potsdam
Register: AG Potsdam HRB 27966 P
VAT ID: DE300072622
Managing director: Prof. Dr. Dr. Karl-Kuno Kunze
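
P.S. To make the base mix-up concrete, here is a minimal sketch of the parsing step. Note that parse_line is a hypothetical helper written for this illustration; it is not code from vseq.c, which I only quote the one strtoul line from. It wraps strtoul so that the same text can be read in either base and so that an empty or out-of-range input is reported instead of silently becoming 0 or ULONG_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not taken from vseq.c): parse the leading digits of
 * buf as an unsigned integer in the given base.
 * Returns 0 and stores the value in *out on success, -1 on failure. */
int parse_line(const char *buf, int base, unsigned long *out)
{
    char *end = NULL;

    assert(buf != NULL && out != NULL);

    errno = 0;
    unsigned long value = strtoul(buf, &end, base);

    /* end == buf means no digits were consumed at all;
     * ERANGE means the value did not fit into unsigned long. */
    if (end == buf || errno == ERANGE)
        return -1;

    *out = value;
    return 0;
}
```

Called with the text "65535", this yields 65535 for base 10 but 415029 (that is, 0x65535) for base 16. Text made only of decimal digits can never produce a value whose hexadecimal form contains a to f, so large stretches of the output range are unreachable when the wrong base is used. That would be consistent with the bands in the plot, though I have not checked this beyond seeing them vanish with base 10.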