Bias in R's random integers?
On 20/09/2018 11:01 AM, Radford Neal wrote:
From: Duncan Murdoch <murdoch.duncan at gmail.com>
Let's try it:
> m <- (2/5)*2^32 > m > 2^31
[1] FALSE
> x <- sample(m, 1000000, replace = TRUE) > table(x %% 2)
0 1
399850 600150
Since m is an even number, the true proportions of evens and odds should
be exactly 0.5. That's some pretty strong evidence of the bug in the
generator.
It seems to be a recently-introduced bug. Here's output with R-2.15.1: R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows" Copyright (C) 2012 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-unknown-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.
> set.seed(123) > m <- (2/5)*2^32 > m > 2^31
[1] FALSE
> x <- sample(m, 1000000, replace = TRUE) > table(x %% 2)
0 1
499412 500588
So I doubt that this has anything to do with bias from using 32-bit
random values.
There are two bugs, one recent one (allowing fractional m to slip
through) and one old one (the bias). The test above shows the bias with
m+1, i.e. in R 2.15.1
> x <- sample(m + 1, 1000000, replace = TRUE)
> table(x %% 2)
0 1
466739 533261
and you can see it in the original m with
x <- sample(m, 1000000, replace = TRUE)
plot(density(x[x %% 2 == 0]))
Duncan Murdoch