Bias in R's random integers?
On 2018-09-19 09:40 AM, David Hugh-Jones wrote:
On Wed, 19 Sep 2018 at 13:43, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
I think the analyses are correct, but I doubt that a change to the default would be accepted, as it would make older results more difficult to reproduce.
I'm a bit alarmed by the logic here. Unbiased sampling seems basic for a statistical language. As a consumer of R, I'd like to think that, e.g., my bootstrapped p-values are correct. Surely if the old results depend on the biased algorithm, then they are false results?
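To make the size of the bias concrete, here is a minimal Python sketch (not R code). It assumes, as a simplification, that the biased method forms an index as floor(u * m) from a uniform u that can only take the 2**bits equally spaced values k / 2**bits. The helper names floor_counts, floor_bias_ratio and unbiased_index are hypothetical. unbiased_index swaps in plain rejection sampling (draw just enough random bits, retry if the value is out of range) as one standard unbiased alternative, not necessarily the fix R would adopt.

```python
import random
from collections import Counter


def floor_counts(m, bits):
    """Count how many of the 2**bits grid values u = k / 2**bits map to each
    outcome 0 .. m-1 under floor(u * m).  Assumption: a toy generator with
    `bits` bits of resolution.  Integer arithmetic keeps the count exact;
    only practical for small `bits` because it enumerates every k."""
    counts = Counter((k * m) >> bits for k in range(2 ** bits))
    return [counts[j] for j in range(m)]


def floor_bias_ratio(m, bits=32):
    """Ratio of the most likely to the least likely outcome under
    floor(u * m), computed without enumeration.  Each outcome receives
    either floor(2**bits / m) or ceil(2**bits / m) of the grid values."""
    total = 2 ** bits
    if not 1 <= m <= total:
        raise ValueError("m must be between 1 and 2**bits")
    lo = total // m
    hi = -(-total // m)  # ceiling division
    return hi / lo


def unbiased_index(m, rng=random):
    """Draw an integer uniformly from 0 .. m-1 by rejection: take just
    enough random bits to cover m-1 and retry whenever the value is >= m."""
    if m < 1:
        raise ValueError("m must be positive")
    nbits = max(1, (m - 1).bit_length())
    while True:
        v = rng.getrandbits(nbits)
        if v < m:  # every accepted value in 0 .. m-1 is equally likely
            return v


print(floor_counts(3, 4))            # [6, 5, 5]
print(floor_bias_ratio(3, 4))        # 1.2
print(floor_bias_ratio(3 * 2**30))   # 2.0
print(unbiased_index(10))
```

With 4 bits and m = 3, the outcomes get 6, 5 and 5 of the 16 grid points; with 32 bits and m = 3 * 2**30, some outcomes are twice as likely as others. The rejection loop accepts each draw with probability at least 1/2, so it needs at most two draws on average.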
Balancing backward compatibility and correctness is a tough problem here. If this goes into base R, what's the best way to do it? What was the protocol for migrating away from the "buggy Kinderman-Ramage" generator back in the day? (Version 1.7 was released sometime between 2001 and 2004.) I couldn't find the exact commit in the GitHub mirror; this is related ...

https://github.com/wch/r-source/commit/7ad3044639fd1fe093c655e573fd1a67aa7f55f6#diff-dbcad570d4fb9b7005550ff630543b37

===
‘normal.kind’ can be "Kinderman-Ramage", "Buggy Kinderman-Ramage" (not for ‘set.seed’), "Ahrens-Dieter", "Box-Muller", "Inversion" (the default), or "user-supplied". (For inversion, see the reference in ‘qnorm’.) The Kinderman-Ramage generator used in versions prior to 1.7.0 (now called "Buggy") had several approximation errors and should only be used for reproduction of old results.