Skip to content
Prev 55875 / 63421 Next

A different error in sample()

Yup, that is a bug, at least in the documentation. Probably a clearer example is 

x <- seq(2.001, 2.999, length.out=999)
threes <- sapply(x, function(y) table(sample(y, 10000, replace=TRUE))[3])
plot(threes, type="l")
curve(10000*(x-2)/x, add=TRUE, col="red")

which is entirely consistent with what you'd expect from floor(runif(10000, 0, y)) + 1, and as far as I can tell from the source, that is what is happening internally. 

(Strict monotonicity is a bit of a red herring, it is jut a matter of having spaced the y so far apart that the probability of an order reversal becomes negligible.)

So either we should do what the documentation says we do, or the documentation should not say that we do what we do not actually do...

The suspect code is this snippet from do_sample:

            int n = (int) dn;
            .....

            if (replace || k < 2) {
                for (int i = 0; i < k; i++) iy[i] = (int)(R_unif_index(dn) + 1);
            } else {
                int *x = (int *)R_alloc(n, sizeof(int));
                for (int i = 0; i < n; i++) x[i] = i;
                for (int i = 0; i < k; i++) {
                    int j = (int)(R_unif_index(n));
                    iy[i] = x[j] + 1;
                    x[j] = x[--n];
                }
            }

(notice arguments to R_unif_index)

-pd