Skip to content

Puzzled by the behaviour of rbinom().

4 messages · Charles C. Berry, Rolf Turner

#
In a certain arcane context I wanted to look at the result of
applying rbinom() to two versions of the probability argument,
the versions differing in only a small number (three) of entries.

I thought that if I called set.seed() (with the same argument) prior to
each call to rbinom(), I would get results that differed only where the
probabilities differed.  Most of the time this appears to be true:

set.seed(17) # Or any other seed, in my experience.
p0 <- runif(147)
p1 <- p0
p1[c(26,65,131)] <- 0.5
set.seed(42)
r0 <- rbinom(147,1000,p0)
set.seed(42)
r1 <- rbinom(147,1000,p1)
which(r1 != r0)
[1]  26  65 131

But there is a particular probability vector for which the expected
result does not hold.  I have attached this vector in a file
p.weird.txt.  Access it via p.weird <- dget("p.weird.txt").

If I do:

p0 <- p.weird
p1 <- p0
p1[c(26,65,131)] <- 0.5
set.seed(42)
r0 <- rbinom(147,1000,p0)
set.seed(42)
r1 <- rbinom(147,1000,p1)
which(r1 != r0)

then I get

[1]  26  65  66  72  73  74  75  76 131

so the results differ (inexplicably, to me) at indices 66, 72, 73, 74,
75, and 76 as well as at the indices at which they would be
expected to differ.

There is "pattern" or "structure" in p.weird (try plot(p.weird) to
see it) but why should that matter?

I have just now discerned that if I (slightly) *round* the values of
p.weird:

    p.round <- round(p.weird,12)

then going through the rbinom() exercise with p.round gives the
expected results, i.e. r0 and r1 differ only at indices 26, 65 and 131.

Note that although p.round and p.weird are "slightly different":
they differ by less than sqrt(.Machine$double.eps) whence all.equal()
says that they are "the same".

I am completely unable to understand how there could be a mechanism by
which the behaviour of rbinom() would be affected in this way.

Can anyone provide me with enlightenment?  Ta.

cheers,

Rolf Turner
#
Rolf,
[rest deleted]

The fillip is that rbinom may use a variable number of draws from a uniform to generate the binomial.

You can read the  Kachitvichyanukul and Schmeiser article or hunt through the code to see what it does, but brute force exploration will give you a sense:

set.seed(1)
u1000 <- runif(1000)
res <-
  sapply(u1000,
         function(pr){
           set.seed(1)
           rbinom(1,1000,pr)
           u_next <- runif(1)
           j <- match(u_next, u1000)
           c( i, j-1, pr)} )

table(res[2,]) # how many draws
range(res[3, res[2,] == 2]) # range when 2 draws
plot(res[2,], res[3,])

Evidently, tails of `pr' only need one draw, but the interior needs two.

So my guess is that the changes you made shifted the RNG stream.

I guess you need to reset the seed for each trial.

---

FWIW, I tried your p.weird example, but did not get the same results as you showed. Only elements 26, 65, and 131 differed.


Note that for me:
[1] "Mersenne-Twister" "Inversion"        "Rejection"
R version 4.2.0 (2022-04-22)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.2.0 tools_4.2.0
HTH,
Chuck
#
On Sat, 30 Jul 2022 18:30:29 +0000
"Berry, Charles" <ccberry at health.ucsd.edu> wrote:
<SNIP>
The index "i" in "c(i,j-1,pr)" is undefined.  Should that be
"c(j,j-1,pr)"?

I am going to have to think a bit in order to understand what your code
is doing.  Sorry for being such a thicko.
Deepayan Sarkar reported the same phenomenon, off-list.  Curiouser and
curiouser.

I get the same result as you from RNGkind().

My sessionInfo is:

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_NZ.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_NZ.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_NZ.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] brev_0.0-8

loaded via a namespace (and not attached):
 [1] magrittr_2.0.3    usethis_2.0.1     devtools_2.4.2    pkgload_1.2.4    
 [5] colorspace_1.4-1  R6_2.5.1          rlang_1.0.2       fastmap_1.0.1    
 [9] tools_4.2.1       nnet_7.3-17       pkgbuild_1.2.0    hglmm_0.0-16     
[13] sessioninfo_1.1.1 cli_3.2.0         withr_2.5.0       ellipsis_0.3.2   
[17] remotes_2.4.0     rprojroot_2.0.3   lifecycle_1.0.1   crayon_1.5.1     
[21] brio_1.1.3        processx_3.5.3    purrr_0.3.4       callr_3.7.0      
[25] fs_1.5.0          ps_1.6.0          dbd_0.0-23        testthat_3.1.3   
[29] hmmRT_0.0-5       memoise_2.0.0     glue_1.6.2        cachem_1.0.5     
[33] compiler_4.2.1    desc_1.4.1        prettyunits_1.1.1

There are differences from your sessionInfo.  Perhaps most notably, I'm
running R 4.2.1 whereas you are running 4.2.0.

But that does not seem to be the problem.  When I use "R --vanilla"
and do the rbinom() experiment, I get the same result as you and
Deepayan, i.e. the expected/correct result.

So something about the more complex session environment that I get, when
I use my usual invocation of R, is messing things up.  I'll try to track
down where the mess-up comes from, but it's probably a bit beyond me.
:-(

The problem may well lie in that plethora of packages "loaded via a
namespace (and not attached)".  Under R --vanilla the corresponding
list is just "[1] compiler_4.2.1".

Thanks for your advice; it's given me a bit to chew on at least.

cheers,

Rolf

<SNIP>
#
Rolf,
Sorry, that `i' is cruft from an earlier version that used a for loop indexed by `i` 

You can just do `i <- 1L` before the sapply(...) to make it run.

Or delete it and shift the subscripts of res down by 1L.

Chuck