Skip to content

sample(c(0, 1)...) vs. rbinom

6 messages · Michael Hannon, Jeff Newmiller, Peter Dalgaard +2 more

#
Greetings.? My wife is teaching an introductory stat class at UC Davis.? The
class emphasizes the use of simulations, rather than mathematics, to get
insight into statistics, and R is the mandated tool.?? A student in the class
recently inquired about different approaches to sampling from a binomial
distribution.? I've appended some code that exhibits the idea, the gist of
which is that using sample(c(0, 1), ...) and rbinom(...) should give
equivalent results.

The surprising (to me) result is that the two approaches DO give the same
result, EXCEPT when the probability is exactly 0.5.? See Appendix A for the
code and Appendix B for the output.? I don't think this issue is
system-dependent, but I've put my session information in Appendix C.

Another wrinkle in this is that if I omit the "prob" parameter from the call
to sample, meaning to take the default value of 0.5, the two methods DO give
the same result.

Any thoughts about this?? Thanks.

--Mike

Appendix A: some R code that exhibits the problem
=================================================

ppp <- seq(0, 1, by = 0.01)

result <- do.call(rbind, lapply(ppp, function(p) {
??? set.seed(1)
??? sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
??????????????????????? prob=c(1-p, p))
?? ?
??? set.seed(1)
??? rbinomRes <- rbinom(1, size = 1, prob = p)
?? ?
??? data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))
?? ?
}))

result


Appendix B: the output from the R code
======================================

??? prob equivalent
1?? 0.00?????? TRUE
2?? 0.01?????? TRUE
3?? 0.02?????? TRUE
4?? 0.03?????? TRUE
5?? 0.04?????? TRUE
6?? 0.05?????? TRUE
7?? 0.06?????? TRUE
8?? 0.07?????? TRUE
9?? 0.08?????? TRUE
10? 0.09?????? TRUE
11? 0.10?????? TRUE
12? 0.11?????? TRUE
13? 0.12?????? TRUE
14? 0.13?????? TRUE
15? 0.14?????? TRUE
16? 0.15?????? TRUE
17? 0.16?????? TRUE
18? 0.17?????? TRUE
19? 0.18?????? TRUE
20? 0.19?????? TRUE
21? 0.20?????? TRUE
22? 0.21?????? TRUE
23? 0.22?????? TRUE
24? 0.23?????? TRUE
25? 0.24?????? TRUE
26? 0.25?????? TRUE
27? 0.26?????? TRUE
28? 0.27?????? TRUE
29? 0.28?????? TRUE
30? 0.29?????? TRUE
31? 0.30?????? TRUE
32? 0.31?????? TRUE
33? 0.32?????? TRUE
34? 0.33?????? TRUE
35? 0.34?????? TRUE
36? 0.35?????? TRUE
37? 0.36?????? TRUE
38? 0.37?????? TRUE
39? 0.38?????? TRUE
40? 0.39?????? TRUE
41? 0.40?????? TRUE
42? 0.41?????? TRUE
43? 0.42?????? TRUE
44? 0.43?????? TRUE
45? 0.44?????? TRUE
46? 0.45?????? TRUE
47? 0.46?????? TRUE
48? 0.47?????? TRUE
49? 0.48?????? TRUE
50? 0.49?????? TRUE
51? 0.50????? FALSE
52? 0.51?????? TRUE
53? 0.52?????? TRUE
54? 0.53?????? TRUE
55? 0.54?????? TRUE
56? 0.55?????? TRUE
57? 0.56?????? TRUE
58? 0.57?????? TRUE
59? 0.58?????? TRUE
60? 0.59?????? TRUE
61? 0.60?????? TRUE
62? 0.61?????? TRUE
63? 0.62?????? TRUE
64? 0.63?????? TRUE
65? 0.64?????? TRUE
66? 0.65?????? TRUE
67? 0.66?????? TRUE
68? 0.67?????? TRUE
69? 0.68?????? TRUE
70? 0.69?????? TRUE
71? 0.70?????? TRUE
72? 0.71?????? TRUE
73? 0.72?????? TRUE
74? 0.73?????? TRUE
75? 0.74?????? TRUE
76? 0.75?????? TRUE
77? 0.76?????? TRUE
78? 0.77?????? TRUE
79? 0.78?????? TRUE
80? 0.79?????? TRUE
81? 0.80?????? TRUE
82? 0.81?????? TRUE
83? 0.82?????? TRUE
84? 0.83?????? TRUE
85? 0.84?????? TRUE
86? 0.85?????? TRUE
87? 0.86?????? TRUE
88? 0.87?????? TRUE
89? 0.88?????? TRUE
90? 0.89?????? TRUE
91? 0.90?????? TRUE
92? 0.91?????? TRUE
93? 0.92?????? TRUE
94? 0.93?????? TRUE
95? 0.94?????? TRUE
96? 0.95?????? TRUE
97? 0.96?????? TRUE
98? 0.97?????? TRUE
99? 0.98?????? TRUE
100 0.99?????? TRUE
101 1.00?????? TRUE

Appendix C: Session information
===============================
R version 3.0.0 (2013-04-03)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
?[1] LC_CTYPE=en_US.UTF-8?????? LC_NUMERIC=C???????????? ?
?[3] LC_TIME=en_US.UTF-8??????? LC_COLLATE=en_US.UTF-8?? ?
?[5] LC_MONETARY=en_US.UTF-8??? LC_MESSAGES=en_US.UTF-8? ?
?[7] LC_PAPER=C???????????????? LC_NAME=C??????????????? ?
?[9] LC_ADDRESS=C?????????????? LC_TELEPHONE=C?????????? ?
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C????? ?

attached base packages:
[1] stats???? graphics? grDevices utils???? datasets? methods?? base??? ?
#
You seem to be building an elaborate structure for testing the reproducibility of the random number generator. I suspect that rbinom is calling the random number generator a different number of times when you pass prob=0.5 than otherwise.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
Michael Hannon <jm_hannon at yahoo.com> wrote:

            
#
On May 23, 2013, at 07:01 , Jeff Newmiller wrote:

            
Nope. It's switching 0 and 1:
[1] 1 1 0 0 1 0 0 0 0 1
 [1] 0 0 1 1 0 1 1 1 1 0

which is curious, but of course has no implication for the distributional properties. Curiouser, if you drop the prob= in sample.
[1] 0 0 1 1 0 1 1 1 1 0
 [1] 0 0 1 1 0 1 1 1 1 0

However, it was never a design goal that two different random functions (or even two code paths within the same function) should give exactly the same values, even if they simulate the same distribution, so this is nothing more than a curiosity.

  
    
#
After a bit of playing around, I discovered that
sample() does something similar in other situations:
[1] 3
[1] 2
[1] 3 4 2 1 5
[1] 2 5 1 4 3

albyn
On 2013-05-22 22:24, peter dalgaard wrote:
#
What is the "something similar" you are referring to?  And I guess I still don't understand what it is that concerns you about the sample function.


Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
#
the "something similar" is return a different result in two
situations where one might expect the same result, ie when
a probability vector with equal probabilities is supplied versus
the default of equal probabilities.

And, assuming that by "concerns me" you mean "worries me",
I have no clue why you think it does!  It is a curiosity.

albyn
On Thu, May 23, 2013 at 04:38:18PM +0000, Nordlund, Dan (DSHS/RDA) wrote: