
Error: cannot take a sample larger than the population

5 messages · Chuck Cleland, Aldi Kraja, chao gai

#
Hi,
In Splus7 this statement
xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
worked fine, but in R the interpreter complains that the vector to 
choose from, c(0,1,2), is shorter than the number of values I want 
to draw from it.
Any good reason?
See below the error.

 > xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
Error in sample(length(x), size, replace, prob) :
        cannot take a sample larger than the population
 when 'replace = FALSE'
Execution halted

TIA,

Aldi

--
#
Aldi Kraja wrote:
So why not use replace = TRUE ?

xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ), replace=TRUE)

table(xlrmN1)
xlrmN1
  0   1   2
  5 373  22

prop.table(table(xlrmN1))
xlrmN1
     0      1      2
0.0125 0.9325 0.0550

  
    
#
Partial Summary and discussion:
=====================
Thank you to Chao Gai, Chuck Cleland, and Jim Lemon for their suggestion 
to use replace=T in R.
There is a problem though (see below)

In Splus7, sample is defined as
-------------
sample(x, size = n, replace = F, prob = NULL, n = NULL, ...)
so the default is replace=F.
In Splus7,

xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))

runs without error, and

table(xlrmN1)/400
    0    1    2
 0.02 0.93 0.05
shows that "sample" is working exactly as expected based on the prob vector.

When "sample" is used in Splus7 with replacement we see the following 
result:
 > xlrmN1 <- sample(c(0,1,2),400 ,replace=T,prob=c(0.02 ,0.93 ,0.05 ))
 > table(xlrmN1)/400
      0     1      2
 0.0125 0.925 0.0625
which I think is working again as expected.

In R, sample is defined as
---------

sample(x, size, replace = FALSE, prob = NULL)

So the above statement with the default replace=F did not work (it reported the error),
but with replace=T it produced
xlrmN1
     0      1      2 
0.0200 0.9225 0.0575 

which does not exactly match the probabilities provided (0.02, 0.93, 0.05).

Now let's return to the concept of replace=F and replace=T.
When I ask "sample" to select a sample of 400 from a vector of 3 with NO replacement, I would expect the following:
a). Create a very large sample from 0, 1, and 2.
b). From this large sample, select without replacement based on the prob vector.
c). As a result, I expect the proportions in the selected sample to be exactly the same as the prob vector (as in Splus7).

When I ask "sample" to select a sample of 400 from a vector of 3 with replacement, I would expect the following:
a). Create a very large sample from 0, 1, and 2.
b). From this large sample, select with replacement based on the prob vector,
which means some of the previously selected 0, 1, 2 can be selected again.
c). As a result, I expect the proportions in the selected sample to be NOT exactly the same as the prob vector (as in Splus7 and R).

So there are two possible conclusions: either "sample" in R is not working correctly, or I am losing some precision, such as a rounding error, in reproducing

prob=c(0.02 ,0.93 ,0.05 ).
Am I misunderstanding the "sample" function in R?

Any suggestions are appreciated.
TIA,

Aldi
--
#
Aldi,

Your concept of sample is different from mine. 
I would expect sampling with replacement to be equivalent to a for loop of 
single draws, each made without replacement:
samples <- 1:400
for (i in 1:400) samples[i] <- sample(c(0,1,2),1 ,prob=c(0.02 ,0.93 ,0.05 ))
Sampling without replacement:
first :  sample(c(0,1,2),1 ,prob=c(0.02 ,0.93 ,0.05 ))
second: depending on first (suppose 2 was selected)
	 sample(c(0,1),1 ,prob=c(0.02 ,0.93)/.95)
third: whatever is remaining with probability 1.

n.b. the second is equivalent to sample(c(0,1),1 ,prob=c(0.02 ,0.93)), since 
sample normalizes the probabilities itself.
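
The step-by-step description above can be sketched as a loop. This is only an illustration of the idea, not how sample() is implemented internally; the names x, p and draws and the seed are assumptions made for the example. It shows that a population of 3 is used up after 3 draws, which is why asking for 400 values without replacement fails.

```r
# Sketch: draw without replacement one value at a time.
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- c(0, 1, 2)                  # population
p <- c(0.02, 0.93, 0.05)         # selection weights

draws <- numeric(0)
while (length(x) > 0) {
  # Draw an index rather than a value: sample(x, 1) would treat a
  # single remaining number such as 2 as the range 1:2.
  i <- sample.int(length(x), 1, prob = p)  # weights are rescaled internally
  draws <- c(draws, x[i])
  x <- x[-i]                     # remove the selected value ...
  p <- p[-i]                     # ... and its weight
}
draws                            # some ordering of 0, 1, 2; nothing is left to draw
```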

Concerning your result:
observed <- c(0.0200, 0.9225, 0.0575 )*400
expected  <- c(0.02 ,0.93 ,0.05 )*400
stat <- sum((observed-expected)^2/expected)
pchisq(stat,2,lower.tail=FALSE)
[1] 0.788915

Seems ok to me.
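
The same goodness-of-fit check can also be run with the built-in chisq.test(), as a small sketch. The counts 8, 369 and 23 are the observed proportions above multiplied by 400; the variable names are made up for the example.

```r
# Sketch: chi-squared goodness-of-fit test of the observed counts
# against the probabilities passed to sample().
observed_counts <- c(8, 369, 23)          # 400 * c(0.0200, 0.9225, 0.0575)
target_prob     <- c(0.02, 0.93, 0.05)

fit <- chisq.test(observed_counts, p = target_prob)
fit$statistic   # same statistic as the hand computation
fit$parameter   # degrees of freedom: 2
fit$p.value     # a large p-value means no evidence against the target probabilities
```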

Cheers,
Kees
On Saturday 30 December 2006 16:55, Aldi Kraja wrote:
#
Aldi Kraja wrote:
Yes, I think you are misunderstanding sample() in R.  If you want
those exact proportions in your xlrmN1 but with the observations in a
random order, you could do this:
xlrmN1
   0    1    2
0.02 0.93 0.05
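
The command that produced this table is not shown above. One way to get such a result, offered as a sketch and not necessarily what was originally posted, is to build a vector with the exact counts and shuffle it. The counts 8, 372 and 20 are 400 times the target probabilities; the seed and the name counts are assumptions for the example.

```r
# Sketch: exact proportions in random order.
set.seed(42)                               # arbitrary seed, only for reproducibility
counts <- c(8, 372, 20)                    # 400 * c(0.02, 0.93, 0.05)
pool   <- rep(c(0, 1, 2), times = counts)  # 400 values with the exact proportions
xlrmN1 <- sample(pool)                     # sample(x) with no size permutes x
table(xlrmN1) / 400
```

Here sampling without replacement works because the population (400 values) is as large as the sample.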