Error: cannot take a sample larger than the population
Aldi,

Your concept of sample is different from mine. I would expect sampling with replacement to be equivalent to a for loop in which each iteration draws a single element:

samples <- numeric(400)
for (i in 1:400) samples[i] <- sample(c(0,1,2), 1, prob=c(0.02, 0.93, 0.05))

Sampling without replacement proceeds draw by draw:
first:  sample(c(0,1,2), 1, prob=c(0.02, 0.93, 0.05))
second: depends on the first draw (suppose 2 was selected):
        sample(c(0,1), 1, prob=c(0.02, 0.93)/.95)
third:  whatever remains, with probability 1.
After the third draw nothing is left, so 400 draws without replacement from three values are impossible.

n.b. the second step is equivalent to sample(c(0,1), 1, prob=c(0.02, 0.93)), since sample normalizes the probabilities itself.

Concerning your result:
observed <- c(0.0200, 0.9225, 0.0575)*400
expected <- c(0.02, 0.93, 0.05)*400
stat <- sum((observed-expected)^2/expected)
pchisq(stat, 2, lower.tail=FALSE)
[1] 0.788915

A p-value of 0.79 gives no evidence against the requested probabilities, so this seems OK to me.

Cheers,
Kees
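A minimal base-R sketch of the argument above, offered as an illustration rather than as how sample() is implemented internally. The helper draw_sequentially() is hypothetical: it draws one element at a time, removes it from the population, and lets sample() rescale the remaining weights, so it runs out of elements after three draws. The seed is arbitrary, and the last lines repeat the goodness-of-fit check with chisq.test(), using the counts implied by the reported proportions.

```r
# Hypothetical helper: sampling without replacement done one draw at a time.
# After each draw the chosen element is removed; sample() rescales the
# remaining weights itself, so no explicit division by their sum is needed.
draw_sequentially <- function(x, size, prob) {
  if (size > length(x))
    stop("cannot draw more elements than the population contains")
  out <- numeric(0)
  for (i in seq_len(size)) {
    # sample over positions to avoid sample()'s special case for a length-1 x
    k <- sample(seq_along(x), 1, prob = prob)
    out <- c(out, x[k])
    x <- x[-k]
    prob <- prob[-k]
  }
  out
}

set.seed(1)  # arbitrary seed
draw_sequentially(c(0, 1, 2), 3, prob = c(0.02, 0.93, 0.05))  # some ordering of 0, 1, 2

# Sampling WITH replacement behaves like repeating a single draw 400 times
samples <- numeric(400)
for (i in 1:400)
  samples[i] <- sample(c(0, 1, 2), 1, prob = c(0.02, 0.93, 0.05))
table(samples) / 400

# Goodness-of-fit check on the proportions reported with replace = TRUE
observed <- round(c(0.0200, 0.9225, 0.0575) * 400)  # counts 8, 369, 23
chisq.test(observed, p = c(0.02, 0.93, 0.05))       # X-squared about 0.474, df = 2
```

chisq.test() computes the same statistic as the hand-written pchisq() line, with expected counts 400 * p.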
On Saturday 30 December 2006 16:55, Aldi Kraja wrote:
Partial Summary and discussion:
=====================
Thanks to Chao Gai, Chuck Cleland, and Jim Lemon for their suggestions
to use replace=T in R.
There is a problem, though (see below).
In Splus7, sample is defined as
-------------
sample(x, size = n, replace = F, prob = NULL, n = NULL, ...)
where replace=F by default.
In Splus7,
xlrmN1 <- sample(c(0,1,2), 400, prob=c(0.02, 0.93, 0.05))
and
table(xlrmN1)/400
   0    1    2
0.02 0.93 0.05
show that "sample" works exactly as expected based on the prob vector.
When "sample" is used in Splus7 with replacement, we see the following
result:
> xlrmN1 <- sample(c(0,1,2), 400, replace=T, prob=c(0.02, 0.93, 0.05))
> table(xlrmN1)/400
     0     1      2
0.0125 0.925 0.0625
which, I think, is again working as expected.
In R, sample is defined as
---------
sample(x, size, replace = FALSE, prob = NULL)
So the statement above with replace=F did not work (it reported an error),
but with replace=T it produced
table(xlrmN1)/400
xlrmN1
     0      1      2
0.0200 0.9225 0.0575
which does not exactly match the probabilities provided,
(0.02, 0.93, 0.05).
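To see how far a single run with replace = TRUE typically strays from prob, the sketch below repeats the 400-draw sample many times. The object names, the seed 2006, and the 2000 replicates are arbitrary choices for illustration; tabulate() is used instead of table() so that a value never drawn still gets a zero count.

```r
prob <- c(0.02, 0.93, 0.05)
set.seed(2006)  # arbitrary seed

# 3 x 2000 matrix: proportions of 0, 1, 2 in each of 2000 samples of size 400
props <- replicate(2000, {
  x <- sample(c(0, 1, 2), 400, replace = TRUE, prob = prob)
  tabulate(x + 1, nbins = 3) / 400
})

rowMeans(props)                 # the average over replicates settles near prob
apply(props, 1, sd)             # run-to-run spread of a single sample
sqrt(prob * (1 - prob) / 400)   # binomial standard error, for comparison
```

For the middle value the standard error is about 0.013, so an observed 0.9225 is well within the ordinary scatter around 0.93.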
Now let's return to the concepts of replace=F and replace=T.
When I ask "sample" to select a sample of 400 from a vector of 3 with NO
replacement, I would expect the following:
a) create a very large sample from 0, 1, and 2;
b) from this large sample, select without replacement based on the prob vector;
c) as a result, the proportions in the selected sample are exactly the same as
the prob vector (as in Splus7).
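A sketch of steps a) to c) in R, assuming a pool of 100,000 elements whose composition matches prob exactly (the pool size, seed, and names pool, draw, and exact are illustrative choices). Drawing 400 without replacement from such a pool is legal because the pool is larger than 400, but the proportions in the draw are still random (multivariate hypergeometric). Exact proportions only arise when the counts are fixed first and then shuffled.

```r
prob <- c(0.02, 0.93, 0.05)

# a) a large pool whose composition matches prob exactly (size is arbitrary)
pool <- rep(c(0, 1, 2), times = round(100000 * prob))

# b) select 400 without replacement from that pool
set.seed(30)  # arbitrary seed
draw <- sample(pool, 400)
table(draw) / 400    # random: close to prob, but not exactly equal in general

# c) exact proportions require fixing the counts first, then shuffling
exact <- sample(rep(c(0, 1, 2), times = round(400 * prob)))
table(exact) / 400   # 0.02 0.93 0.05 by construction
```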
When I ask "sample" to select a sample of 400 from a vector of 3 with
replacement, I would expect the following:
a) create a very large sample from 0, 1, and 2;
b) from this large sample, select with replacement based on the prob vector,
which means that some of the previously selected 0, 1, 2 can be selected again;
c) as a result, the proportions in the selected sample are NOT exactly the same
as the prob vector (as in Splus7 and R).
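With replacement, every draw sees the whole pool, so each draw equals k with probability equal to the share of k in the pool, independently of the other draws. The sketch below, assuming the same illustrative 100,000-element pool and an arbitrary seed, compares drawing from the pool with replacement against sample() with a prob vector; both give random proportions around 0.02, 0.93, 0.05.

```r
prob <- c(0.02, 0.93, 0.05)
pool <- rep(c(0, 1, 2), times = round(100000 * prob))

# Share of each value in the pool: the per-draw probability with replacement
as.vector(table(pool)) / length(pool)

set.seed(31)  # arbitrary seed
a <- sample(pool, 400, replace = TRUE)                         # draws from the pool
b <- sample(c(0, 1, 2), 400, replace = TRUE, prob = prob)      # draws using prob
rbind(pool = tabulate(a + 1, 3), prob = tabulate(b + 1, 3)) / 400
```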
So there are two possible conclusions: either "sample" in R is not working
correctly, OR I am missing some precision (a rounding error) needed to reproduce
prob=c(0.02, 0.93, 0.05).
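A quick check of the rounding hypothesis, using only the numbers in this thread: 400 * prob gives whole counts, so 400 draws could match prob exactly, and the observed counts differ from them by whole draws.

```r
prob <- c(0.02, 0.93, 0.05)
expected <- 400 * prob                 # 8, 372, 20
all.equal(expected, round(expected))   # TRUE: whole counts, no rounding needed
observed <- c(8, 369, 23)              # counts behind 0.0200, 0.9225, 0.0575
observed - expected                    # 0, -3, 3: whole draws, not rounding error
```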
Am I misunderstanding the "sample" function in R?
Any suggestions are appreciated.
TIA,
Aldi
Aldi Kraja wrote:
Hi,
In Splus7 the statement xlrmN1 <- sample(c(0,1,2), 400, prob=c(0.02, 0.93, 0.05)) worked fine, but in R the interpreter reports that the vector to choose from, c(0,1,2), is shorter than the number of elements I want to select from it. Is there a good reason? The error is shown below.
xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
Error in sample(length(x), size, replace, prob) :
cannot take a sample larger than the population
when 'replace = FALSE'
Execution halted
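A minimal sketch that reproduces the error without halting and applies the fix suggested in the replies (replace = TRUE). tryCatch() is used only so the script keeps running; the seed is arbitrary.

```r
x <- c(0, 1, 2)
prob <- c(0.02, 0.93, 0.05)

# replace = FALSE (the default) cannot return more than length(x) values
res <- tryCatch(sample(x, 400, prob = prob),
                error = function(e) conditionMessage(e))
res   # the "cannot take a sample larger than the population" message

# 400 independent draws with the given probabilities need replace = TRUE
set.seed(1)  # arbitrary seed
xlrmN1 <- sample(x, 400, replace = TRUE, prob = prob)
table(xlrmN1) / 400
```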
TIA,
Aldi
--
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.