vectorized approach to cumulative sampling
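Hi,

sample() takes a "replace" argument, so you can take one large sample, with replacement, and then trim it, as in the code below.

(In the sample() call, the size 50*target/mean(old) draws about 50 times as many values as are likely to be needed, since target/mean(old) is a rough guess at the number required. This means the while loop will probably be executed only once. The factor of 50 could be tuned easily, and there may be better ways of guessing how much to take.)

old <- c(1:2000)
p <- runif(1:2000)
target <- 4000
new <- 0
# Redraw a whole batch until its total reaches the target
while ( sum(new) < target )
  new <- sample(old, 50*target/mean(old), TRUE, p)
# Index of the first draw at which the running total reaches the target
i <- which(cumsum(new) >= target)[1]
new <- new[1:i]
# Shorten the last draw so that the total equals the target exactly
new[i] <- new[i] - (sum(new)-target)

Cheers,
Rich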
On Apr 8, 2005 9:19 AM, Daniel E. Bunker <deb37 at columbia.edu> wrote:
Hi All,
I need to sample a vector ("old"), with replacement, up to the point
where my vector of samples ("new") sums to a predefined value
("target"), shortening the last sample if necessary so that the total
sum ("newsum") of the samples matches the predefined value.
While I can easily do this with a "while" loop (see below for example
code), because the length of both "old" and "new" may be > 20,000, a
vectorized approach will save me lots of CPU time.
Any suggestions would be greatly appreciated.
Thanks, Dan
# loop approach
old=c(1:10)
p=runif(1:10)
target=20
newsum=0
new=NULL
while (newsum<target) {
  i=sample(old, size=1, prob=p)
  new[length(new)+1]=i
  newsum=sum(new)
}
new
newsum
target
if(newsum>target){new[length(new)]=target-sum(new[-length(new)])}
new
newsum=sum(new); newsum
target
Rich FitzJohn
rich.fitzjohn <at> gmail.com | http://homepages.paradise.net.nz/richa183
You are in a maze of twisty little functions, all alike
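
The reply's vectorized approach can be wrapped in a function, as a minimal sketch. The name sample_to_target and the oversample argument are hypothetical and do not come from the thread. Two things differ from the posted code and are assumptions rather than the author's method: the batch size is guessed from the probability-weighted mean of old (one possible "better way of guessing how much to take"), and a batch that falls short is extended rather than redrawn. The sketch also assumes every value in old is positive, so that the running total only increases.

```r
# Hypothetical helper, not from the original thread: sample `old` with
# replacement until the running total reaches `target`, then shorten the
# last draw so the total matches `target` exactly.
sample_to_target <- function(old, p, target, oversample = 2) {
  # length(old) > 1 avoids sample()'s special case for a single number;
  # positive values keep the cumulative sum increasing (assumption).
  stopifnot(length(old) > 1, length(old) == length(p),
            all(old > 0), target > 0)
  # Expected value of one draw under the weights p (assumption: a closer
  # guess than mean(old) when p is far from uniform).
  expected <- sum(old * p) / sum(p)
  n <- ceiling(oversample * target / expected)
  new <- numeric(0)
  # Append batches until the total reaches the target.
  while (sum(new) < target) {
    new <- c(new, sample(old, n, replace = TRUE, prob = p))
  }
  i <- which(cumsum(new) >= target)[1]    # first draw reaching the target
  new <- new[seq_len(i)]
  new[i] <- new[i] - (sum(new) - target)  # shorten the last draw
  new
}

# Usage with the values from the reply
old <- 1:2000
p <- runif(2000)
new <- sample_to_target(old, p, target = 4000)
sum(new)
```

Appending to a short batch, rather than replacing it, means no draws are thrown away, at the cost of growing the vector if the first guess is too small; raising oversample makes a second pass less likely.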