Skip to content

vectorized approach to cumulative sampling

3 messages · Daniel E. Bunker, (Ted Harding), Rich FitzJohn

#
Hi All,

I need to sample a vector ("old"), with replacement, up to the point 
where my vector of samples ("new") sums to a predefined value 
("target"), shortening the last sample if necessary so that the total 
sum ("newsum") of the samples matches the predefined value.

While I can easily do this with a "while" loop (see below for example 
code), because the length of both "old" and "new" may be > 20,000, a 
vectorized approach will save me lots of CPU time.

Any suggestions would be greatly appreciated.

Thanks, Dan

# loop approach
old=c(1:10)
p=runif(1:10)
target=20

newsum=0
new=NULL
while (newsum<target) {
    i=sample(old, size=1, prob=p);
    new[length(new)+1]=i;
    newsum=sum(new)
    }
new
newsum
target
if(newsum>target){new[length(new)]=target-sum(new[-length(new)])}
new
newsum=sum(new); newsum
target
#
On 07-Apr-05 Daniel E. Bunker wrote:
Hi Dan,
You should be able to adapt the following vectorised approach
to your specific needs:

  old<-0.001*(1:1000)
  new<-sample(old,10000,replace=TRUE,prob=p)
  target<-200
  min(which(cumsum(new)>target))

## [1] 385

This took only a fraction of a second on my medium-speed machine.
If you get an "Inf" as result, then 'new' doesn't add up to
'target', so you have to extend it.

Hoping this helps,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Apr-05                                       Time: 22:46:12
------------------------------ XFMail ------------------------------
#
Hi,

sample() takes a "replace" argument, so you can take large samples,
with replacement, like this: (In the sample() call, the
50*target/mean(old) should make it sample 50 times more than likely.
This means the while loop will probably get executed only once.  This
could be tuned easily, and there may be better ways of guessing how
much to take).

old <- c(1:2000)
p <- runif(1:2000)
target <- 4000
new <- 0

while ( sum(new) < target )
  new <- sample(old, 50*target/mean(old), TRUE, p)

i <- which(cumsum(new) >= target)[1]
new <- new[1:i]
new[i] <- new[i] - (sum(new)-target)

Cheers,
Rich
On Apr 8, 2005 9:19 AM, Daniel E. Bunker <deb37 at columbia.edu> wrote: