vectorized approach to cumulative sampling - R-help

Thu, Apr 7, 2005 2:19 PM #

Hi All,

I need to sample a vector ("old"), with replacement, up to the point 
where my vector of samples ("new") sums to a predefined value 
("target"), shortening the last sample if necessary so that the total 
sum ("newsum") of the samples matches the predefined value.

While I can easily do this with a "while" loop (see below for example 
code), because the length of both "old" and "new" may be > 20,000, a 
vectorized approach will save me lots of CPU time.

Any suggestions would be greatly appreciated.

Thanks, Dan

# loop approach
old=c(1:10)
p=runif(1:10)
target=20

newsum=0
new=NULL
while (newsum<target) {
    i=sample(old, size=1, prob=p);
    new[length(new)+1]=i;
    newsum=sum(new)
    }
new
newsum
target
if(newsum>target){new[length(new)]=target-sum(new[-length(new)])}
new
newsum=sum(new); newsum
target

Daniel E. Bunker
Associate Coordinator - BioMERGE
Post-Doctoral Research Scientist
Columbia University
Department of Ecology, Evolution and Environmental Biology
1020 Schermerhorn Extension
1200 Amsterdam Avenue
New York, NY 10027-5557

212-854-9881
212-854-8188 fax
deb37ATcolumbiaDOTedu

(Ted Harding)

Thu, Apr 7, 2005 2:46 PM #

On 07-Apr-05 Daniel E. Bunker wrote:

Hi Dan,
You should be able to adapt the following vectorised approach
to your specific needs:

  old<-0.001*(1:1000)
  new<-sample(old,10000,replace=TRUE,prob=p)
  target<-200
  min(which(cumsum(new)>target))

## [1] 385

This took only a fraction of a second on my medium-speed machine.
If you get an "Inf" as result, then 'new' doesn't add up to
'target', so you have to extend it.

Hoping this helps,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Apr-05                                       Time: 22:46:12
------------------------------ XFMail ------------------------------

Rich FitzJohn

Thu, Apr 7, 2005 2:47 PM #

Hi,

sample() takes a "replace" argument, so you can take large samples,
with replacement, like this: (In the sample() call, the
50*target/mean(old) should make it sample 50 times more than likely.
This means the while loop will probably get executed only once.  This
could be tuned easily, and there may be better ways of guessing how
much to take).

old <- c(1:2000)
p <- runif(1:2000)
target <- 4000
new <- 0

while ( sum(new) < target )
  new <- sample(old, 50*target/mean(old), TRUE, p)

i <- which(cumsum(new) >= target)[1]
new <- new[1:i]
new[i] <- new[i] - (sum(new)-target)

Cheers,
Rich

On Apr 8, 2005 9:19 AM, Daniel E. Bunker <deb37 at columbia.edu> wrote:

Rich FitzJohn
rich.fitzjohn <at> gmail.com   |    http://homepages.paradise.net.nz/richa183
                      You are in a maze of twisty little functions, all alike