sampling from data.frame
On Wed, 3 Dec 2008, axionator wrote:
Hi all,

I have a data frame with "clustered" rows as follows:

Cu1 x1 y1 z1 ...
Cu1 x2 y2 z2 ...
Cu1 x3 y3 z3 ...   # end of first cluster Cu1
Cu2 x4 y4 z4 ...
Cu2 x5 y5 z5
Cu2 ...            # end of second cluster Cu2
Cu3 ...
...

"cluster"-size is 3 in the example above (rows making up a cluster are always consecutive).
Is there any faster way to sample n clusters (with replacement) from this data frame and build up a new data frame out of these sampled clusters? I use the "sample" function and a for-loop.
Something like this:

    cl.samps <- sample( split( df, df$cluster ), n.samps, replace=TRUE )
    do.call( rbind, cl.samps )

If you need to identify the samples from which the rows came (versus just the originating clusters):

    cl.samps2 <- lapply( seq_along(cl.samps),
                         function(x) cbind( cl.samps[[ x ]], new.cluster = x ) )
    do.call( rbind, cl.samps2 )

HTH,

Chuck
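For anyone who wants to try the suggestion end to end, here is a minimal self-contained sketch. The toy data frame, its column names (cluster, x, y), the seed, and the result name boot are assumptions made up for illustration; they are not from the original post. It only shows the split/sample/rbind idea on made-up data and is not a benchmark against the for-loop.

```r
# Toy data: 4 clusters of 3 consecutive rows each (all names are illustrative).
set.seed(1)
df <- data.frame(
  cluster = rep(paste0("Cu", 1:4), each = 3),
  x = rnorm(12),
  y = rnorm(12)
)

n.samps <- 5  # number of clusters to draw, with replacement

# Split once into a list of per-cluster data frames, then sample list elements.
clusters <- split(df, df$cluster)
cl.samps <- sample(clusters, n.samps, replace = TRUE)

# Tag each draw with its position so that repeated draws of the same
# original cluster remain distinguishable in the result.
cl.samps2 <- lapply(seq_along(cl.samps), function(i) {
  cbind(cl.samps[[i]], new.cluster = i)
})

# Stack the sampled clusters into one data frame and drop the
# row names that rbind generates.
boot <- do.call(rbind, cl.samps2)
rownames(boot) <- NULL
```

The new.cluster column numbers the draws 1..n.samps, while the cluster column still records which original cluster each row came from. With many or large clusters, do.call(rbind, ...) can itself become the slow step; that is a general caution, not something measured here.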
Thanks in advance,
Armin
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901