Message-ID: <Pine.LNX.4.64.0812021714260.14269@tajo.ucsd.edu>
Date: 2008-12-03T01:19:12Z
From: Charles C. Berry
Subject: sampling from data.frame
In-Reply-To: <97a146780812021527w33802dean9aca3928d0347b00@mail.gmail.com>
On Wed, 3 Dec 2008, axionator wrote:
> Hi all,
> I have a data frame with "clustered" rows as follows:
> Cu1 x1 y1 z1 ...
> Cu1 x2 y2 z2 ...
> Cu1 x3 y3 z3 ... # end of first cluster Cu1
> Cu2 x4 y4 z4 ...
> Cu2 x5 y5 z5
> Cu2 ... # end of second cluster Cu2
> Cu3 ...
> ...
> "cluster"-size is 3 in the example above (rows making up a cluster are
> always consecutive). Is there any faster way to sample n clusters
> (with replacement) from this dataframe and build up a new data frame
> out of these sampled clusters? I use the "sample" function and a
> for-loop.
Something like this:
cl.samps <- sample( split( df, df$cluster ), n.samps, replace=TRUE )
do.call( rbind, cl.samps )
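To try this out, here is a minimal sketch on a made-up data frame. The names df, cluster, x and n.samps are assumptions for illustration only, since the original post does not give the real column names.

```r
set.seed(1)   # only so the draw is repeatable

# Toy data: 3 clusters of 3 consecutive rows each (hypothetical names)
df <- data.frame(cluster = rep(c("Cu1", "Cu2", "Cu3"), each = 3),
                 x = 1:9)
n.samps <- 5

# split() returns a list holding one data frame per cluster;
# sample() then draws elements of that list with replacement
cl.samps <- sample(split(df, df$cluster), n.samps, replace = TRUE)

# Stack the sampled clusters into one data frame
boot.df <- do.call(rbind, cl.samps)

nrow(boot.df)   # n.samps * 3 rows, because every toy cluster has 3 rows
```

Because the same cluster can be drawn more than once, rbind makes the row names unique on its own; the cluster column still shows where each row originated.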
If you need to identify the samples from which the rows came (versus just
the originating clusters):
cl.samps2 <- lapply( seq_along(cl.samps),
    function(x) cbind( cl.samps[[ x ]], new.cluster = x ) )
do.call( rbind, cl.samps2 )
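If speed is the main concern, another option (not part of the suggestion above, and only a sketch) is to skip split and rbind and index the rows directly. This assumes what the question describes: every cluster has the same size and its rows are consecutive. The names df, x, k and n.samps below are made up for illustration, and whether this is actually faster is worth timing on your own data.

```r
set.seed(1)   # only so the draw is repeatable

# Toy data: 3 clusters of 3 consecutive rows each (hypothetical names)
df <- data.frame(cluster = rep(c("Cu1", "Cu2", "Cu3"), each = 3),
                 x = 1:9)
k       <- 3                # assumed fixed cluster size
n.samps <- 5
n.cl    <- nrow(df) / k     # number of clusters in df

# Draw cluster positions (1..n.cl) with replacement
picked <- sample.int(n.cl, n.samps, replace = TRUE)

# For each drawn cluster, the indices of its k consecutive rows:
# the offset (picked - 1) * k is repeated k times, then 1..k is recycled onto it
rows <- rep((picked - 1) * k, each = k) + seq_len(k)

boot.df <- df[rows, ]
# Label each block of k rows with the draw it came from
boot.df$new.cluster <- rep(seq_len(n.samps), each = k)
```

Here new.cluster plays the same role as in the lapply version: it distinguishes repeated draws of the same original cluster.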
HTH,
Chuck
>
> Thanks in advance
> Armin
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901