create stratified splits
Hi Martin,
Interesting question. I don't have an efficient answer, but here is a
brute-force method that might be good enough: keep shuffling the
vector into groups at random until the group means and standard
deviations are similar enough. Surely someone will have a better
approach... we'll see. Here is the inefficient (but workable) way:
# create the vector to be split
r <- runif(100)
# write a function to split it, with various knobs and toggles
splitSimilar <- function(x, n, mean.tol = 0.1, sd.tol = 0.1, itr = 500, verbose = FALSE) {
  # start above both tolerances so the loop runs at least once
  M <- mean.tol + 1
  SD <- sd.tol + 1
  I <- 0
  # keep reshuffling while the sd of the group means or the sd of the
  # group standard deviations exceeds its tolerance, for at most itr tries
  while ((M > mean.tol || SD > sd.tol) && I < itr) {
    I <- I + 1
    ## pick another split: shuffle x and deal it out into n groups
    ## (group sizes differ by at most one if length(x) is not a multiple of n)
    x1 <- data.frame(g = rep(letters[1:n], length.out = length(x)),
                     value = sample(x, length(x)))
    M <- sd(tapply(x1$value, x1$g, FUN = mean))
    SD <- sd(tapply(x1$value, x1$g, FUN = sd))
    if (verbose) {
      cat("M =", M, ", mean.tol =", mean.tol, ": SD =", SD,
          ", sd.tol =", sd.tol, "\n")
    }
  }
  # don't try forever: fail only if the last split still misses a tolerance
  if (M > mean.tol || SD > sd.tol) {
    stop("failed to find split matching criteria: try increasing tolerance or itr")
  }
  x1
}
# now use our function to find a set of splits within our mean and sd tolerance.
tst <- splitSimilar(r, 10, mean.tol = 0.05, sd.tol = 0.1)
# adjust some of the dials and switches to suit...
tst <- splitSimilar(r, 10, mean.tol = 0.03, sd.tol = 0.05, itr=5000)
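To see how similar the pieces actually are, you can summarise each group and look at the spread of the summaries. This is only a sketch: it builds a single random split (x1, a stand-in I made up with the same g/value columns that splitSimilar() returns) so that it runs on its own, and the fixed seed is just an assumption to make it repeatable.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is repeatable
r <- runif(100)

# stand-in for the data frame returned by splitSimilar(): 10 groups of 10
x1 <- data.frame(g = rep(letters[1:10], length.out = length(r)),
                 value = sample(r))

# per-group mean and standard deviation
groups <- split(x1$value, x1$g)
group.means <- sapply(groups, mean)
group.sds <- sapply(groups, sd)
print(round(cbind(mean = group.means, sd = group.sds), 3))

# the two quantities the function compares against mean.tol and sd.tol
spread.of.means <- sd(group.means)
spread.of.sds <- sd(group.sds)
cat("sd of group means:", spread.of.means, "\n")
cat("sd of group sds:  ", spread.of.sds, "\n")
```

With the function above defined, replace x1 with tst to check the split it found; both spreads should then be at or below the tolerances you passed in.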
Best,
Ista
On Wed, Dec 19, 2012 at 3:23 PM, Martin Batholdy
<batholdy at googlemail.com> wrote:
Hi,

I have a vector like:

r <- runif(100)

Now I would like to split r into 10 pieces (each with 10 elements),
but the 'pieces' should be roughly similar with regard to mean and sd.

what is an efficient way to do this in R?

thanks!