create stratified splits

Hi Martin,

Interesting question. I thought I would post a brute-force method that
might be good enough: keep drawing random splits until the group means
and standard deviations are similar enough. Surely someone will have a
better approach... Well, we'll see. Here is a dumb, inefficient (but
workable) way:

# create the vector to be split
r <- runif(100)

# write a function to split it, with various knobs and toggles
splitSimilar <- function(x, n, mean.tol=.1, sd.tol=.1, itr=500, verbose=FALSE) {
  M <- mean.tol+1
  SD <- sd.tol+1
  I <- 0
  # as long as the sd of the group means or of the group standard
  # deviations is greater than tolerance (and we have tries left)...
  while((M > mean.tol || SD > sd.tol) && I < itr) {
    I <- I + 1
    ## pick another split
    x1 <- data.frame(g = rep(letters[1:n], length(x)/n),
                     value = sample(x, length(x)))
    M <- sd(tapply(x1$value, x1$g, FUN=mean))
    SD <- sd(tapply(x1$value, x1$g, FUN=sd))
    if(verbose) {
      cat("M = ", M, ", mean.tol =", mean.tol, ": SD = ", SD,
          ", sd.tol =", sd.tol, "\n")
    }
  }
  # don't try forever: fail only if the last split still misses the
  # criteria (a split found on the final iteration is still returned)
  if(M > mean.tol || SD > sd.tol) {
    stop("failed to find split matching criteria: try increasing tolerance or itr")
  } else {
    return(x1)
  }
}

# now use our function to find a set of splits within our mean and sd
# tolerance.
tst <- splitSimilar(r, 10, mean.tol = 0.05, sd.tol = 0.1)

# adjust some of the dials and switches to suit...
tst <- splitSimilar(r, 10, mean.tol = 0.03, sd.tol = 0.05, itr=5000)
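
To see how close a split actually is, you can tabulate the group means
and standard deviations directly. This is only a rough sketch:
summarizeSplit is a made-up helper name (not part of the function
above), and it assumes a data frame with the same g and value columns
that splitSimilar returns. So that it runs on its own, the example
builds one random split by hand as a stand-in for tst:

```r
# hypothetical helper (an assumption, not from the code above):
# summarize a split stored as a data frame with columns
# g (group label) and value
summarizeSplit <- function(d) {
  means <- tapply(d$value, d$g, FUN = mean)
  sds   <- tapply(d$value, d$g, FUN = sd)
  list(group.mean = means,
       group.sd   = sds,
       M  = sd(means),  # spread of the group means (compare to mean.tol)
       SD = sd(sds))    # spread of the group sds (compare to sd.tol)
}

# stand-in for the output of splitSimilar(r, 10, ...): one random split
set.seed(1)
r <- runif(100)
d <- data.frame(g = rep(letters[1:10], length(r)/10),
                value = sample(r, length(r)))

s <- summarizeSplit(d)
round(s$group.mean, 3)   # per-group means
c(M = s$M, SD = s$SD)    # the two numbers the tolerances apply to
```

With a real result you would call summarizeSplit(tst) instead; M and SD
should then both sit at or below the mean.tol and sd.tol you asked for.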

Best,
Ista

On Wed, Dec 19, 2012 at 3:23 PM, Martin Batholdy
<batholdy at googlemail.com> wrote: