
sample equal number of cases per class

Hello,

The function caret::createDataPartition preserves the proportions of 
the classes, as its documentation says, so you should expect the result 
to be balanced only if the original data.frame is also balanced. A 
solution is to write a small function that chooses a balanced set of 
indices. Note that the function below does _not_ use the same arguments 
as caret::createDataPartition; its arguments are:

x - the original vector, matrix or data.frame.
y - a vector of class labels, one per row of x; the classes to balance.
p - proportion of the rows of x to choose in total, split equally 
    among the classes.


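createSets <- function(x, y, p){
     nr <- NROW(x)
     # Number of cases to draw from each class of y
     size <- (nr * p) %/% length(unique(y))
     # Draw 'size' row indices per class. Indexing with sample.int()
     # avoids the sample() pitfall when a class has a single row.
     # 'size' must not exceed the size of the smallest class.
     idx <- lapply(split(seq_len(nr), y),
                   function(.x) .x[sample.int(length(.x), size)])
     unlist(idx)
}
ind <- createSets(df, df$class, 0.8)
lrn <- df[ind,]
summary(lrn)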
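
To see the balancing at work, here is a small self-contained sketch on made-up data. The data frame 'dat', its columns, the seed and the class sizes (60/30/10) are only assumptions for illustration, and the copy of createSets indexes via sample.int() so it also behaves when a class has a single row.

```r
# Copy of the createSets() idea so that this block runs on its own.
createSets <- function(x, y, p) {
    nr <- NROW(x)
    # Equal number of cases per class
    size <- (nr * p) %/% length(unique(y))
    idx <- lapply(split(seq_len(nr), y),
                  function(.x) .x[sample.int(length(.x), size)])
    unlist(idx)
}

set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical imbalanced data: 60 "a", 30 "b", 10 "c"
dat <- data.frame(value = rnorm(100),
                  class = rep(c("a", "b", "c"), times = c(60, 30, 10)))

# 25% of 100 rows over 3 classes: 25 %/% 3 = 8 rows per class
ind <- createSets(dat, dat$class, 0.25)
lrn <- dat[ind, ]
table(lrn$class)  # 8 cases in each of a, b and c
```

Note that p has to be small enough for the smallest class: with these data, p = 0.8 would ask for 26 rows per class, and sampling without replacement from the 10 rows of class "c" would stop with an error.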


Also, 'df' is a bad name for a variable, since it is already the name of 
an R function (stats::df, the density of the F distribution). 
Use, for instance, 'dat'.
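
A quick sketch of why the name matters. If the data frame was never created, 'df' silently resolves to the function in the stats package, and the error you get talks about a closure rather than a missing object. The object 'dat' below is just a made-up example.

```r
# 'df' ships with R: it is the density of the F distribution.
is.function(stats::df)

# Subsetting the function instead of a data frame fails; capture the
# error message rather than stopping the script.
res <- tryCatch(stats::df$class, error = function(e) conditionMessage(e))
print(res)

# With a name that is not already taken, a forgotten assignment would
# give a plain "object not found" error instead.
dat <- data.frame(class = c("a", "b"))
dat$class
```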

Hope this helps,

Rui Barradas
On 04-11-2012 10:47, ollestrat wrote: