
[caret package] [trainControl] supplying predefined partitions to train with cross validation

4 messages · Fabon Dzogang, neetika nath

#
Hi all,

I run R 2.11.1 under Ubuntu 10.10 with caret version 2.88.

I use the caret package to compare different models on a dataset. In
order to compare their performance I would like to use the same data
partitions for every model. I understand that with an LGOCV or a boot
type re-sampling method, the "index" argument of the trainControl
function lets one supply a training partition to the train function.

However, I would like to use 10-fold cross validation to validate
the models, and I did not find any way to supply a predefined
partition (created with createFolds) in this setting. Any help?

Thank you, and great package by the way!

Fabon Dzogang.
#
Hello,

Thank you for your reply but I'm not sure your code answers my needs,
from what I read it creates a 10-fold partition and then extracts the
kth partition for future processing.

My question was rather: once I have a 10-fold partition of my data,
how do I supply it to the "train" function of the caret package? Here's
some sample code:

folds <- createFolds(my_dataset_classes, 10)

# I can't use index=folds here: it would train on 1/k of the data and test on the other (k-1)/k
t_control <- trainControl(method="cv", number=10)

# here I would like train to take account of my predefined folds
model <- train(my_dataset_predictors, my_dataset_classes,
method="svmLinear", trControl = t_control)

Cheers,
Fabon.
On Fri, May 6, 2011 at 10:59 AM, neetika nath <nikkihathi at gmail.com> wrote:

  
    
4 days later
#
Here is an answer from Max Kuhn, thank you!

Fabon,

If I understand the problem, there are two ways of doing it. First, if you
are using caret's train(), rfe() or sbf() and you set the same seed right
before each call, the models end up using the same resampled data sets. (By
the way, if you use the resamples() function in caret, it checks for the same
resampling indices.)
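
A minimal sketch of the seed-based approach, not part of the original reply. It assumes the iris data as a stand-in for my_dataset_predictors and my_dataset_classes, an arbitrary seed value, and that the kernlab package is installed for "svmLinear"; "knn" is just a second model picked for illustration.

```r
library(caret)

# Stand-in data (assumption): iris replaces the poster's own dataset
data(iris)
predictors <- iris[, 1:4]
classes <- iris$Species

ctrl <- trainControl(method = "cv", number = 10)

# Reset the same (arbitrary) seed immediately before each call to train()
set.seed(123)
fit_svm <- train(predictors, classes, method = "svmLinear", trControl = ctrl)

set.seed(123)
fit_knn <- train(predictors, classes, method = "knn", trControl = ctrl)

# The folds each model actually used are stored on the fitted object
same_folds <- identical(fit_svm$control$index, fit_knn$control$index)
print(same_folds)

# resamples() collects the per-fold results of both models for comparison
resamps <- resamples(list(svm = fit_svm, knn = fit_knn))
summary(resamps)
```

Comparing the stored indices directly is a cheap sanity check that the seed trick did what was expected before trusting any model comparison.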

If you want to manually fix the data sets, there is an example in section
5.2 of

 http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf

using LGOCV. For 10-fold CV, you can use createFolds() with an additional
argument:
$Fold01
[1] 2 3 4 5 6 7 8 9 10

$Fold02
[1]  1  3  4  5  6  7  8  9 10

$Fold03
[1]  1  2  4  5  6  7  8  9 10

$Fold04
[1]  1  2  3  5  6  7  8  9 10

$Fold05
[1]  1  2  3  4  6  7  8  9 10

$Fold06
[1]  1  2  3  4  5  7  8  9 10

$Fold07
[1]  1  2  3  4  5  6  8  9 10

$Fold08
[1]  1  2  3  4  5  6  7  9 10

$Fold09
[1]  1  2  3  4  5  6  7  8 10

$Fold10
[1] 1 2 3 4 5 6 7 8 9

For the trainControl() function, the index argument should be a list of
the sample indices used for training in each resample. So if I give it the
above results of createFolds(), it will do 10-fold CV.
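
Putting this together, a sketch under stated assumptions: the "additional argument" is taken to be returnTrain = TRUE, which makes createFolds() return the training rows of each fold rather than the held-out rows (consistent with the printed output above, where each fold lists 9 of 10 samples). The iris data, the seed, and the choice of "knn" as a second model are illustrative only; "svmLinear" needs the kernlab package.

```r
library(caret)

# Stand-in data (assumption): iris replaces the poster's own dataset
data(iris)
predictors <- iris[, 1:4]
classes <- iris$Species

# Build the 10 folds once; returnTrain = TRUE yields training-set indices
set.seed(42)  # arbitrary seed, only so the partition is reproducible
train_folds <- createFolds(classes, k = 10, returnTrain = TRUE)

# Hand the predefined folds to trainControl() via the index argument
ctrl <- trainControl(method = "cv", number = 10, index = train_folds)

# Every model trained with this control object sees the same partition
fit_svm <- train(predictors, classes, method = "svmLinear", trControl = ctrl)
fit_knn <- train(predictors, classes, method = "knn", trControl = ctrl)

# Held-out rows of each fold are the complement of its training rows
held_out <- lapply(train_folds, function(i) setdiff(seq_along(classes), i))
```

Because the partition lives in the control object rather than in the random number stream, this variant does not depend on resetting the seed before each call.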

Max
On Fri, May 6, 2011 at 12:32 PM, Fabon Dzogang <fabon.dzogang at lip6.fr> wrote: