
Ranger could not work with caret

Hello,

The error comes from the ranger parameter mtry becoming greater than the
number of variables (columns).
mtry can be set manually through the caret::train argument tuneGrid. But
for method = "ranger" the grid must also include the split rule and the
minimum node size (columns splitrule and min.node.size).


library(caret)
library(farff)

# 10-fold cross-validation (the object is named boot, but this is not the bootstrap)
boot <- trainControl(method = "cv", number = 10)

# set the maximum mtry manually to ncol(tr)
# (note: ncol(tr) also counts the response column, act_effort);
# var_seq() then creates a sequence of len mtry values from 2 to that maximum
mtry <- var_seq(ncol(tr), len = 3)  # len = 3 is the default
mtry
#  [1]  2 13 24

splitrule <- c("variance", "extratrees")
min.node.size <- 1:10
# name the columns directly; caret expects exactly these tuning parameter names
mtrygrid <- expand.grid(mtry = mtry,
                        splitrule = splitrule,
                        min.node.size = min.node.size)

# tuneLength is not needed: it is ignored when tuneGrid is supplied
c1 <- train(act_effort ~ ., data = tr,
            method = "ranger",
            metric = "MAE",
            preProcess = c("center", "scale", "nzv"),
            tuneGrid = mtrygrid,
            trControl = boot)
c1
#  Random Forest
#
#  30 samples
#  23 predictors
#
#  Pre-processing: centered (48), scaled (48), remove (58)
#  Resampling: Cross-Validated (10 fold)
#  Summary of sample sizes: 28, 27, 27, 28, 27, 27, ...
#  Resampling results across tuning parameters:
#
#    mtry  splitrule   min.node.size  RMSE      Rsquared   MAE
#     2    variance     1             256.6391  0.8103759  186.3609
#     2    variance     2             249.7120  0.8628109  183.6696
#     2    variance     3             258.8240  0.8284449  189.0712
#
# [... rows omitted ...]
#
#    13    extratrees  10             254.9569  0.8918014  191.2524
#    24    variance     1             177.7188  0.9458652  112.2800
#    24    variance     2             172.6826  0.9204287  108.5943
#    24    variance     3             172.9954  0.9271006  109.2554
#    24    variance     4             172.2467  0.9523067  110.0776
#    24    variance     5             175.2485  0.9283317  112.8798
#    24    variance     6             177.9285  0.9369881  115.8970
#    24    variance     7             180.5959  0.9485035  117.5816
#    24    variance     8             178.8037  0.9358033  117.8725
#    24    variance     9             176.5849  0.9210959  117.0055
#    24    variance    10             178.6439  0.9257969  119.8035
#    24    extratrees   1             219.1368  0.8801770  141.0720
#    24    extratrees   2             216.1900  0.8550002  140.9263
#    24    extratrees   3             212.4138  0.8979379  141.4282
#    24    extratrees   4             218.2631  0.9121471  146.2908
#    24    extratrees   5             212.5679  0.9279598  144.2715
#    24    extratrees   6             218.9856  0.9141754  152.2099
#    24    extratrees   7             222.8540  0.9412682  152.4614
#    24    extratrees   8             228.1156  0.9423414  161.8456
#    24    extratrees   9             226.6182  0.9408306  160.5264
#    24    extratrees  10             226.9280  0.9429413  165.6878
#
#  MAE was used to select the optimal model using the smallest value.
#  The final values used for the model were mtry = 24, splitrule = variance
#   and min.node.size = 2.
plot(c1)
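A small variation, in case it is useful: instead of typing the maximum mtry,
it can be computed from the data so that it never exceeds the number of
predictors. The sketch below uses base R only. make_ranger_grid() is a
hypothetical helper (not part of caret) that mimics the evenly spaced values
var_seq() produces, and the built-in mtcars data stands in for tr. With the
formula interface, factors are expanded to dummy columns, so the count is
taken from model.matrix() minus the intercept column. With
preProcess = "nzv" some of those columns may still be dropped inside the
resamples, so treat p as an upper bound rather than an exact count.

```r
# Hypothetical helper (not part of caret): build a ranger tuning grid whose
# mtry values never exceed p, the number of predictor columns.
make_ranger_grid <- function(p, len = 3,
                             splitrule = c("variance", "extratrees"),
                             min.node.size = 1:10) {
  # evenly spaced mtry values from 2 to p, similar to caret::var_seq()
  mtry <- unique(floor(seq(2, p, length.out = len)))
  # keep only valid values (guards the case p < 2)
  mtry <- mtry[mtry >= 1 & mtry <= p]
  expand.grid(mtry = mtry,
              splitrule = splitrule,
              min.node.size = min.node.size,
              stringsAsFactors = FALSE)
}

# number of predictor columns the formula interface produces
# (dummy-expanded factors), without the intercept column
p <- ncol(model.matrix(mpg ~ ., data = mtcars)) - 1
grid <- make_ranger_grid(p)
range(grid$mtry)
```

For the original data the count would be
p <- ncol(model.matrix(act_effort ~ ., data = tr)) - 1, and the resulting
grid is passed to train() through tuneGrid exactly as in the call above.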



Hope this helps,

Rui Barradas


At 23:03 on 30/06/2022, Neha gupta wrote: