Hello All,
I have a dataset of six samples and 1530 variables/features, and I wish to know the importance of the features. I'm trying to use the "Rank Features By Importance" approach described in Feature Selection with the Caret R Package (http://machinelearningmastery.com/feature-selection-with-the-caret-r-package/).
I'm using the following code:
    rm(list=ls())
    set.seed(12345)
    library(mlbench)
    library(caret)
    options(error=utils::recover)

    # Pastebin link for data: http://pastebin.com/raw/cg0Kiueq
    mydata.df <- read.table("data.PasteBin.txt", header=TRUE, sep="\t", stringsAsFactors=TRUE)
    dim(mydata.df)

    lvq.control <- trainControl(method="LOOCV")
    lvq.model <- train(ID~., data=mydata.df, method="lvq", trControl=lvq.control) #FAILS

    importance <- varImp(lvq.model, scale=FALSE)
    print(importance)
    plot(importance)
The data can be downloaded from the following Pastebin link:
http://pastebin.com/raw/cg0Kiueq
The program fails to execute with the following error and debug messages:
    Error in seeds[[num_rs + 1L]] : subscript out of bounds
    1: train(ID ~ ., data = mydata.df, method = "lvq", trControl = lvq.control)
    2: train.formula(ID ~ ., data = mydata.df, method = "lvq", trControl = lvq.con
    3: train(x, y, weights = w, ...)
    4: train.default(x, y, weights = w, ...)
I've read in multiple sources (e.g. http://davidhughjones.blogspot.com/2015/04/r-tip-caret-error.html) that caret raises errors like this unless the response variable is of class factor.
However, my response variable ('ID') is indeed a factor:
    > str(mydata.df$ID)
     Factor w/ 2 levels "NONRC","RC": 2 2 1 1 2 1
The details of my R and caret versions are as follows:
    > packageVersion("caret")
    [1] ‘6.0.70’
    R version 3.3.0 (2016-05-03)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1
Can someone please suggest a remedy?
Thanks in advance