Hello All,

I have a dataset of six samples and 1530 variables/features, and I wish to know the importance of the features. I'm trying to use "Rank Features By Importance" as described in Feature Selection with the Caret R Package (http://machinelearningmastery.com/feature-selection-with-the-caret-r-package/). I'm using the following code:

rm(list=ls())
set.seed(12345)
library(mlbench)
library(caret)
options(error=utils::recover)

#Pastebin link for Data: http://pastebin.com/raw/cg0Kiueq
mydata.df <- read.table("data.PasteBin.txt", header=TRUE, sep="\t", stringsAsFactors=TRUE)
dim(mydata.df)

lvq.control <- trainControl(method="LOOCV")
lvq.model <- train(ID~., data=mydata.df, method="lvq", trControl=lvq.control) #FAILS

importance <- varImp(lvq.model, scale=FALSE)
print(importance)
plot(importance)

The data can be downloaded from the following Pastebin link: http://pastebin.com/raw/cg0Kiueq

The program fails with the following error and debug messages:

Error in seeds[[num_rs + 1L]] : subscript out of bounds
1: train(ID ~ ., data = mydata.df, method = "lvq", trControl = lvq.control)
2: train.formula(ID ~ ., data = mydata.df, method = "lvq", trControl = lvq.con
3: train(x, y, weights = w, ...)
4: train.default(x, y, weights = w, ...)

I've read in multiple sources (e.g. http://davidhughjones.blogspot.com/2015/04/r-tip-caret-error.html) that caret issues an error like this unless the response variable is of class factor. However, my response variable ('ID') is indeed a factor:

> str(mydata.df$ID)
 Factor w/ 2 levels "NONRC","RC": 2 2 1 1 2 1

The details of my R and caret versions are as follows:

> packageVersion("caret")
[1] ‘6.0.70’
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Can someone please suggest a remedy?

Thanks in advance
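The error message shows caret indexing element num_rs + 1 of its internal list of resampling seeds, i.e. the seed reserved for the final model after num_rs resamples, and not finding it. caret's trainControl() documentation describes that list: for B resamples it has B + 1 elements, the first B being integer vectors of length M (the number of models evaluated, presumably the number of tuning-grid rows for lvq) and the last a single integer. The following is a minimal base-R sketch, under those assumptions, of building such a list for LOOCV so that it could be passed explicitly. make_loocv_seeds is a hypothetical helper, not part of caret, and whether supplying seeds this way avoids the error was not established in this thread.

```r
# Hypothetical helper (not part of caret): builds a seeds list in the shape
# documented for caret::trainControl(seeds = ...): B integer vectors of
# length M (one per resample, one seed per candidate model), followed by
# a single integer for the final model refit on all the data.
make_loocv_seeds <- function(n_samples, n_models, base_seed = 12345) {
  set.seed(base_seed)
  # LOOCV holds out each sample once, so there are B = n_samples resamples
  seeds <- lapply(seq_len(n_samples),
                  function(i) sample.int(1000000L, n_models))
  # Element B + 1: a single seed for the final model
  seeds[[n_samples + 1L]] <- sample.int(1000000L, 1L)
  seeds
}

# Example: six samples (as in the question) and a one-row tuning grid
lvq.seeds <- make_loocv_seeds(n_samples = 6, n_models = 1)
length(lvq.seeds)    # 7
lengths(lvq.seeds)   # 1 1 1 1 1 1 1
```

The list would then go into trainControl(method = "LOOCV", seeds = lvq.seeds), with n_models matched to the number of rows of whatever tuning grid train() ends up evaluating.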
Feature selection using R Caret package: Error in seeds[[num_rs + 1L]] : subscript out of bounds
2 messages · Its August, David Winsemius
On Dec 20, 2016, at 1:30 PM, Its August via R-help <r-help at r-project.org> wrote:
rm(list=ls())
set.seed(12345)
library(mlbench)
library(caret)
options(error=utils::recover)

#Pastebin link for Data: http://pastebin.com/raw/cg0Kiueq
mydata.df <- read.table("data.PasteBin.txt", header=TRUE, sep="\t", stringsAsFactors=TRUE)
dim(mydata.df)

lvq.control <- trainControl(method="LOOCV")
lvq.model <- train(ID~., data=mydata.df, method="lvq", trControl=lvq.control) #FAILS

importance <- varImp(lvq.model, scale=FALSE)
print(importance)
plot(importance)
Posting in HTML caused your code to arrive in the distributed version of your posting collapsed into a few long lines, which R cannot parse as-is. (You didn't read the Posting Guide.)

After looking at the link:

http://machinelearningmastery.com/feature-selection-with-the-caret-r-package/

... I'm of the opinion that you are posting in the wrong place. I think the people who run that program should be the ones you ask for assistance. After all, they say:

#---------
STOP Wasting Your Time Piecing Together One-Off Articles and Parsing Greek Letters in Academic Textbooks
NOW is the time to actually learn How To Deliver Results With Machine Learning
#---------

Obviously their product is what you should be using.
David Winsemius
Alameda, CA, USA