Skip to content

Recursive Feature Elimination with SVM

6 messages · Priyanka Purkayastha, David Winsemius, Bert Gunter

#
I have a dataset (data) with 700 rows and 7000 columns. I am trying to do
recursive feature selection with the SVM model. A quick google search
helped me get a code for a recursive search with SVM. However, I am unable
to understand the first part of the code, How do I introduce my dataset in
the code?

If the dataset is a matrix, named data. Please give me an example for
recursive feature selection with SVM. Bellow is the code I got for
recursive feature search.

    svmrfeFeatureRanking = function(x,y){

    #Checking for the variables
    stopifnot(!is.null(x) == TRUE, !is.null(y) == TRUE)

    n = ncol(x)
    survivingFeaturesIndexes = seq_len(n)
    featureRankedList = vector(length=n)
    rankedFeatureIndex = n

    while(length(survivingFeaturesIndexes)>0){
    #train the support vector machine
    svmModel = svm(x[, survivingFeaturesIndexes], y, cost = 10,
cachesize=500,
                scale=FALSE, type="C-classification", kernel="linear" )

    #compute the weight vector
    w = t(svmModel$coefs)%*%svmModel$SV

    #compute ranking criteria
    rankingCriteria = w * w

    #rank the features
    ranking = sort(rankingCriteria, index.return = TRUE)$ix

    #update feature ranked list
    featureRankedList[rankedFeatureIndex] =
survivingFeaturesIndexes[ranking[1]]
    rankedFeatureIndex = rankedFeatureIndex - 1

    #eliminate the feature with smallest ranking criterion
    (survivingFeaturesIndexes = survivingFeaturesIndexes[-ranking[1]])}
    return (featureRankedList)}



I tried taking an idea from the above code and incorporate the idea in my
code as shown below

    library(e1071)
    library(caret)

    data<- read.csv("matrix.csv", header = TRUE)

    x <- data
    y <- as.factor(data$Class)

    svmrfeFeatureRanking = function(x,y){

      #Checking for the variables
      stopifnot(!is.null(x) == TRUE, !is.null(y) == TRUE)

      n = ncol(x)
      survivingFeaturesIndexes = seq_len(n)
      featureRankedList = vector(length=n)
      rankedFeatureIndex = n

      while(length(survivingFeaturesIndexes)>0){
        #train the support vector machine
        svmModel = svm(x[, survivingFeaturesIndexes], y, cross=10,cost =
10, type="C-classification", kernel="linear" )

        #compute the weight vector
        w = t(svmModel$coefs)%*%svmModel$SV

        #compute ranking criteria
        rankingCriteria = w * w

        #rank the features
        ranking = sort(rankingCriteria, index.return = TRUE)$ix

        #update feature ranked list
        featureRankedList[rankedFeatureIndex] =
survivingFeaturesIndexes[ranking[1]]
        rankedFeatureIndex = rankedFeatureIndex - 1

        #eliminate the feature with smallest ranking criterion
        (survivingFeaturesIndexes = survivingFeaturesIndexes[-ranking[1]])}

      return (featureRankedList)}

But couldn't do anything at the stage "update feature ranked list"
Please guide
#
On 1/1/19 4:40 AM, Priyanka Purkayastha wrote:
Generally the "labels" is given to such a machine learning device as the 
y argument, while the "features" are passed as a matrix to the x argument.
#
Thankyou David.. I tried the same, I gave x as the data matrix and y as the
class label. But it returned an empty "featureRankedList". I get no output
when I try the code.

On Tue, 1 Jan 2019 at 11:42 PM, David Winsemius <dwinsemius at comcast.net>
wrote:
#
On 1/1/19 5:31 PM, Priyanka Purkayastha wrote:
If you want people to spend time on this you should post a reproducible 
example. See the Posting Guide ... and learn to post in plain text.


--

David
#
This is the code I tried,

library(e1071)
library(caret)
library(ROCR)

data <- read.csv("data.csv", header = TRUE)
set.seed(998)

inTraining <- createDataPartition(data$Class, p = .70, list = FALSE)
training <- data[ inTraining,]
testing  <- data[-inTraining,]

while(length(data)>0){

## Building the model ####
svm.model <- svm(Class ~ ., data = training,
cross=10,metric="ROC",type="eps-regression",kernel="linear",na.action=na.omit,probability
= TRUE)
print(svm.model)


###### auc  measure #######

#prediction and ROC
svm.model$index
svm.pred <- predict(svm.model, testing, probability = TRUE)

#calculating auc
c <- as.numeric(svm.pred)
c = c - 1
pred <- prediction(c, testing$Class)
perf <- performance(pred,"tpr","fpr")
plot(perf,fpr.stop=0.1)
auc <- performance(pred, measure = "auc")
auc <- auc at y.values[[1]]
print(length(data))
print(auc)

#compute the weight vector
w = t(svm.model$coefs)%*%svm.model$SV

#compute ranking criteria
weight_matrix = w * w

#rank the features
w_transpose <- t(weight_matrix)
w2 <- as.matrix(w_transpose[order(w_transpose[,1], decreasing = FALSE),])
a <- as.matrix(w2[which(w2 == max(w2)),]) #to get the rows with minimum
values
row.names(a) -> remove
training<- data[,setdiff(colnames(data),remove)]
}

















On Wed, Jan 2, 2019 at 11:18 AM David Winsemius <dwinsemius at comcast.net>
wrote:

  
    
#
Note: **NOT** reproducible (only you have "data.csv").

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Jan 1, 2019 at 11:14 PM Priyanka Purkayastha <
ppurkayastha2010 at gmail.com> wrote: