svm
Hi Amy,
On Wed, Jan 6, 2010 at 4:33 PM, Amy Hessen <amy_4_5_84 at hotmail.com> wrote:
Hi Steve,
Thank you very much for your reply.
I'm trying to do something systematic/general in the program so that I can
try different datasets without changing much in the program (without knowing
the name of the class label, which differs from one dataset to
another).
Could you please tell me your opinion about this code:
library(e1071)
mydata<-read.delim("the_whole_dataset.txt")
class_label <- names(mydata)[1]  # I'll always put the class label in the first column.
myformula <- formula(paste(class_label,"~ ."))
x <- subset(mydata, select = - mydata[, 1])
mymodel<-(svm(myformula, x, cross=3))
summary(model)
################
Since you're not doing anything funky with the formula, a preference
of mine is to just skip this way of calling SVM and go "straight" to
the svm(x,y,...) method:
R> mydata <- as.matrix(read.delim("the_whole_dataset.txt"))
R> train.x <- mydata[,-1]
R> train.y <- mydata[,1]
R> mymodel <- svm(train.x, train.y, cross=3, type="C-classification")
## or
R> mymodel <- svm(train.x, train.y, cross=3, type="eps-regression")
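One caveat with the as.matrix() route, in case your class labels are stored as text rather than numbers: as.matrix() on a data frame with a non-numeric column turns the whole matrix into character, so the features stop being numeric. Below is a minimal sketch of splitting first and converting afterwards. The `toy` data frame and its column names are made up here as a stand-in for your file:

```r
# Toy stand-in for read.delim("the_whole_dataset.txt"); label is in column 1.
# (Hypothetical data, only for illustration.)
toy <- data.frame(label = c("a", "b", "a", "b"),
                  f1 = c(1.5, 2.0, 3.1, 0.7),
                  f2 = c(0.1, 0.2, 0.3, 0.4),
                  stringsAsFactors = FALSE)

# Converting everything at once gives a character matrix,
# because the label column is text.
whole <- as.matrix(toy)
whole_is_character <- is.character(whole)

# Splitting first keeps the features numeric and the labels a factor.
# drop = FALSE guards against a single feature column collapsing to a vector.
train.x <- as.matrix(toy[, -1, drop = FALSE])
train.y <- factor(toy[, 1])
```

If your labels are already numeric codes, the one-step conversion works, but you would still want factor() around the labels (or an explicit type=) to get classification.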
As an aside, I also like to be explicit about the type="" parameter to
tell what I want my SVM to do (regression or classification). If it's
not specified, svm() picks which one to do based on whether your y
vector is a factor (does classification) or not (does regression).
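To make the default concrete, here is a small sketch using the built-in iris data instead of your file (that substitution is my assumption, as is the choice of columns). A factor response gives factor predictions, a numeric response gives numeric ones:

```r
library(e1071)

x <- as.matrix(iris[, 1:4])

# Factor response: with no type= given, svm() does classification,
# so predict() returns a factor of class labels.
clf <- svm(x, iris$Species)
clf_preds <- predict(clf, x)

# Numeric response: with no type= given, svm() does regression,
# so predict() returns a numeric vector.
# Here Sepal.Length is predicted from the other three measurements.
reg <- svm(x[, -1], iris$Sepal.Length)
reg_preds <- predict(reg, x[, -1])

is.factor(clf_preds)
is.numeric(reg_preds)
```

This is why numeric class codes (say 0/1) need either factor() around them or an explicit type="C-classification"; otherwise you silently get a regression model.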
Do I have to do the same steps with the testing set? i.e. the testing set must not contain the label either, but must have the same structure as the training set? Is that correct?
I guess you'll want to report your accuracy/MSE/something on your
model for your testing set? Just load the data in the same way, then
use `predict` to calculate the metric you're after. You'll have to have
the labels for your data to do that, though, e.g.:
testdata <- as.matrix(read.delim('testdata.txt'))
test.x <- testdata[,-1]
test.y <- testdata[,1]
preds <- predict(mymodel, test.x)
Let's assume you're doing classification, so let's report the accuracy:
acc <- sum(preds == test.y) / length(test.y)
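If you want this to be reusable across datasets, the metrics could be wrapped up along these lines. This is only a sketch: `accuracy` and `mse` are helper names I made up, and the vectors are toy values standing in for real predictions:

```r
# Hypothetical helpers, not part of e1071.
# Compare as character so factor and non-factor labels can be mixed.
accuracy <- function(preds, truth) {
  mean(as.character(preds) == as.character(truth))
}

# Mean squared error, for the regression case.
mse <- function(preds, truth) {
  mean((preds - truth)^2)
}

# Toy classification example (made-up labels).
truth <- factor(c("a", "b", "a", "b", "a"))
preds <- factor(c("a", "b", "b", "b", "a"), levels = levels(truth))
acc <- accuracy(preds, truth)

# A confusion table shows which classes get mixed up.
confusion <- table(predicted = preds, actual = truth)

# Toy regression example (made-up numbers).
err <- mse(c(1.0, 2.5, 3.0), c(1.0, 2.0, 4.0))
```

The confusion table is often more informative than a single accuracy figure, especially when the classes are unbalanced.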
Does that help?
-steve
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact