svm
10 messages · Amy Hessen, Steve Lianoglou, Charles C. Berry +1 more
Hi,
On Tue, Jan 5, 2010 at 7:01 PM, Amy Hessen <amy_4_5_84 at hotmail.com> wrote:
Hi, I understand from help pages that in order to use a data set with svm, I have to divide it into two files: one for the dataset without the class label and the other file contains the class label as the following code:-
This isn't exactly correct ... look at the examples in the ?svm documentation a bit closer.
library(e1071)
x<- read.delim("mydataset_except-class-label.txt")
y<- read.delim("mydataset_class-labell.txt")
model <- svm(x, y, cross=5)
summary(model)
but I couldn't understand how to add the "formula" parameter to it. Does the formula contain the class label too?
Using the first example in ?svm:
attach(iris)
model <- svm(Species ~ ., data = iris)
The first argument in the function call is the formula. The "Species" column is the class label. `iris` is a data.frame ... you can see that it has the label *in it*; look at the output of head(iris).
and what do I have to do to use a testing set when I don't use the "cross" parameter?
Just follow the example in ?svm some more, you'll see training a model and then testing it on data. The example happens to be the same data the model trained on. To use new data, you'll just need a data matrix/data.frame with as many columns as your original data, and as many rows as you have observations.
The first step separates the labels from the data (you can do the same in your data -- you don't have to have separate test and train files that are different -- just remove the labels from it in R):
attach(iris)
x <- subset(iris, select = -Species)
y <- Species
model <- svm(x, y)
# test with train data
pred <- predict(model, x)
Hope that helps,
-steve
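To test on held-out data rather than on the training data, something along these lines may help. It is only a sketch: the 70/30 split, the seed, and the names `idx`, `train` and `test` are illustrative assumptions, not taken from ?svm. The model-fitting part skips itself if e1071 is not installed.

```r
# Hold-out evaluation sketch (assumed 70/30 split; names are illustrative)
set.seed(1)
idx   <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[idx, ]
test  <- iris[-idx, ]

if (requireNamespace("e1071", quietly = TRUE)) {
  model <- e1071::svm(Species ~ ., data = train)
  # Predict on the test rows with the label column removed
  pred  <- predict(model, subset(test, select = -Species))
  # Fraction of test rows whose predicted class matches the true class
  acc   <- mean(pred == test$Species)
  print(acc)
}
```

The true labels in `test$Species` are used only after prediction, to score the result.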
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Hi Amy,
On Wed, Jan 6, 2010 at 4:33 PM, Amy Hessen <amy_4_5_84 at hotmail.com> wrote:
Hi Steve,
Thank you very much for your reply.
I'm trying to do something systematic/general in the program so that I can
try different datasets without changing much in the program (without knowing
the name of the class label, which differs from one dataset to
another).
Could you please tell me your opinion about this code:-
library(e1071)
mydata<-read.delim("the_whole_dataset.txt")
class_label <- names(mydata)[1]  # I'll always put the class label in the first column.
myformula <- formula(paste(class_label,"~ ."))
x <- subset(mydata, select = - mydata[, 1])
mymodel<-(svm(myformula, x, cross=3))
summary(model)
################
Since you're not doing anything funky with the formula, a preference
of mine is to just skip this way of calling SVM and go "straight" to
the svm(x,y,...) method:
R> mydata <- as.matrix(read.delim("the_whole_dataset.txt"))
R> train.x <- mydata[,-1]
R> train.y <- mydata[,1]
R> mymodel <- svm(train.x, train.y, cross=3, type="C-classification")
## or
R> mymodel <- svm(train.x, train.y, cross=3, type="eps-regression")
As an aside, I also like to be explicit about the type="" parameter to
tell what I want my SVM to do (regression or classification). If it's
not specified, the SVM picks which one to do based on whether or not
your y vector is a vector of factors (does classification), or not
(does regression)
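One caveat with calling as.matrix on a whole data frame, assuming the label column holds text rather than numbers: the result is a character matrix, so the features stop being numeric. A small sketch with a made-up data frame (`toy` and its columns are hypothetical) that converts only the feature columns and keeps the label as a factor:

```r
# Made-up data: label in the first column, two numeric features
toy <- data.frame(label = c("a", "b", "a", "b"),
                  f1 = c(1.0, 2.5, 0.3, 4.1),
                  f2 = c(10, 20, 30, 40))

m <- as.matrix(toy)
is.character(m)                   # the text label forces every cell to character

train.x <- as.matrix(toy[, -1])   # numeric matrix of the features only
train.y <- factor(toy[, 1])       # a factor y, i.e. the classification case
is.numeric(train.x)
is.factor(train.y)
```

If the labels are already numeric codes, wrapping them in factor() (or passing type="C-classification" as above) keeps the call from being treated as regression.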
Do I have to do the same steps with the testing set? I.e., must the testing set also not contain the label, but have the same structure as the training set? Is that correct?
I guess you'll want to report your accuracy/MSE/something on your
model for your testing set? Just load the data in the same way then
use `predict` to calculate the metric you're after. You'll have to have
the labels for your data to do that, though, eg:
testdata <- as.matrix(read.delim('testdata.txt'))
test.x <- testdata[,-1]
test.y <- testdata[,1]
preds <- predict(mymodel, test.x)
Let's assume you're doing classification, so let's report the accuracy:
acc <- sum(preds == test.y) / length(test.y)
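Alongside the accuracy, a confusion table shows which classes get mixed up. The sketch below uses made-up label vectors in place of real model output, so the numbers mean nothing beyond illustration:

```r
# Made-up predictions and true labels, for illustration only
test.y <- factor(c("a", "a", "b", "b", "b", "a"))
preds  <- factor(c("a", "b", "b", "b", "a", "a"), levels = levels(test.y))

# Same value as sum(preds == test.y) / length(test.y)
acc <- mean(preds == test.y)

# Rows are predicted classes, columns are true classes
confusion <- table(predicted = preds, actual = test.y)
print(acc)
print(confusion)
```

For a regression SVM the analogous summary would be the mean squared error, mean((preds - test.y)^2), with numeric vectors.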
Does that help?
-steve
1 day later
Hi,
On Fri, Jan 8, 2010 at 11:57 AM, Amy Hessen <amy_4_5_84 at hotmail.com> wrote:
Hi Steve, Thank you very much for your reply. Your code is more readable and obvious than mine.
No Problem.
Could you please help me with these questions? 1) "Formula" is an alternative to the "y" parameter in SVM. Is that correct?
No, that's not correct. There are two svm functions: one that takes a "formula" object (svm.formula), and one that takes an x matrix and a y vector (svm.default). The svm.formula function is called when the first argument in your "svm(..)" call is a formula object. This function simply parses the formula and manipulates your data object into an x matrix and y vector, then calls the svm.default function with those params ... I usually prefer to just skip the formula and provide the x and y objects directly.
Load the e1071 library and look at the source code:
R> library(e1071)
R> e1071:::svm.formula
You'll see what I mean.
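To get a feel for the kind of work a formula interface does before fitting, here is a base-R sketch that turns a formula plus a data frame into an x matrix and a y vector. It is not a copy of svm.formula; the use of reformulate, model.frame and model.matrix here is an assumption chosen for illustration, with the label taken to be the fifth column of `iris`.

```r
# Base-R sketch only; not the actual body of e1071's svm.formula.
# Build "Species ~ ." from a column name, as a generic script might.
f  <- reformulate(".", response = names(iris)[5])

mf <- model.frame(f, data = iris)
y  <- model.response(mf)   # the label column (a factor here)

# Predictor matrix; the first column is the intercept, which is dropped
x  <- model.matrix(attr(mf, "terms"), data = mf)[, -1, drop = FALSE]

dim(x)
levels(y)
```

Building the formula from names(mydata)[1] in the same way would keep a script independent of the label's name, provided the label always sits in a known column.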
2) I forgot to remove the "class label" from the dataset, and I also gave the program the class label in the formula parameter, but the program works! Could you please clarify this point for me?
The author of the e1071 package did you a favor. The predict.svm function checks to see if your svm object was built using the formula interface ... if so, it looks for your label column in the data you are trying to predict on and ignores it. Look at the function's source code (e.g., type e1071:::predict.svm at the R prompt), and look for the call to the delete.response function ... you can also look at the help in ?delete.response.
-steve
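The effect of delete.response can be seen with base R alone. In this sketch (the names `tt`, `tt.x`, `newdata.with` and `newdata.without` are illustrative), the response-free terms produce the same predictor frame whether or not the new data still carries the label column:

```r
# Terms object, similar to what a formula-based fit would store
tt   <- terms(Species ~ ., data = iris)
tt.x <- delete.response(tt)   # same terms with the response removed

newdata.with    <- head(iris)                             # still has Species
newdata.without <- subset(head(iris), select = -Species)  # label removed

# Species is never requested, so both calls build the same columns
mf1 <- model.frame(tt.x, data = newdata.with)
mf2 <- model.frame(tt.x, data = newdata.without)
identical(names(mf1), names(mf2))
```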
11 days later
On Sun, 24 Jan 2010, Amy Hessen wrote:
Hi, Could you please tell me whether there are feature selection algorithms in R or not such as genetic algorithms? If so, could you please tell me in which package?
I can!
By following the _posting guide_, I see in the 'Do Your Homework' section
that I should try something like:
RSiteSearch("feature selection")
and
RSiteSearch("genetic algorithm")
And each seems to produce lots of good candidates!
HTH,
Chuck
p.s. Don't forget to check the Task Views on CRAN
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
You can check http://cran.r-project.org/web/views/MachineLearning.html
Carlos J. Gil Bellosta
http://www.datanalytics.com