svm

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100106/a8e91423/attachment.pl>
Hi,
Hi,

I understand from the help pages that, in order to use a data set with svm, I have to divide it into two files: one containing the dataset without the class label and the other containing the class label, as in the following code:
This isn't exactly correct ... look a bit more closely at the examples in
the ?svm documentation.
library(e1071)
x<- read.delim("mydataset_except-class-label.txt")
y<- read.delim("mydataset_class-labell.txt")
model <- svm(x, y, cross=5)
summary(model)

but I couldn't understand how to add the "formula" parameter to it. Does the formula contain the class label too?
Using the first example in ?svm

library(e1071)
model <- svm(Species ~ ., data = iris)

The first argument in the function call is the formula. The "Species"
column is the class label.

`iris` is a data.frame ... you can see that it has the label *in it*;
look at the output of head(iris).
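To see concretely what `Species ~ .` picks out of `iris`, here is a base-R sketch using `model.frame()` and `model.response()` from the stats package. It illustrates how formulas work in general and is not a copy of e1071's own code; `mf`, `x` and `y` are just illustrative names.

```r
# Evaluate the formula against the data frame. The response comes first,
# and "." expands to every column that is not the response.
mf <- model.frame(Species ~ ., data = iris)

y <- model.response(mf)  # the class label (a factor)
x <- mf[, -1]            # the four measurement columns

str(y)
names(x)
```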
and what do I have to do to use a testing set when I don't use the "cross" parameter?
Just follow the example in ?svm a bit further; you'll see it train a model
and then test it on data. The example happens to test on the same data
the model was trained on. To use new data, you just need a data
matrix/data.frame with the same predictor columns as your training data,
and one row per observation.
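A minimal sketch of testing on data the model has not seen, assuming e1071 is installed. The 100/50 split and the seed are arbitrary choices for illustration, not part of the ?svm example:

```r
library(e1071)

set.seed(1)                               # reproducible split
train_idx <- sample(nrow(iris), 100)      # 100 random rows for training
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]               # the other 50 rows are held out

model <- svm(Species ~ ., data = train)

# The test data only needs the predictor columns used in training.
pred <- predict(model, test)
table(predicted = pred, actual = test$Species)
mean(pred == test$Species)                # held-out accuracy
```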

The second half of the ?svm example uses the x/y interface, which starts by
separating the labels from the data. You can do the same with your data: you
don't need separate files for the features and the labels, just remove the
label column in R:

library(e1071)
x <- subset(iris, select = -Species)
y <- iris$Species
model <- svm(x, y)

# test with train data
pred <- predict(model, x)
table(pred, y)

Hope that helps,
-steve
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Hi Amy,
Hi Steve,

Thank you very much for your reply.

I'm trying to do something systematic/general in the program so that I can
try different datasets without changing much in the program (without knowing
the name of the class label, which differs from one dataset to
another).

Could you please tell me your opinion about this code:-

library(e1071)

mydata<-read.delim("the_whole_dataset.txt")

class_label <- names(mydata)[1]  # I'll always put the class label in the first column.

myformula <- formula(paste(class_label,"~ ."))

x <- subset(mydata, select = - mydata[, 1])

mymodel<-(svm(myformula, x, cross=3))

summary(mymodel)

################
Since you're not doing anything funky with the formula, a preference
of mine is to just skip this way of calling SVM and go "straight" to
the svm(x,y,...) method:

R> mydata <- read.delim("the_whole_dataset.txt")
R> train.x <- as.matrix(mydata[,-1])  # only the features need to be numeric
R> train.y <- mydata[,1]

R> mymodel <- svm(train.x, train.y, cross=3, type="C-classification")
## or
R> mymodel <- svm(train.x, train.y, cross=3, type="eps-regression")
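For the "label always in the first column" setup, the splitting can be wrapped in a small function. `split_first_column()` is a hypothetical helper written for illustration, not part of e1071. It converts only the predictors to a matrix, so a text label column doesn't turn the whole matrix into character data, and it also builds the equivalent formula with `reformulate()` for anyone who prefers the formula interface:

```r
# Hypothetical helper (not part of e1071): assumes the label is column 1.
split_first_column <- function(df) {
  label <- names(df)[1]
  list(
    x = as.matrix(df[, -1, drop = FALSE]),        # predictors only
    y = df[[1]],                                  # label keeps its own type
    formula = reformulate(".", response = label)  # e.g. Species ~ .
  )
}

# Demo: iris with the label moved to the first column.
mydata <- iris[, c(5, 1:4)]
parts <- split_first_column(mydata)

str(parts$x)
parts$formula

# If the label was read in as character (the read.delim() default since
# R 4.0.0), wrap it in factor() before using it for classification.
```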

As an aside, I also like to be explicit about the type="" parameter to
tell svm what I want it to do (regression or classification). If it's
not specified, svm picks one based on whether your y vector is a factor
(classification) or not (regression).
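A quick way to check the default behaviour, assuming e1071 is installed: a factor `y` gives classification, a numeric `y` gives regression, and integer labels only get classification if you ask for it with `type`. Using `Sepal.Length` as a regression target is an arbitrary choice for the demo:

```r
library(e1071)

x <- as.matrix(iris[, 1:4])

# Factor response: svm() defaults to C-classification.
m_class <- svm(x, iris$Species)

# Numeric response: svm() defaults to eps-regression
# (here predicting Sepal.Length from the other three columns).
m_reg <- svm(x[, -1], iris$Sepal.Length)

# Integer labels would be treated as numbers, so ask for classification.
m_int <- svm(x, as.integer(iris$Species), type = "C-classification")

class(predict(m_class, x))      # predicted classes
class(predict(m_reg, x[, -1]))  # numeric predictions
class(predict(m_int, x))        # predicted classes again
```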
Do I have to do the same steps with the testing set? i.e. the testing set must not
contain the label either, but must have the same structure as the training set?
Is that correct?
I guess you'll want to report your accuracy/MSE/something on your
model for your testing set? Just load the data in the same way, then
use `predict` and calculate the metric you're after. You'll need
the labels for your test data to do that, though, e.g.:

testdata <- read.delim('testdata.txt')
test.x <- as.matrix(testdata[,-1])
test.y <- testdata[,1]
preds <- predict(mymodel, test.x)

Let's assume you're doing classification, so let's report the accuracy:

acc <- sum(preds == test.y) / length(test.y)
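The same calculation on toy vectors, in base R only; `preds` and `test.y` here are made-up stand-ins for `predict()` output and the true labels. `table()` gives a confusion matrix, and for regression the mean squared error plays the role of accuracy:

```r
# Made-up stand-ins for predict() output and the true test labels.
preds  <- factor(c("a", "b", "a", "a"), levels = c("a", "b"))
test.y <- factor(c("a", "b", "b", "a"), levels = c("a", "b"))

acc <- mean(preds == test.y)   # same as sum(...) / length(...)
acc                            # 3 of 4 correct
table(predicted = preds, actual = test.y)

# For regression, mean squared error plays the same role.
pred_num <- c(1.0, 2.5, 3.0)
obs_num  <- c(1.0, 2.0, 4.0)
mse <- mean((pred_num - obs_num)^2)   # (0 + 0.25 + 1) / 3
```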

Does that help?
-steve
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Hi,
Hi Steve,

Thank you very much for your reply. Your code is more readable and obvious than mine.
No problem.
Could you please help me with these questions?

1) "formula" is an alternative to the "y" parameter in svm. Is that correct?
No, that's not correct.

svm is a generic function with two methods: one that takes a "formula"
object (svm.formula), and one that takes an x matrix and a y vector
(svm.default). svm.formula is called when the first argument in your
"svm(..)" call is a formula object. It simply parses the formula and
turns your data object into an x matrix and a y vector, then calls
svm.default with those params ... I usually prefer to just skip the
formula and provide the x and y objects directly.
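A sketch comparing the two entry points on `iris`, assuming e1071 is installed and default settings. Since svm.formula ends up calling svm.default on the same predictors, the two models are expected to give the same predictions:

```r
library(e1071)

# Formula interface: the label lives inside the data frame.
m_formula <- svm(Species ~ ., data = iris)

# Default interface: predictors and labels passed separately.
m_default <- svm(iris[, 1:4], iris$Species)

p_formula <- predict(m_formula, iris)
p_default <- predict(m_default, iris[, 1:4])

# Same data, same default settings: the predictions should match.
all(p_formula == p_default)
```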

Load the e1071 package and look at the source code:

R> library(e1071)
R> e1071:::svm.formula

You'll see what I mean.
2) I forgot to remove the "class label" from the dataset, and I also gave the
program the class label in the formula parameter, but the program works! Could
you please clarify this point for me?
The author of the e1071 package did you a favor. The predict.svm
function checks whether your svm object was built using the formula
interface; if so, it looks for your label column in the data you are
trying to predict on and ignores it.

Look at the function's source code (e.g., type e1071:::predict.svm at
the R prompt), and look for the call to the delete.response function
... you can also look at the help in ?delete.response.
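A small sketch of that mechanism, assuming e1071 is installed. `terms()` and `delete.response()` are from the stats package; the e1071 part just checks that predictions are the same with and without the label column:

```r
library(e1071)

model <- svm(Species ~ ., data = iris)

# The terms object records which variable is the response ...
tt <- terms(Species ~ ., data = iris)
attr(tt, "response")                   # 1: the response is present
# ... and delete.response() drops it, leaving only the predictors.
attr(delete.response(tt), "response")  # 0: no response any more

# So prediction works whether or not the label column is present.
p_with    <- predict(model, iris)
p_without <- predict(model, iris[, names(iris) != "Species"])
all(p_with == p_without)
```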

-steve
Date: Wed, 6 Jan 2010 18:44:13 -0500
Subject: Re: [R] svm
From: mailinglist.honeypot at gmail.com
To: amy_4_5_84 at hotmail.com
CC: r-help at r-project.org

On Wed, Jan 6, 2010 at 4:33 PM, Amy Hessen <amy_4_5_84 at hotmail.com> wrote:
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

Hi,

Could you please tell me whether there are feature selection algorithms
in R, such as genetic algorithms? If so, could you please tell me
which package they are in?
I can!

By following the _posting guide_, I see in the 'Do Your Homework' section 
that I should try something like:

 	RSiteSearch("feature selection")

and

 	RSiteSearch("genetic algorithm")

And each seems to produce lots of good candidates!

HTH,

Chuck

p.s. Don't forget to check the Task Views on CRAN.
Cheers,
Amy

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
You can check

http://cran.r-project.org/web/views/MachineLearning.html

Carlos J. Gil Bellosta
http://www.datanalytics.com
