training svm

One rather technical workaround would be to add a row with a
different value. But if a column only ever has one value, then it
contributes nothing to the model, and I see no reason why it would have
to be kept.
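To illustrate why such a column adds nothing, here is a minimal base-R sketch on a made-up vector. A column holding one repeated value has zero standard deviation, so standardizing it (which, as far as I know, svm() in the e1071 package does by default through its scale argument) amounts to dividing 0 by 0:

```r
# Made-up example: a "feature" that takes the same value in every row.
constant_col <- rep(7, 5)

sd(constant_col)               # 0: no variation at all
scaled <- scale(constant_col)  # (x - mean) / sd, i.e. 0 / 0 in every row
scaled[, 1]                    # NaN NaN NaN NaN NaN
```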
~ Oldrich Kruza

On Fri, Mar 7, 2008 at 6:45 AM, Soumyadeep nandi wrote:
What should I do if I need to train svm() with data that have the same value
across all rows in some columns? These must be important features of the
class, so we can't exclude these columns when building the models.

The error I am getting is:
Error in predict.svm(ret, xhold) : Model is empty!
In addition: Warning message:
In svm.default(datatrain, classtrain) :
 Variable(s) 'F112' and 'F113'.... [... truncated]

Is there any way to overcome this problem? Any suggestions would be highly
helpful.

Regards
Soumyadeep

Hello Soumyadeep,

If you store the data in a tabular file, then I suggest using standard
text-processing tools like cut (say your file is called data.csv, the fields
are separated with commas, and you want to get rid of the third and
sixth columns):

$ cut --complement --delimiter="," --fields=3,6 < data.csv > data_cut.csv

If you're not in a Unix environment but have Perl, then you may use a
script like:

 use strict;
 use warnings;

 open my $src, '<', 'data.csv'     or die "couldn't open source: $!";
 open my $dst, '>', 'data_cut.csv' or die "couldn't open destination: $!";
 while (<$src>) {
     chomp;
     my @fields = split /,/, $_, -1;  # substitute your delimiter for the comma; -1 keeps trailing empty fields
     splice @fields, 5, 1;            # remove the sixth column first (zero-based, thus 5 instead of 6)...
     splice @fields, 2, 1;            # ...then the third, so the first removal doesn't shift the second index
     print $dst join(",", @fields), "\n";
 }
 close $dst;
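If R is available anyway, the same file-to-file filtering can be sketched there too. read.csv() accepts a colClasses vector in which "NULL" marks a column to skip while reading and NA leaves the type to be guessed. The seven-column layout below is made-up example data, and the sketch uses temporary files instead of data.csv and data_cut.csv:

```r
# Made-up input file with seven comma-separated columns.
src <- tempfile(fileext = ".csv")
writeLines(c("a,b,c,d,e,f,g",
             "1,2,3,4,5,6,7",
             "8,9,10,11,12,13,14"), src)

# One entry per column: NA = let read.csv guess the type, "NULL" = skip it.
col_classes <- rep(NA_character_, 7)
col_classes[c(3, 6)] <- "NULL"   # drop the third and sixth columns

# The skipped columns never end up in the resulting data frame.
data_cut <- read.csv(src, colClasses = col_classes)

# Write the reduced table back out, playing the role of data_cut.csv.
dst <- tempfile(fileext = ".csv")
write.csv(data_cut, dst, row.names = FALSE)
names(data_cut)   # "a" "b" "d" "e" "g"
```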

If you need to do the selection within R, then you can do it by
indexing the data structure. Suppose you have the data in a data.frame
called data. Then:
data <- data[,-6]
data <- data[,-3]
might do the trick (but since I'm not much of an R hacker, this comes
without guarantee). I think it might be better, however, to do the
preprocessing before the data get into R, because then you avoid
loading the columns you discard into memory.
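A minimal sketch of the same selection done in one step, on a made-up seven-column data frame: a negative index vector drops several positions at once, so removing one column cannot shift the position of the other (with two separate steps, the higher index has to go first):

```r
# Made-up 7-column data frame standing in for the real data (columns V1 ... V7).
data <- as.data.frame(matrix(1:14, nrow = 2))

# Two steps, higher index first: drop the 6th column, then the 3rd.
two_step <- data[, -6]
two_step <- two_step[, -3]

# One step: both positions refer to the original column order.
one_step <- data[, -c(3, 6)]

identical(two_step, one_step)   # TRUE
names(one_step)                 # "V1" "V2" "V4" "V5" "V7"
```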

Hope this helps
~ Oldrich

On Fri, Mar 7, 2008 at 7:55 AM, Soumyadeep nandi wrote:
Thanks Oldrich,
 Actually, I was not sure whether I could remove these columns and still build the model.
Thanks a lot for your kind suggestion. Could you tell me if there is any
function to remove these columns from the data matrix?

 With best regards,
 Soumyadeep

I am guessing that the data is already in R, so it should be easier  
to do it in R, especially if he doesn't know which columns are the  
ones with all identical values. For instance, suppose the data set is  
called x. Then the following would return TRUE for the columns that  
have all values the same:

allsame <- sapply(x, function(y) length(table(y)) == 1)

and then the following will take them out (drop = FALSE keeps the result a data frame even if only one column is left)

newdata <- x[, !allsame, drop = FALSE]
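The two lines above can be tried on a small made-up example; the column names F112 and F113 are borrowed from the error message only for illustration. Presumably the mask should be computed on the training data and then applied unchanged to any new data passed to predict(), so that both have the columns the model was trained on:

```r
# Made-up training and test sets: F112 and F113 are constant in training.
train <- data.frame(F1 = c(0.2, 1.5, 3.1, 4.8), F112 = 1, F113 = "a")
test  <- data.frame(F1 = c(2.0, 0.7), F112 = 1, F113 = "a")

# Flag constant columns using the training data only.
allsame <- sapply(train, function(y) length(table(y)) == 1)

# Apply the same mask to both sets; drop = FALSE keeps data frames
# even though a single column remains here.
train_reduced <- train[, !allsame, drop = FALSE]
test_reduced  <- test[, !allsame, drop = FALSE]
names(test_reduced)   # "F1"
```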
Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
Also, see the nearZeroVar function in the caret package.
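As I understand caret's documentation, nearZeroVar() flags predictors that have a single unique value, or that combine a large ratio between the frequencies of the most common and second most common values with a small percentage of distinct values (defaults freqCut = 95/5 and uniqueCut = 10). The helper near_zero_var_cols() below is a hypothetical base-R approximation of that rule for illustration only, not caret's implementation; for real work, call caret's function itself:

```r
# Hypothetical helper: rough base-R approximation of the near-zero-variance
# rule described for caret::nearZeroVar(); not caret's own code.
near_zero_var_cols <- function(x, freq_cut = 95/5, unique_cut = 10) {
  flags <- vapply(x, function(col) {
    counts <- sort(table(col), decreasing = TRUE)
    if (length(counts) <= 1) return(TRUE)            # constant column (zero variance)
    freq_ratio <- counts[[1]] / counts[[2]]          # most common vs. second most common
    pct_unique <- 100 * length(counts) / length(col) # distinct values as % of rows
    freq_ratio > freq_cut && pct_unique < unique_cut
  }, logical(1))
  which(flags)   # indices of the flagged columns, named by column
}

# Made-up data: a is constant, b is almost constant, c varies freely.
df <- data.frame(a = rep(1, 100), b = c(rep(0, 99), 1), c = 1:100)
near_zero_var_cols(df)   # a = 1, b = 2
```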

Max
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
