I've been trying to get some linear classifiers (LiblineaR, kernlab,
e1071) to work with a sparse matrix of feature data. In the case of
LiblineaR and kernlab, it seems I have to coerce my data into a dense
matrix in order to train a model. I've done a number of searches and
read through the manuals and vignettes, but I can't see how to use
either of these packages with sparse matrices. I've tried using
both csr from SparseM and sparseMatrix from the Matrix library. You
can see a simple example recreating my results below.
Does anybody know if there's a trick to get this to work without
coercing the data into a dense matrix?
I'm currently playing with the KDDCUP 2010 datasets. I've written a
simple script to create hash kernel feature vectors for each of the
rows of training data. Right now I haven't added many features into
the hash vectors. For simplicity, I'm just creating a string token
for each feature, then hashing it and taking that hash mod 10007 and
10009 (so two buckets for each feature with a low likelihood of two
features colliding on both buckets). 10009 columns may seem like
overkill, but I figured if it was a sparse matrix the number of
columns really wouldn't matter that much. Right now I'm also only
playing with 99999 rows of input. Whenever I make the mistake of
doing something that unintentionally coerces the sparse matrix into a
dense one, I end up eating all my RAM, going to swap, and spending
the next 5 minutes trying to kill my session... So I'm looking for
something that scales relatively well without taking up too large a
memory footprint to run.
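For concreteness, here's a rough sketch of the two-bucket hashing scheme I described, using Matrix::sparseMatrix and digest::digest2int for the string hashing. The feature strings are made up for illustration; my real script builds them from the KDDCUP rows.

```r
# Sketch of the hashing trick: each feature string gets two buckets,
# hash mod 10007 and hash mod 10009 (toy feature names, not real data).
library(Matrix)
library(digest)

feats <- list(c("student=A", "step=add"),
              c("student=B", "step=sub"),
              c("student=A", "step=mul"))
n_rows <- length(feats)

i <- integer(0)
j <- integer(0)
for (r in seq_len(n_rows)) {
  for (f in feats[[r]]) {
    h <- digest2int(f)              # hash the string token to an integer
    # two buckets per feature; R's %% is non-negative for positive modulus
    j <- c(j, (h %% 10007) + 1, (h %% 10009) + 1)
    i <- c(i, r, r)
  }
}

# duplicate (i, j) pairs are summed, which is fine for a count-style kernel
X <- sparseMatrix(i = i, j = j, x = 1, dims = c(n_rows, 10009))
```

Since the matrix is stored sparsely, the 10009 columns cost nothing beyond the nonzero entries (two per feature per row).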
Thanks!
Jeff
See below for an example that recreates my basic attempts at using
sparse matrices.
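A minimal sketch of the kind of attempt I mean, on hypothetical toy data rather than the real KDDCUP features. Note that e1071's svm() documents acceptance of SparseM matrix.csr (and Matrix) objects, while LiblineaR (at least the version I have) appears to want a dense matrix:

```r
# Toy reproduction of the sparse-vs-dense training attempts (fake data).
library(Matrix)
library(SparseM)
library(e1071)
library(LiblineaR)

set.seed(1)
X_dense <- matrix(rbinom(100 * 50, 1, 0.05), nrow = 100)  # mostly zeros
y <- factor(rbinom(100, 1, 0.5))

X_dgc <- Matrix(X_dense, sparse = TRUE)   # Matrix-package sparse form
X_csr <- as.matrix.csr(X_dense)           # SparseM form

# e1071::svm accepts a matrix.csr per its docs (scale = FALSE avoids
# densifying the input during scaling)
m1 <- svm(X_csr, y, kernel = "linear", scale = FALSE)

# LiblineaR seems to require a plain dense matrix here
m2 <- LiblineaR(data = X_dense, target = y)
```

Passing X_dgc or X_csr to LiblineaR is where I get stuck; coercing with as.matrix() works but defeats the purpose at the real data's scale.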