SVM classification based on pairwise distance matrix

5 messages · Martin Tomko, Steve Lianoglou

#
Hi,
On Thu, Oct 21, 2010 at 9:42 AM, Martin Tomko <martin.tomko at geo.uzh.ch> wrote:
It seems to me that since you have some pairwise distance metric, your
original data is in some "vector form".

Why not just try using your original data (forget the pairwise
distance for now) and try a few different kernels for the svm, such as
a linear kernel or an rbf/gaussian?
I guess you can think of a "kernel matrix" as something like a
distance matrix -- actually, it's more like a similarity matrix.
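
To make the "similarity matrix" idea concrete, here is a minimal sketch
in plain Python (not R, and not the e1071 or kernlab API). The sample
vectors, the function names and the sigma value are all made up for
illustration, and the RBF form exp(-|x - y|^2 / sigma^2) is only one
common parametrization.

```python
import math

def linear_kernel(x, y):
    # Inner product of two equal-length feature vectors.
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, sigma=1.0):
    # exp(-|x - y|^2 / sigma^2); sigma is an assumed bandwidth.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)

def kernel_matrix(samples, kernel):
    # Entry [i][j] is the kernel value (similarity) of samples i and j.
    return [[kernel(xi, xj) for xj in samples] for xi in samples]

# Hypothetical data: three samples in a 2-d feature space.
samples = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]

K_lin = kernel_matrix(samples, linear_kernel)
K_rbf = kernel_matrix(samples, rbf_kernel)
```

With the RBF kernel every sample has similarity 1 with itself, and the
value shrinks towards 0 as two samples move apart, which is the opposite
orientation of a distance matrix.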

I don't recall if e1071 allows you to use a kernel matrix as input, but
I'm pretty sure the svm functions from kernlab do. It was a pain to
use, though.

But anyway -- don't use your distance matrix :-)
With the exception of "plugging in" a kernel matrix (which was
calculated from data in its original feature space) that's pretty much
correct.
But your distance matrix isn't really the same multidimensional space
your data lives in, right?

Anyway, like I said before, try the SVM on your original data with
some different kernels. I think the RBF kernel should be closest in
spirit to your distance matrix, and will likely perform better than
your kNN ;-).

Hope that helps,
-steve
#
Hi Steve,
thanks for the hints and clarifications.
Unfortunately, I will not be able to use the approach you suggest. The
distances I generate are distances between VERY large matrices (say
100000x100000 and more), each of different dimensions (not necessarily
square either), and the columns have no significance as properties;
they are basically graphs of sort.

Is there a way out with the SVM, or should I just forget that?
Martin
On 10/21/2010 5:42 PM, Steve Lianoglou wrote:
#
Hi,
On Thu, Oct 21, 2010 at 12:12 PM, Martin Tomko <martin.tomko at geo.uzh.ch> wrote:
Well, it's not clear to me what type of data you are working with. You
say they are "graphs of sort." There are "principled" ways of working
with graphs in SVMs -- namely using "graph kernels". You can find
information about them if you search Google (Karsten Borgwardt
does a lot of work in this area). Unfortunately, I don't think there
are any public-domain implementations out there for you to consume
easily.

But still -- you're able to calculate a distance metric over your data
-- how are you doing that?

Here's a shot in the dark, and probably not entirely correct, so read
at your own risk:

What if you try to create a kernel matrix by plugging your distance
metric into the appropriate place in something like an RBF kernel
function? For instance, the value of the RBF kernel between two points
is:

exp(-|X_1 - X_2|^2 / sigma^2)

What if you plugged your distance measure between samples X_1 and X_2
into the |X_1 - X_2| slot and kept the rest the same?
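
As a rough sketch of that substitution (plain Python rather than R; the
distance values, the function name and the sigma value are invented for
illustration), each pairwise distance d is turned into exp(-d^2 / sigma^2):

```python
import math

def distance_to_rbf_kernel(D, sigma):
    # Apply exp(-d^2 / sigma^2) to every pairwise distance d in D.
    # sigma is a free bandwidth parameter that would need tuning.
    return [[math.exp(-(d ** 2) / sigma ** 2) for d in row] for row in D]

# Hypothetical symmetric distance matrix over three samples.
D = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.5],
    [2.0, 1.5, 0.0],
]

K = distance_to_rbf_kernel(D, sigma=1.5)
```

The result has 1 on the diagonal and is symmetric whenever D is, but
that alone does not make it a valid kernel matrix for an arbitrary
distance measure.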

You have to verify that this is a valid kernel (Gram) matrix -- I
think it just needs to be symmetric positive semidefinite. See a quick
review here:
http://www.support-vector.net/icml-tutorial.pdf
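
One possible way to run that validity check, sketched in plain Python
under some assumptions: the helper names and tolerances are made up, and
the test attempts a Cholesky factorization of K + eps*I, which succeeds
only if K has no eigenvalue below -eps. For large matrices a numerical
library's eigenvalue routine would be the more usual choice.

```python
import math

def is_symmetric(K, tol=1e-9):
    # Square, and K[i][j] equals K[j][i] up to a small tolerance.
    n = len(K)
    if any(len(row) != n for row in K):
        return False
    return all(abs(K[i][j] - K[j][i]) <= tol
               for i in range(n) for j in range(i))

def is_psd(K, eps=1e-9):
    # Cholesky of K + eps*I: a non-positive pivot means K has an
    # eigenvalue at or below -eps, so K is not positive semidefinite.
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = K[i][i] + eps - s
                if pivot <= 0.0:
                    return False
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return True

def is_valid_kernel_matrix(K):
    return is_symmetric(K) and is_psd(K)

good = [[1.0, 0.5], [0.5, 1.0]]  # eigenvalues 1.5 and 0.5
bad = [[1.0, 2.0], [2.0, 1.0]]   # eigenvalues 3 and -1

print(is_valid_kernel_matrix(good), is_valid_kernel_matrix(bad))
```

If the matrix built from your distances fails this kind of check, the
distance is not behaving like a Euclidean one and the resulting
"kernel" is not guaranteed to work in a standard SVM.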

Now you're just left to figure out how to use ksvm (from kernlab) with
kernel matrices, and maybe you'll have something that can work.
#
Hi Steve,
thanks a lot, I will have a look at the kernel approach, that looks
promising. I will first have to study the theory behind it before I
use it, I guess.
Cheers
M.