On Thu, Jan 15, 2004 at 08:32:45AM +0000, Prof Brian Ripley wrote:
On Thu, 15 Jan 2004, Renald Buter wrote:
On Wed, Jan 14, 2004 at 03:18:10PM -0500, Liaw, Andy wrote:
If pam produces the cluster medoids, you should be able to use the
1-nearest-neighbor classifier for prediction of future data, using the
medoids as the `training' data. 1-NN is available in the `class' package,
part of the `VR' bundle.
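For instance, a minimal sketch of that idea (here `X' and `newdata' are just placeholders for your training data and the future data):

library(cluster)   # pam()
library(class)     # knn()

fit <- pam(X, 4)                      # cluster the training data into 4 groups
# classify each future point by its nearest medoid (1-NN);
# row i of fit$medoids belongs to cluster i, so label it "i"
pred <- knn(fit$medoids, newdata, factor(1:nrow(fit$medoids)), k = 1)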
Thanks very much for your quick answer! I've tried your suggestion in
the following way:
# separate the ruspini data into train and test sets
> library(cluster)   # pam() and the ruspini data
> library(class)     # knn()
> data(ruspini)
> train <- ruspini[1:50, ]
> test <- ruspini[51:75, ]
> pamx <- pam(train, 4)
> knnx <- knn(pamx$medoids, test, factor(c("a","b","c","d")), k = 3)
> knnx
[1] d d b b d c b c c d c a a d c c a a c a a d c d a
Levels: a b c d
But the result of applying knn to the test set should contain only 2
clusters, since the upper half of the ruspini data (rows 51:75) contains
only 2 clusters.
Could you tell me what I am missing here?
You asked that the training half (rows 1:50) be divided into 4 clusters.
Did you look at the object pamx? It contains 4 clusters covering only
that first part of the dataset.
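For example, a quick way to look at it (using the session above):

table(pamx$clustering)   # sizes of the 4 clusters found among rows 1:50
pamx$medoids             # the 4 medoids, all drawn from the training rows
plot(pamx)               # clusplot and silhouette of the training rows only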
Yes, that is what I understood. My objective was to use this division
by applying it to the test set: for each point in the test set, predict
what cluster it would enter.
Given that when you apply pam to the whole dataset there is a cluster that
only occurs for objects 61:75, there is no way you can find that cluster
when no member of it is in your training set.
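A quick way to check this (a sketch; `pamall' and `grp' are just
illustrative names):

pamall <- pam(ruspini, 4)                                  # cluster all 75 rows
grp <- ifelse(1:nrow(ruspini) <= 50, "train", "test")      # your split
table(pamall$clustering, grp)
# one of the four clusters occurs only for objects 61:75, i.e. entirely
# within the test rows, so it cannot be recovered from medoids fitted
# on rows 1:50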
But isn't that what knn does: locate the nearest neighbour of a point
and assign its (the neighbour's) label to the point to be classified?
I thought that I was doing:
1. create a clustering of data using PAM
2. train a knn with the medoids of the PAM clustering
3. apply the knn to the test set
4. look at the result
Could you tell me what I'm not getting here?
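In code, those four steps would be roughly (a sketch, with k = 1 as
originally suggested; note that the labels passed to knn come from the
training clustering, so the prediction can only ever return clusters
present in the training set):

pamx   <- pam(train, 4)                            # 1. cluster the training data with PAM
labels <- factor(1:nrow(pamx$medoids))             # 2. one label per medoid
pred   <- knn(pamx$medoids, test, labels, k = 1)   # 3. classify the test set by nearest medoid
table(pred)                                        # 4. look at the result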