
Using pam, agnes or clara as prediction models?

On Thu, 15 Jan 2004, Renald Buter wrote:

            
You created a clustering of the training set, yet interpreted it against
the clustering of the whole set using the now irrelevant statement

`the upper half of the ruspini data contains only 2 clusters'

which applies to the wrong clustering.  I pointed out that the training 
set does not contain a single member of one of _those_ clusters, so you are 
bound to get a completely different clustering.
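To make the point concrete, here is a language-neutral sketch in Python rather than R. It assumes a hypothetical dataset of 75 rows whose cluster labels are stored in cluster order. The sizes 20/23/17/15 and the helper `ordered_split` are made up for illustration and are not taken from the post. Taking the first half as `training' leaves whole clusters unrepresented:

```python
# Hypothetical cluster labels for a dataset stored in cluster order
# (illustrative assumption, not real data).
labels = [1] * 20 + [2] * 23 + [3] * 17 + [4] * 15

def ordered_split(items, frac=0.5):
    """Hypothetical helper: first `frac` of rows as training, the rest as test."""
    cut = int(len(items) * frac)
    return items[:cut], items[cut:]

train, test = ordered_split(labels)

print(sorted(set(labels)))  # clusters in the whole set: [1, 2, 3, 4]
print(sorted(set(train)))   # clusters in the training half: [1, 2]
print(sorted(set(test)))    # clusters in the test half: [2, 3, 4]
```

Any clustering fitted to `train` here cannot recover clusters 3 and 4, so comparing it with a clustering of the whole set is meaningless.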

When you divide a dataset into `training' and `testing' sets, you are 
assuming at least exchangeability, whereas this dataset is clearly ordered.
So it is not credible that `train' and `test' are samples from the same 
population.
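One common way to make a split at least plausibly exchangeable is to permute the rows at random before cutting. The Python sketch below uses a hypothetical helper `random_split` and the same made-up label vector as an illustrative assumption. A random split does not guarantee that every cluster appears in both parts. It only removes the systematic dependence on storage order.

```python
import random

def random_split(n, frac=0.5, seed=None):
    """Hypothetical helper: split row indices 0..n-1 after a random permutation.

    Shuffling first means neither part depends on the order in which
    the rows happen to be stored.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)                 # random permutation of the row indices
    cut = int(n * frac)
    return sorted(idx[:cut]), sorted(idx[cut:])

# Same hypothetical, cluster-ordered labels as an illustration.
labels = [1] * 20 + [2] * 23 + [3] * 17 + [4] * 15
train_idx, test_idx = random_split(len(labels), seed=1)

# Clusters represented in each part (depends on the seed).
print(sorted({labels[i] for i in train_idx}))
print(sorted({labels[i] for i in test_idx}))
```

If every cluster must appear in both parts, a stratified split (shuffling within each known group) is the usual alternative. That requires the group labels in advance, which is exactly what a clustering exercise does not have.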