Information criteria for kmeans

This is not primarily an R question: if you tell us how you want to define 
it, we may be able to help you compute it.  I presume you are talking 
about Schwarz (1978), which is not billed as an 'information criterion'.

AFAIK, all Gideon Schwarz did was to define a criterion for linear 
regression.  People have applied it to other situations with a vector 
space of parameters.  However in many clustering methods (including 
kmeans, and as for example in classification trees) there is also a 
combinatorial part of the fit: you optimize over both the cluster centres 
and the allocation of units to clusters.  Such a fit does not come close 
to the Schwarz framework.
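
To make the combinatorial part concrete, here is a minimal sketch of a plain Lloyd-type iteration, written in Python rather than R since the point is not R-specific. The names lloyd_kmeans and penalised_score are made up for illustration. In particular, penalised_score is just one ad hoc 'Schwarz-like' definition that someone might choose; it is an assumption of this sketch, not Schwarz's criterion and not the one in Banfield & Raftery.

```python
import math


def lloyd_kmeans(points, centres, max_iter=100):
    """Plain Lloyd iteration on a list of equal-length tuples (illustrative only)."""
    centres = [tuple(c) for c in centres]
    labels = None
    for _ in range(max_iter):
        # Combinatorial part: allocate each unit to its nearest centre.
        new_labels = [
            min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # the allocation is stable, so the centres are too
        labels = new_labels
        # Continuous part: move each centre to the mean of its units.
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    # Total within-cluster sum of squares for the final allocation.
    wss = sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
    return centres, labels, wss


def penalised_score(wss, n, k, d):
    """Hypothetical penalised score: n*log(wss/n) + k*d*log(n).

    Only the k*d centre coordinates are counted as parameters; the
    allocation of units to clusters is not counted at all, which is
    exactly where the analogy with Schwarz breaks down.
    Undefined when wss == 0 (e.g. one cluster per unit).
    """
    return n * math.log(wss / n) + k * d * math.log(n)


# Six units on a line, two starting centres chosen by hand.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
centres, labels, wss = lloyd_kmeans(points, [(0.0,), (12.0,)])
score = penalised_score(wss, n=len(points), k=len(centres), d=1)
print(centres, labels, wss, score)
```

The penalty term counts only the coordinates of the centres. The labels are chosen by a discrete search and enter no parameter count, so any number produced this way is a matter of definition rather than something derived from Schwarz's argument.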

Nor does clustering fit into the information framework of Akaike (1973, 1974).

There is discussion in Banfield & Raftery (1993) of a Schwarz-like 
criterion for clustering, but with a rather different derivation and I 
don't think it should be attributed to Schwarz.

On Wed, 5 Dec 2007, Serguei Kaniovski wrote: