
Classification of attribute table

Hi Wesley,

So you just want to partition the 1916 cases into three clusters. This
is a clustering problem rather than the kind of classification problem
that discriminant analysis addresses. As a result, Dylan Beaudette's
suggestion of using the clara() function is pretty reasonable, but your
data set isn't so large that other (more computationally intensive)
algorithms are ruled out, assuming your machine has a reasonable amount
of memory. Moreover, some of your measures are very highly correlated
with one another (var and stdev, for instance, since one is the square
of the other), so you can probably reduce the number of variables used
in the clustering.
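
A minimal sketch of both suggestions, assuming the attribute table can
be read into a numeric data frame. The data below are simulated, the
column names (mean, stdev, var, range) are hypothetical, and the 0.9
correlation cutoff is an arbitrary choice. clara() comes from the
cluster package.

```r
library(cluster)  # clara() and pam(); a recommended package shipped with R

set.seed(1)

# Simulated stand-in for the 1916-case attribute table (hypothetical columns)
n <- 1916
grp <- sample(1:3, n, replace = TRUE)
stdev <- abs(rnorm(n, mean = 2 * grp, sd = 1))
attrs <- data.frame(
  mean  = rnorm(n, mean = 3 * grp, sd = 2),
  stdev = stdev,
  var   = stdev^2,  # var is just stdev squared, so it is nearly redundant
  range = rnorm(n, mean = 5 * grp, sd = 4)
)

# Drop the later variable of every pair whose |r| exceeds the cutoff
drop_correlated <- function(df, cutoff = 0.9) {
  r <- abs(cor(df))
  r[upper.tri(r, diag = TRUE)] <- 0       # look at each pair once (lower triangle)
  redundant <- apply(r > cutoff, 1, any)  # column i is too close to an earlier one
  df[, !redundant, drop = FALSE]
}

reduced <- drop_correlated(attrs)
names(reduced)

# Partition into three clusters; stand = TRUE standardizes the columns first
fit <- clara(reduced, k = 3, stand = TRUE, samples = 50)
table(fit$clustering)
fit$medoids
```

With the columns in this order the helper keeps stdev and drops var.
With only 1916 rows, pam() from the same package is also feasible: its
dissimilarity object holds 1916 x 1915 / 2 = 1,834,570 distances, about
15 MB as doubles.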

Is the set of 1916 cases fixed, or will you want to take new cases and
assign them to one of the three clusters created from the original
1916? If the latter, model-based clustering might make the most sense,
since it gives you a clean way of assigning new cases to the existing
clusters based on the posterior probability of cluster membership.
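
A minimal sketch of that workflow, assuming the mclust package is
installed (it is not part of base R) and a current release that
provides a predict() method for Mclust fits. The data are simulated and
the columns a and b are hypothetical stand-ins for your reduced
variables.

```r
library(mclust)  # assumed installed; provides Mclust() and predict() for its fits

set.seed(2)

# Hypothetical generator of cases drawn from three well-separated groups
make_cases <- function(n) {
  g <- sample(1:3, n, replace = TRUE)
  data.frame(a = rnorm(n, mean = 4 * g), b = rnorm(n, mean = -3 * g))
}
original <- make_cases(1916)

# Fit a 3-component Gaussian mixture; Mclust chooses the covariance
# structure by BIC
fit <- Mclust(original, G = 3)

# New cases arriving later are scored against the existing model
new_cases <- make_cases(10)
pred <- predict(fit, newdata = new_cases)

pred$z               # posterior probability of membership in each cluster
pred$classification  # cluster with the largest posterior probability
```

Each row of pred$z sums to 1, so a new case whose largest posterior is
low sits between clusters, which a hard partition such as the one from
clara() does not tell you.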

Dan
On Mon, 2009-05-11 at 07:44 -0700, Dylan Beaudette wrote: