model-based clustering

Hello Murray,

Thanks for the response. I would actually love to hear alternative
suggestions about the problem I am trying to solve. I just thought a
short question would be less of a burden on people's time and would
have a higher chance of being answered.

Basically, the data sets I need to analyze contain 2000-10000 objects,
each characterized by 9-20 attributes, depending on the data set. All
are integers greater than zero; typically the range is [0,1000], with
numbers < 5 particularly common. There is no a priori reason why these
objects should cluster into discrete groups, and in fact when the data
are explored graphically (xgobi) they don't show an obvious clustering
pattern. However, with 9-20 dimensions involved, it is probably easy to
miss subtle patterns. I have tried clustering the data using a number of
standard approaches, including hclust, kmeans, fanny, etc., but these
methods didn't seem able to generate convincingly distinct, homogeneous
clusters. Of course, given the type of data involved, Poisson mixtures
seem like the natural choice.
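
To make the Poisson mixture idea concrete, here is a minimal sketch of
an EM fit. It is written in Python purely for illustration (the same
steps carry over to R line by line) and it rests on assumptions that
are not stated above: the attributes are treated as independent
Poisson counts within each class, the number of classes k is fixed in
advance, and the starting rates are taken from rows at evenly spaced
ranks of total count. The function names fit_poisson_mixture and
hard_labels are hypothetical.

```python
import math

def poisson_logpmf(x, lam):
    # log P(X = x) for X ~ Poisson(lam), with lam > 0
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def fit_poisson_mixture(data, k, n_iter=50):
    """EM for a k-class mixture of independent Poisson attributes.

    data: list of rows of non-negative integer counts (assumed layout).
    Returns (weights, rates, responsibilities).
    """
    n, d = len(data), len(data[0])
    # Assumed deterministic start: rows at evenly spaced ranks of total
    # count, with rates floored at 0.5 so that log(rate) is defined.
    ranked = sorted(data, key=sum)
    rates = [[max(v, 0.5) for v in ranked[int((j + 0.5) * n / k)]]
             for j in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities, computed in log space.
        resp = []
        for row in data:
            logp = [math.log(weights[j])
                    + sum(poisson_logpmf(x, rates[j][a])
                          for a, x in enumerate(row))
                    for j in range(k)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: weighted class proportions and weighted mean counts.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / n, 1e-12)
            for a in range(d):
                s = sum(r[j] * row[a] for r, row in zip(resp, data))
                rates[j][a] = max(s / nj, 1e-6)
    return weights, rates, resp

def hard_labels(resp):
    # Assign each object to its most probable class.
    return [max(range(len(r)), key=lambda j: r[j]) for r in resp]
```

Calling fit_poisson_mixture(rows, k=3) returns the class proportions,
the per-class Poisson rates, and the soft assignments. The sketch does
not choose k; that would need a separate criterion.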

I have experimented a bit with snob using contrived data sets (where you
know which class each object really belongs to), and it has been fairly
promising, except maybe for snob's tendency to break the known classes
into multiple subclasses.
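
One simple way to see this kind of splitting on a contrived data set is
to cross-tabulate the known classes against the inferred ones. The
sketch below is in Python for illustration only; the helper names
cross_tab and subclasses_per_class are hypothetical, and the labels are
assumed to be two equal-length lists, one entry per object.

```python
from collections import Counter

def cross_tab(true_labels, found_labels):
    """Count objects of each known class in each inferred class."""
    counts = Counter(zip(true_labels, found_labels))
    rows = sorted(set(true_labels))
    cols = sorted(set(found_labels))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

def subclasses_per_class(true_labels, found_labels):
    """For each known class, how many inferred classes it is spread over."""
    rows, _, table = cross_tab(true_labels, found_labels)
    return {r: sum(1 for n in line if n > 0)
            for r, line in zip(rows, table)}
```

A known class that maps to more than one inferred class in this table
is one that the clustering has broken into subclasses.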

I would actually like to try to code this in R. In fact, it would be
very helpful to me if you could contribute any code, code fragments, or
examples from your earlier work on this, either to the list or privately.

Many thanks,
Murad

maj at stats.waikato.ac.nz wrote: