CoDA: Clustering Multiple Data Sets
On Fri, 10 Oct 2014, separent at yahoo.com wrote:
It is not clear whether you need a supervised or an unsupervised model. Clustering is unsupervised: it will group compositions hierarchically regardless of their labels (countries, regions). If this is what you intend, you might compute the clustering (hclust) on a Euclidean distance matrix (vegdist with method = "euclidean") computed on the clr- or ilr-transformed data (both transformations return the same distances). If you mean a supervised approach, you might want to explain how groups differ, and/or predict to which group a composition belongs. To explain, discriminant analysis (packages MASS or ade4) is (arguably) often a good choice. To predict a category, you might look at machine learning techniques (see the caret package, among many others).
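The unsupervised route described above can be sketched in base R. This is only an illustration under stated assumptions: the compositions in `comp` are made-up toy values, `clr()` is a hand-written helper rather than the function from any package, stats::dist() stands in for vegdist, and the Helmert-based ilr basis and the Ward linkage are arbitrary choices (any orthonormal basis gives the same distances).

```r
# Toy compositions (assumed values): 6 samples x 3 parts, rows sum to 1.
comp <- matrix(c(0.70, 0.20, 0.10,
                 0.65, 0.25, 0.10,
                 0.72, 0.18, 0.10,
                 0.10, 0.30, 0.60,
                 0.12, 0.28, 0.60,
                 0.08, 0.32, 0.60),
               ncol = 3, byrow = TRUE,
               dimnames = list(paste0("s", 1:6), c("a", "b", "c")))

# Centred log-ratio: log of each part minus the row mean of the logs.
clr <- function(x) {
  lx <- log(x)
  lx - rowMeans(lx)
}

# Euclidean distance between clr coordinates.
d_clr <- dist(clr(comp), method = "euclidean")

# One possible ilr: project the clr coordinates onto an orthonormal basis
# of the zero-sum subspace, built here from normalised Helmert contrasts.
H <- contr.helmert(ncol(comp))
V <- sweep(H, 2, sqrt(colSums(H^2)), "/")
d_ilr <- dist(clr(comp) %*% V, method = "euclidean")

# clr and ilr coordinates give the same distances (up to rounding).
same_distances <- isTRUE(all.equal(as.vector(d_clr), as.vector(d_ilr)))

# Hierarchical clustering; sample labels (country, region) play no role.
hc <- hclust(d_clr, method = "ward.D2")
groups <- cutree(hc, k = 2)
```

With real data, `comp` would be replaced by the closed composition matrix, and zeros would have to be handled before taking logs.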
Serge-Étienne,

It would be an unsupervised model. But, more importantly, you let me see outside the rut into which I wandered. Categorizing streams based on functional composition, then classifying new streams based on those categories has not been a completely satisfying solution, and after mailing my message yesterday I decided to look at a better paradigm. There's a reason why clustering across multiple compositional data sets has not been commonly used in the literature I've read. Time to step back and examine various multivariate regression approaches; the intended use of these compositional data is to explain water quality based on the biota present.

Thanks for your valuable inputs.

Carpe weekend,

Rich