Ward's Clustering Doubts
Hi Rodrigo, [apropos of Ward's method]
... we saw something like "You must use it with Euclidean Distance..."
Strictly speaking this is probably correct: Ward's method performs an analysis-of-variance type of decomposition, so it doesn't really make much sense (I think) unless Euclidean distance (i.e. least squares) is used. However, there may be ways around this.

First, the fact that a distance measure is non-Euclidean in general does not mean that every distance matrix computed with it is non-Euclidean. You can test this using is.euclid in package ade4 (see ?is.euclid). You can also test your matrix by doing a principal co-ordinate analysis and looking for negative eigenvalues. If none are found, the matrix is Euclidean and it should be OK to use Ward's method on that data set.

Probably a better approach is to make your distance matrix Euclidean. There are several functions in ade4 that will do this (for example cailliez, lingoes and quasieuclid). The idea then is to present and compare the two solutions: the first using the uncorrected, non-Euclidean distance matrix, the second using the corrected version. You could use procrustes/co-inertia analysis to compare the two in an intermediate step.

Regards, Mark.
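The eigenvalue check and the correction step described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the community matrix is made up, bray_curtis() and is_euclid() are hypothetical helper names written for this example, the correction shown is a Lingoes-style constant added to the squared distances, and method = "ward.D2" assumes R 3.1.0 or later. In practice vegan::vegdist() and the ade4 functions named above would do the same job.

```r
# Made-up counts for 6 stations x 5 taxa (illustration only, not real data)
comm <- matrix(c(10, 0, 3, 0, 1,
                  8, 1, 0, 0, 2,
                  0, 7, 0, 5, 0,
                  1, 9, 0, 4, 0,
                  0, 0, 6, 1, 8,
                  2, 0, 5, 0, 9),
               nrow = 6, byrow = TRUE,
               dimnames = list(paste0("st", 1:6), paste0("sp", 1:5)))

# Hypothetical helper: Bray-Curtis dissimilarity,
# d_ij = sum(|x_i - x_j|) / sum(x_i + x_j)
# (vegan::vegdist(comm, method = "bray") computes the same quantity)
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

# Hypothetical helper: a distance matrix is Euclidean if its principal
# co-ordinate analysis has no negative eigenvalues (up to a tolerance)
is_euclid <- function(d, tol = 1e-7) {
  eig <- cmdscale(d, k = 1, eig = TRUE)$eig
  min(eig) > -tol * max(eig)
}

d_raw <- bray_curtis(comm)
eig   <- cmdscale(d_raw, k = 2, eig = TRUE)$eig

# Lingoes-style correction: add 2 * c to every squared off-diagonal distance,
# where c is the absolute value of the most negative eigenvalue (0 if none)
c_const <- max(0, -min(eig))
d_corr  <- sqrt(d_raw^2 + 2 * c_const)

# Ward's method on both matrices, then cross-tabulate a 3-group cut
hc_raw  <- hclust(d_raw,  method = "ward.D2")
hc_corr <- hclust(d_corr, method = "ward.D2")
groups  <- table(raw = cutree(hc_raw, 3), corrected = cutree(hc_corr, 3))

print(is_euclid(d_raw))
print(is_euclid(d_corr))
print(groups)
```

If the two cuts put the stations in the same groups, the correction has not changed the picture for that data set. A procrustes or co-inertia comparison of the two ordinations, as suggested above, would be a more complete check.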
Rodrigo Aluizio wrote:
Hi Everybody,

Now I have a doubt that is more statistical than technical about R. I'm working with the ecology of recent Foraminifera. At the lab we used to perform cluster analysis using 1-Pearson's R and Ward's method (we had already seen it in the literature of the area), which renders good results with our biological data. Recently, using the R software (vegan and cluster packages), which allows the combination of any kind of distance matrix with any clustering method, we tried to use Bray Curtis + Ward's (which seems to be more appropriate for a matrix with a lot of zeros), and it renders a better result. Furthermore, the results agree with our hypothesis and with the results we got with the Distance-based Redundancy Analysis (dbRDA or CAP). That is, the analysis (Q-mode) clusters the stations according to the main physical, sedimentary and biological characteristics of the study area.

We received some critical comments noting that Ward's method accepts Euclidean distance only. So we ran the analysis again using Euclidean distance, but we don't get the better results we had using 1-Pearson's R + Ward's or Bray Curtis + Ward's (actually, any other distance + method combination rendered better results). Trying to find answers in the specialized literature, we just got a little more confused, because in any moment we saw something like "You must use it with Euclidean Distance" and, like I said above, we had already seen, in some articles from respected journals, other kinds of distance associated with Ward's clustering method.

Is it wrong, or is it "nonsense", to do the analysis in the way we were doing? The results with Ward's combined with 1-Pearson's R or Bray Curtis fit better with our hypothesis and have excellent agglomerative coefficients, but we don't want to use inappropriate statistical procedures. I'm starting to realize how powerful R is, but that doesn't justify doing nonsense statistics...

I hope one of you may help us! Thank you in advance.

Rodrigo.
View this message in context: http://www.nabble.com/Ward%27s-Clustering-Doubts-tp19486028p19490991.html Sent from the R help mailing list archive at Nabble.com.