complete linkage Agglomerative hierarchical clustering, nnclust, spatclus or something else?
You asked earlier about nnclust: it does single-linkage rather than complete-linkage clustering, that is, it defines clusters so that each point in the cluster has a nearest neighbour in the cluster closer than the threshold distance. This produces much less circular clusters than complete-linkage clustering.
The main distinctive feature of nnclust is that it is feasible even for quite large data sets, taking linear space and roughly nlogn time.
-thomas
On Wed, 21 Apr 2010, Hans Ekbrand wrote:
On Wed, Apr 21, 2010 at 03:14:46PM +0200, Roger Bivand wrote:
On Wed, 21 Apr 2010, Hans Ekbrand wrote:
[...]
Well, hclust was useful, once I understood how cutree works. What would be the benefit of dnearneigh(), is it faster?
For larger data sets, hclust needs a triangular distance matrix, dnearneigh does not. Finding graph components in the output "nb" object also seems conceptually more direct.
OK, good to know if I run into trouble when using the code on larger data-sets later on.
Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle