Complete linkage agglomerative hierarchical clustering, nnclust, spatclus or something else?
On Wed, 21 Apr 2010, Hans Ekbrand wrote:
On Tue, Apr 20, 2010 at 11:13:22PM +0200, Hans Ekbrand wrote:
Roger Bivand wrote:
On Tue, 20 Apr 2010, Hans Ekbrand wrote:
I have just read about clustering on Wikipedia and learnt that what I want is agglomerative hierarchical clustering with complete linkage.
library(cluster)  # not needed for hclust(), which is in the base stats package
?hclust
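hclust() is part of the base stats package, and its default method is "complete", so complete linkage needs no extra argument. A minimal check on made-up coordinates (the matrix `coords` below is hypothetical, in metres, and not the data from this thread):

```r
# Hypothetical toy coordinates in metres: two identical points, a short chain, and an outlier
coords <- matrix(c(   0, 0,
                      0, 0,
                     80, 0,
                    170, 0,
                   1000, 0), ncol = 2, byrow = TRUE)

d <- dist(coords)                               # Euclidean distances between all pairs
tree_default  <- hclust(d)                      # method defaults to "complete"
tree_complete <- hclust(d, method = "complete") # same thing, spelled out

# Both trees have the same merge heights: 0, 80, 170, 1000
print(tree_default$height)
```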
print(load(url("http://sociologi.cjb.net/temp/clust.geo.test.RData")))
clust.geo.test.tree <- hclust(dist(clust.geo.test@coords))  # method = "complete" is the default
clust.geo.test.tree$height
head(clust.geo.test.tree$height, 70)
[1] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[11] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[21] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[31] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[41] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[51] 0.000000 0.000000 0.000000 0.000000 3.160631 18.963676 30.398644 32.232351 37.927539 44.987446
[61] 50.065192 81.542472 82.691738 93.553729 95.971207 105.325405 115.218371 119.540239 125.235381 130.181302
As I understand this, the 54 zeroes represent identical coordinates.
The positive numbers are the distances in meters at which clusters are
merged at each level of the tree (with complete linkage, the largest
distance between any two points in the merged cluster). Now, I am not
interested in grouping together points that are more than 100 meters
apart, so I would like to stop the clustering at that threshold or,
once hclust() has completed, extract the clusters that were in effect
at that level. In the above example that would be at level 65.
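Both observations can be checked from the tree object itself. A sketch on hypothetical toy coordinates (not the clust.geo.test data): the number of zero merge heights equals the number of duplicated locations, and the number of merges at or below 100 m is the level at which to stop. For the heights printed above, `sum(height <= 100)` is 54 + 11 = 65.

```r
# Hypothetical toy coordinates in metres
coords <- matrix(c(   0, 0,
                      0, 0,
                     80, 0,
                    170, 0,
                   1000, 0), ncol = 2, byrow = TRUE)
tree <- hclust(dist(coords))          # complete linkage (the default)

# Each zero-height merge joins a point to another point at the same location
n_zero <- sum(tree$height == 0)
n_dup  <- sum(duplicated(coords))     # duplicated() on a matrix compares whole rows

# Number of merges at or below 100 m, i.e. the level to stop at
level <- sum(tree$height <= 100)

# Each merge reduces the number of clusters by one
n_clusters <- nrow(coords) - level
c(n_zero = n_zero, n_dup = n_dup, level = level, n_clusters = n_clusters)
```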
I didn't understand from the documentation of hclust how to accomplish
that. Can someone on the list help me?
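For the question as asked, cutree() in the base stats package returns, for each point, the id of the cluster it belongs to when the tree is cut at height `h`. Because complete linkage records the largest within-cluster distance as the merge height, every pair of points inside a cluster obtained with `h = 100` is at most 100 m apart. A sketch on hypothetical toy coordinates; on the original data the corresponding call would presumably be `cutree(clust.geo.test.tree, h = 100)`.

```r
# Hypothetical toy coordinates in metres
coords <- matrix(c(   0, 0,
                      0, 0,
                     80, 0,
                    170, 0,
                   1000, 0), ncol = 2, byrow = TRUE)
tree <- hclust(dist(coords))      # complete linkage

# Cut the tree at 100 m: one cluster id per point
cl <- cutree(tree, h = 100)

# List the point indices in each cluster
split(seq_len(nrow(coords)), cl)
```

Point 4 is only 90 m from point 3 but 170 m from point 1, so complete linkage keeps it out of the cluster formed by points 1 to 3.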
So you do not want hclust at all, really. Look at dnearneigh() in spdep, setting a 100 m upper bound. Then use n.comp.nb() to see which points belong to which graph component, perhaps using plot.nb() with colours to distinguish the subgraphs.
Roger
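The idea behind this suggestion is to link every pair of points within 100 m and take the connected components of the resulting graph. Partitioning by connected components of a distance-threshold graph is the same as single linkage cut at that distance, not complete linkage: a chain of points each under 100 m apart ends up in one component even if its ends are far apart. The base-R sketch below illustrates the idea without spdep; `threshold_components()` is a hypothetical helper, not spdep's implementation, and it builds a full distance matrix, so it only suits modest numbers of points.

```r
# Hypothetical helper: connected components of the graph linking points within d_max
threshold_components <- function(coords, d_max) {
  adj  <- as.matrix(dist(coords)) <= d_max  # TRUE where two points are within d_max
  n    <- nrow(adj)
  comp <- integer(n)                        # 0 means "not yet assigned"
  id   <- 0L
  for (start in seq_len(n)) {
    if (comp[start] != 0L) next
    id <- id + 1L                           # new component found
    comp[start] <- id
    queue <- start
    while (length(queue) > 0) {             # breadth-first search from 'start'
      v <- queue[1]
      queue <- queue[-1]
      nbrs <- which(adj[v, ] & comp == 0L)  # unassigned neighbours of v
      comp[nbrs] <- id
      queue <- c(queue, nbrs)
    }
  }
  comp
}

# Hypothetical toy coordinates in metres
coords <- matrix(c(   0, 0,
                      0, 0,
                     80, 0,
                    170, 0,
                   1000, 0), ncol = 2, byrow = TRUE)
d <- dist(coords)

comp     <- threshold_components(coords, 100)
single   <- cutree(hclust(d, method = "single"), h = 100)  # same partition as comp
complete <- cutree(hclust(d), h = 100)                     # complete linkage differs
```

Here points 1 to 4 form one component although points 1 and 4 are 170 m apart, whereas the complete-linkage cut keeps point 4 on its own.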
The goal is to count the number of fires in each cluster, then to analyse how the fires within each cluster are distributed over time, and to count how many of them are too close in time to be considered independent.
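Once each fire has a cluster id, both counts are short base-R computations. The sketch below uses hypothetical cluster ids and dates, and assumes "too close in time" means following the previous fire in the same cluster by fewer than a hypothetical `min_gap` of 30 days; the actual criterion for independence would have to come from the application.

```r
# Hypothetical data: cluster id per fire (e.g. from cutree()) and fire dates
cl    <- c(1, 1, 1, 2, 3, 3)
dates <- as.Date(c("2009-01-01", "2009-01-10", "2009-06-01",
                   "2009-03-15", "2009-02-01", "2009-02-20"))
min_gap <- 30   # hypothetical threshold in days

# Number of fires in each cluster
fires_per_cluster <- table(cl)

# Within each cluster, count fires that follow the previous one by less than min_gap days
too_close <- sapply(split(dates, cl), function(d) {
  gaps <- diff(sort(d))                       # intervals between consecutive fires
  sum(as.numeric(gaps, units = "days") < min_gap)
})

fires_per_cluster
too_close
```

A cluster with a single fire has no intervals, so its count is 0.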
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no