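# data from the original question (called 'vector' there)
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
# cluster and plot
hc <- hclust(dist(v), method = "single")
plot(hc, labels = v)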
cl <- rect.hclust(hc, h = .5, border = "red")
# each component of list cl is one cluster. Print them out.
for(idx in cl) print(unname(v[idx]))
[1] 8
[1] 7.0 7.1
[1] 6.00 6.45
[1] 5
[1] 3.00 3.25 3.33 3.75 4.10
[1] 2
[1] 1
[1] 0.00 0.45
# a different representation of the clusters
vv <- v
names(vv) <- ct <- cutree(hc, h = .5)
vv
   1    1    2    3    4    4    4    4    4    5    6    6    7    7    8
0.00 0.45 1.00 2.00 3.00 3.25 3.33 3.75 4.10 5.00 6.00 6.45 7.00 7.10 8.00
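For one-dimensional data, the same grouping can probably also be obtained without hclust. The following is only a minimal sketch, assuming the values can be sorted and that the proximity requirement is a constant; the names tol, s, grp, groups and multi are made up for illustration. It starts a new group wherever the gap between neighbouring sorted values exceeds the tolerance, which on this data should agree with cutree(hc, h = .5), and then keeps only the groups with more than one member, as listed in the original question.

```r
# Assumption: v is the vector from the original question (called 'vector' there).
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
tol <- 0.5   # proximity requirement (hypothetical name)

s <- sort(v)
# a new group starts wherever the gap to the previous value exceeds tol
grp <- cumsum(c(TRUE, diff(s) > tol))
groups <- unname(split(s, grp))

# keep only the groups that contain more than one value
multi <- Filter(function(g) length(g) > 1, groups)
for (g in multi) print(g)
```

Because every group keeps its own entry in the list, 6, 6.45 and 7, 7.1 come out as separate components. A relative window (e.g. in ppm) would need a different gap test in place of diff(s) > tol; that is not covered here.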
On Dec 21, 2007 4:56 AM, Johannes Graumann <johannes_graumann at web.de>
wrote:
<posted & mailed>
Dear all,
I'm trying to solve the problem of how to find clusters of values in a
vector that lie closer together than a given value. Illustrated, this might
look as follows:
vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
When using '0.5' as the proximity requirement, the following groups would
result:
0,0.45
3,3.25,3.33,3.75,4.1
6,6.45
7,7.1
Jim Holtman proposed a very elegant solution in
http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I have
modified and used since he wrote it to me. The beauty of this approach
is that it works not only for constant proximity requirements as
above, but also for overlap windows defined in terms of ppm around each
value. Now I have an additional need and have found no way (short of
iteratively stepping through all the groups returned) to meet it
with Jim's approach: how can I figure out that 6,6.45 and 7,7.1 are
separate clusters?
Thanks for any hints, Joh