Finding overlaps in vector
Thank you very much for this elegant solution to the problem. The reason I still hope for an extension of Jim's code (not the one he responded with in this thread, but the one I originally referenced) is that it allows asymmetric overlap windows: one can check, e.g., whether values overlap given that the closest allowed proximity 'down' is 0.5 and 'up' is 0.75. I would very much appreciate a solution that allows isolating clusters under that requirement.

Thanks for your time and insight,
Joh
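One possible reading of the asymmetric request, sketched here under stated assumptions: give every value x its own window [x - down, x + up] and put two values in the same cluster when their windows intersect, chaining overlaps transitively. After sorting the windows by their start, a new cluster begins wherever a window starts beyond the furthest end reached so far. This is a generic sort-and-sweep interval merge, not Jim's code, and the function name cluster_windows and its arguments are hypothetical.

```r
# Hypothetical helper, not Jim Holtman's code: a sort-and-sweep interval
# merge. Values whose windows [lo, hi] intersect end up in the same
# cluster, and overlaps chain transitively.
cluster_windows <- function(x, lo, hi) {
  stopifnot(length(lo) == length(x), length(hi) == length(x), all(lo <= hi))
  ord <- order(lo)                  # sweep the windows by their start
  lo_s <- lo[ord]
  hi_s <- hi[ord]
  reach <- cummax(hi_s)             # furthest window end seen so far
  # a window starting beyond everything seen so far opens a new cluster
  new_cluster <- c(TRUE, lo_s[-1] > reach[-length(reach)])
  id <- integer(length(x))
  id[ord] <- cumsum(new_cluster)    # cluster numbers, in input order
  id
}

v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)

# assumed reading of the request: each window reaches 0.5 below, 0.75 above
split(v, cluster_windows(v, lo = v - 0.5, hi = v + 0.75))

# 0.2 below and 0.3 above reproduces the groups of the 0.5 proximity rule
split(v, cluster_windows(v, lo = v - 0.2, hi = v + 0.3))
```

Under this intersection reading, constant bounds only matter through their sum: the windows of two values a < b meet exactly when b - a <= down + up, so the first call puts all of v into one cluster (no gap in v exceeds 1). The asymmetry starts to matter once the bounds differ from value to value, e.g. ppm windows, which the function accepts as vectors.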
Gabor Grothendieck wrote:
This may not be as direct as Jim's in terms of specifying granularity, but it uses conventional hierarchical clustering to create the clusters and also draws a nice dendrogram for you. I have split the dendrogram at a height of 0.5 to define the clusters, but you can change that to whatever granularity you like:
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)

# cluster and plot
hc <- hclust(dist(v), method = "single")
plot(hc, labels = v)
cl <- rect.hclust(hc, h = .5, border = "red")

# each component of list cl is one cluster. Print them out.
for(idx in cl) print(unname(v[idx]))
[1] 8
[1] 7.0 7.1
[1] 6.00 6.45
[1] 5
[1] 3.00 3.25 3.33 3.75 4.10
[1] 2
[1] 1
[1] 0.00 0.45
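Because the data are one-dimensional, single linkage has a simple shortcut: its merge heights are exactly the gaps between neighbouring sorted values, so cutting the tree at height h amounts to starting a new group wherever such a gap exceeds h. A minimal sketch of that shortcut, with the hclust() comparison included only as a sanity check (grp and ct are illustrative names):

```r
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
h <- 0.5

s <- sort(v)
# a gap wider than h between sorted neighbours starts a new group
grp <- cumsum(c(TRUE, diff(s) > h))
split(s, grp)

# sanity check against single linkage cut at the same height
hc <- hclust(dist(v), method = "single")
ct <- cutree(hc, h = h)[order(v)]   # tree groups, in sorted order
# same partition if every gap group pairs with exactly one tree group
nrow(unique(cbind(grp, ct))) == max(grp) && max(grp) == max(ct)
```

The shortcut avoids building the full distance matrix, whose size dist() makes quadratic in length(v), at the cost of losing the dendrogram.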
# a different representation of the clusters
vv <- v
names(vv) <- ct <- cutree(hc, h = .5)
vv
   1    1    2    3    4    4    4    4    4    5    6    6    7    7    8
0.00 0.45 1.00 2.00 3.00 3.25 3.33 3.75 4.10 5.00 6.00 6.45 7.00 7.10 8.00

On Dec 21, 2007 4:56 AM, Johannes Graumann <johannes_graumann at web.de> wrote:
Dear all,

I'm trying to solve the problem of how to find clusters of values in a vector that are closer to each other than a given distance. Illustrated, this might look as follows:

vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)

When using '0.5' as the proximity requirement, the following groups would result:

0,0.45
3,3.25,3.33,3.75,4.1
6,6.45
7,7.1

Jim Holtman proposed a very elegant solution in http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I have modified and used since he sent it to me. The beauty of this approach is that it works not only for constant proximity requirements as above, but also for overlap windows defined in terms of ppm around each value. Now I have an additional need and have found no way (short of iteratively stepping through all the returned groups) to do this with Jim's approach: how can I determine that 6,6.45 and 7,7.1 are separate clusters?

Thanks for any hints,
Joh
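The ppm case can be handled without stepping through the groups, sketched here under two assumptions: 'overlap' means that the windows of two values intersect, and the values are non-negative (e.g. masses), so both window bounds grow with the value and only sorted neighbours need comparing. Numbering the break points with cumsum() gives every cluster its own ID in one vectorised step, so adjacent clusters come out separate automatically. The names ppm_clusters and multi_member, and the input values, are made up for illustration.

```r
# Hypothetical helper: cluster non-negative values whose ppm windows
# [x * (1 - ppm/1e6), x * (1 + ppm/1e6)] overlap. Both bounds grow with x,
# so after sorting only neighbouring windows have to be compared.
ppm_clusters <- function(x, ppm) {
  stopifnot(all(x >= 0), ppm >= 0, ppm < 1e6)
  tol <- ppm / 1e6
  s <- sort(x)
  lower <- s * (1 - tol)
  upper <- s * (1 + tol)
  # cumsum() over the break points numbers every cluster in one pass
  id <- cumsum(c(TRUE, lower[-1] > upper[-length(upper)]))
  split(s, id)
}

# hypothetical helper: drop singletons, keep clusters with 2+ members
multi_member <- function(groups) groups[lengths(groups) > 1]

# made-up values, for illustration only
masses <- c(500, 500.004, 500.020, 1000, 1000.008)
multi_member(ppm_clusters(masses, ppm = 10))
```

Dropping singletons mirrors the listing above, which shows only groups with more than one member.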
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.