Finding overlaps in vector

Enlightening. Thanks.

Joh

If you want indexes, i.e. 1, 2, 3, ... instead of the values in v, you
can still use split: just split seq_along(v) by ct instead of v itself
(or, if v had names, you might want to split names(v)):

split(seq_along(v), ct)

and if you only want to retain groups with 2+ elements, you can
use Filter to drop the others:

twoplus <- function(x) length(x) >= 2
Filter(twoplus, split(seq_along(v), ct))
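Putting the pieces of this thread together, here is a minimal self-contained sketch that goes from the raw data to index groups with two or more members without ever opening a graphics device. It assumes the example vector from the original question, single linkage and a cut height of 0.5, as in the code quoted further down; `groups` and `values` are illustrative names only.

```r
# Example data from the original question
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)

# Single-linkage clustering cut at height 0.5; no plotting involved
hc <- hclust(dist(v), method = "single")
ct <- cutree(hc, h = 0.5)

# Split the positions 1..length(v), not the values, by cluster id
twoplus <- function(x) length(x) >= 2
groups <- Filter(twoplus, split(seq_along(v), ct))

# Look up the values that belong to each group of positions
values <- lapply(groups, function(idx) v[idx])
str(groups)
str(values)
```

Splitting seq_along(v) rather than v keeps the original positions, so nothing has to be matched back to the input afterwards.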

On Dec 22, 2007 5:12 AM, Johannes Graumann <johannes_graumann at web.de>
wrote:
But cutree does away with the indexes from the original input, which
rect.hclust retains.
I will have no choice but to match that input against the 'values'
contained in the clusters ...

Joh

Gabor Grothendieck wrote:

If we don't need any plotting, we don't really need rect.hclust at
all.  Split the output of cutree instead.  Continuing from the
prior code:

for(el in split(unname(vv), names(vv))) print(el)
[1] 0.00 0.45
[1] 1
[1] 2
[1] 3.00 3.25 3.33 3.75 4.10
[1] 5
[1] 6.00 6.45
[1] 7.0 7.1
[1] 8

On Dec 21, 2007 3:24 PM, Johannes Graumann <johannes_graumann at web.de>
wrote:
Hm, hm, rect.hclust doesn't accept "plot=FALSE" and cutree doesn't
retain the indexes of membership ... Is there any way, short of ripping
out the guts of rect.hclust, to achieve the same result without an
active graphics device?

Joh

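# cluster and plot
# v is the example vector from the original question
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
hc <- hclust(dist(v), method = "single")
plot(hc, labels = v)
cl <- rect.hclust(hc, h = .5, border = "red")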

# each component of list cl is one cluster.  Print them out.
for(idx in cl) print(unname(v[idx]))
[1] 8
[1] 7.0 7.1
[1] 6.00 6.45
[1] 5
[1] 3.00 3.25 3.33 3.75 4.10
[1] 2
[1] 1
[1] 0.00 0.45

# a different representation of the clusters
vv <- v
names(vv) <- ct <- cutree(hc, h = .5)
vv
   1    1    2    3    4    4    4    4    4    5    6    6    7    7    8
0.00 0.45 1.00 2.00 3.00 3.25 3.33 3.75 4.10 5.00 6.00 6.45 7.00 7.10 8.00

On Dec 21, 2007 4:56 AM, Johannes Graumann
<johannes_graumann at web.de> wrote:
<posted & mailed>

Dear all,

I'm trying to solve the problem of how to find clusters of values
in a vector that are closer to each other than a given value.
Illustrated, this might look as follows:

vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)

When using '0.5' as the proximity requirement, the following groups
would result:
0,0.45
3,3.25,3.33,3.75,4.1
6,6.45
7,7.1
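Because the data are one-dimensional, the same grouping can also be sketched without hclust: sort the values and start a new group wherever the gap to the previous value is at least the threshold. This is a swapped-in technique, not the approach discussed elsewhere in the thread; it should agree with single linkage cut at the same height except possibly when a gap equals the threshold exactly. `group_by_gap` is a hypothetical helper, and "closer than" is assumed to mean a strict comparison.

```r
# Hypothetical helper: give neighbouring values (after sorting) that are
# closer than h the same group id; ids are returned in the input's order.
group_by_gap <- function(x, h) {
  o <- order(x)                    # positions of x in increasing order
  xs <- x[o]
  # a new group starts at the first value and after every gap >= h
  starts <- c(TRUE, diff(xs) >= h)
  ids <- integer(length(x))
  ids[o] <- cumsum(starts)         # map ids back to the original positions
  ids
}

# The question's data (called `vector` there)
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
g <- group_by_gap(v, 0.5)

# Keep only the groups with at least two members, as values
clusters <- Filter(function(x) length(x) >= 2, split(v, g))
clusters
```

Every run gets its own id, so 6, 6.45 and 7, 7.1 end up in separate groups, and the ids follow the original order of the input even if it was unsorted.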

Jim Holtman proposed a very elegant solution in
http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I
have modified and used since he wrote it to me. The beauty of
this approach is that it works not only for constant proximity
requirements as above, but also for overlap windows defined in
terms of ppm around each value. Now I have an additional need and
have found no way (short of iteratively stepping through all the
groups returned) to meet it with Jim's approach: how do I figure out
that 6,6.45 and 7,7.1 are separate clusters?

Thanks for any hints, Joh
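For windows whose width depends on the value, as in the ppm case, one general option is to treat each value as an interval and merge overlapping intervals in one sweep, starting a new cluster whenever a window begins beyond the furthest upper end seen so far. This is a standard interval-merging sketch, not Jim Holtman's method from the linked post. The symmetric window of +/- ppm parts per million around each value and the helper names `group_by_window` and `ppm_window_groups` are assumptions for illustration.

```r
# Hypothetical helper: cluster values whose windows [lower, upper] overlap,
# directly or through a chain of overlapping windows.
group_by_window <- function(lower, upper) {
  o <- order(lower)
  lo <- lower[o]
  up <- upper[o]
  n <- length(lo)
  reach <- cummax(up)              # furthest upper end seen so far
  # new cluster when a window starts beyond everything seen before it
  starts <- c(TRUE, lo[-1] > reach[-n])
  ids <- integer(n)
  ids[o] <- cumsum(starts)         # map ids back to the original positions
  ids
}

# Assumed window definition: +/- ppm parts per million of each value
ppm_window_groups <- function(x, ppm) {
  half <- abs(x) * ppm / 1e6
  group_by_window(x - half, x + half)
}

# Illustrative, made-up values with a 10 ppm window
x <- c(500, 500.004, 500.02, 1000, 1000.009)
ppm_window_groups(x, ppm = 10)
```

Tracking the running maximum, rather than comparing only adjacent windows, keeps a chain intact when one wide window reaches past a narrower neighbour. With a constant half-width of h/2 around each value, windows that merely touch (a gap equal to h) are treated as overlapping.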

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented,
minimal, self-contained, reproducible code.
