Finding overlaps in vector
Jim,

Although I can't find the post this code stems from, I had come across it while prowling the NG. It's not the one you had shared with me to eliminate overlaps (and which I referenced below: http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html). That particular solution marked entries as overlapping or not, and I am looking for an extension to that code which would also return the actual "clusters" of consecutively overlapping values.

While Gabor's code in this thread does what I require for the example, I still hope somebody more clueful than myself can extend your code, since it carries the (for me) significant advantage of being able to build the windows of overlap with different values for 'up' and 'down': for example, checking which values overlap when the overlap-defining distance is 5ppm 'up' and 7.5ppm 'down' from each value. This is a generalization I would highly cherish.

Thanks for your help and continuous patience on r-help.

Joh
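A minimal sketch of the 'up'/'down' generalization asked for above, assuming each value simply gets its own window given by a lower and an upper bound. The function name `overlap_clusters`, its arguments, and the `mz` example values are hypothetical and not taken from the thread; the sweep itself follows the same open/close counting idea as the quoted code below.

```r
# Hypothetical helper: cluster values whose windows [lower, upper] overlap.
overlap_clusters <- function(values, lower, upper, min_size = 2) {
  n <- length(values)
  # one "open" event (+1) at each lower bound, one "close" event (-1) at each upper bound
  ev <- data.frame(pos  = c(lower, upper),
                   oper = rep(c(1, -1), each = n),
                   id   = rep(seq_len(n), 2))
  # sort by position; at ties, opens come before closes,
  # so windows that merely touch still count as overlapping
  ev <- ev[order(ev$pos, -ev$oper), ]
  depth <- cumsum(ev$oper)                    # number of windows currently open
  # a new cluster starts at every event that is reached with no window open
  cluster <- cumsum(c(0, head(depth, -1)) == 0)
  opens <- ev$oper == 1
  members <- split(ev$id[opens], cluster[opens])
  members <- members[lengths(members) >= min_size]
  unname(lapply(members, function(i) values[sort(i)]))
}

# constant window of 0.5 above each value, as in the original example
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
res_const <- overlap_clusters(v, lower = v, upper = v + 0.5)

# asymmetric ppm window: 7.5 ppm down and 5 ppm up (made-up values)
mz <- c(500, 500.004, 500.02, 800, 800.009)
res_ppm <- overlap_clusters(mz,
                            lower = mz * (1 - 7.5e-6),
                            upper = mz * (1 + 5e-6))
```

Because the windows are passed in as explicit bounds, constant offsets and ppm-scaled offsets go through the same code path; only the `lower` and `upper` arguments change.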
jim holtman wrote:
Here is a modification of the algorithm to use a specified value for the overlap:
vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
# following add 0.5 as the overlap detection -- can be changed
x <- rbind(cbind(value=vector, oper=1, id=seq_along(vector)),
           cbind(value=vector+0.5, oper=-1, id=seq_along(vector)))
x <- x[order(x[,'value'], -x[, 'oper']),]
# determine which ones overlap
x <- cbind(x, over=cumsum(x[, 'oper']))
# now partition into groups and only use groups with 3 or more entries;
# determine where the breaks are (0 values in 'over')
x <- cbind(x, breaks=cumsum(x[, 'over'] == 0))
# delete entries with 'over' == 0
x <- x[x[, 'over'] != 0,]
# split into groups
x.groups <- split(x[, 'id'], x[, 'breaks'])
# only keep those with more than 2 entries
# (an id can appear twice, so 3 entries means at least 2 distinct values)
x.subsets <- x.groups[sapply(x.groups, length) >= 3]
# print out the subsets
invisible(lapply(x.subsets, function(a) print(vector[unique(a)])))
[1] 0.00 0.45
[1] 3.00 3.25 3.33 3.75 4.10
[1] 6.00 6.45
[1] 7.0 7.1

On Dec 21, 2007 4:56 AM, Johannes Graumann <johannes_graumann at web.de> wrote:
<posted & mailed>

Dear all,

I'm trying to solve the problem of how to find clusters of values in a vector that are closer than a given value. Illustrated, this might look as follows:

vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)

When using '0.5' as the proximity requirement, the following groups would result:

0,0.45
3,3.25,3.33,3.75,4.1
6,6.45
7,7.1

Jim Holtman proposed a very elegant solution in http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I have modified and used since he wrote it to me. The beauty of this approach is that it will not only work for constant proximity requirements as above, but also for overlap windows defined in terms of ppm around each value.

Now I have an additional need and have found no way (short of iteratively stepping through all the groups returned) to do that with Jim's approach: how to figure out that 6,6.45 and 7,7.1 are separate clusters?

Thanks for any hints,

Joh
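For the constant-proximity case described above, a shorter route is to look at the gaps between neighbouring sorted values. This is a sketch of a different technique from Jim's event-based approach, and it assumes a single fixed distance (here the hypothetical name `max_gap`); it does not cover windows that differ per value.

```r
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
max_gap <- 0.5                      # proximity requirement (assumed constant)

s <- sort(v)
# a new group starts wherever the gap to the previous value exceeds max_gap
grp <- cumsum(c(TRUE, diff(s) > max_gap))
clusters <- split(s, grp)
# keep only groups with at least two members
clusters <- unname(clusters[lengths(clusters) >= 2])
```

Since 7 - 6.45 is larger than 0.5, the group counter increases between 6.45 and 7, which is what keeps 6,6.45 and 7,7.1 apart.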