An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/r-help/attachments/20070210/9b7aacb0/attachment.pl
Near function?
5 messages · Bart Joosen, Dieter Menne, jim holtman +1 more
Bart Joosen <bartjoosen <at> hotmail.com> writes:
Hi, I have an integer which is extracted from a dataframe, which is sorted by
another column of the dataframe.
Now I would like to remove some elements of the integer, which are near to
others by their value. For example:
integer: c(1,20,2,21) should be c(1,20).
....
Sorting the integer is not an option, the order is important.
Why not? Sorting is extremely efficient for large series, and it is the only method that would work with large arrays.
The idea: keep the indexes of the sort order, mark the "near others" (for example, by setting their index to NA), and restore the original order. No for-loop needed.
Dieter
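A minimal sketch of the sort-based idea described above. This is not code from the thread: the function name drop_near_sorted is hypothetical, and "near" is assumed to mean "closer than th to the next-smaller value". Under that assumption the smallest value of each run survives, which is not necessarily the one that comes first in the original order, and values linked through a chain of close neighbours are all dropped except the smallest.

```r
# Hypothetical helper (name and exact rule are assumptions, not from the thread):
# drop every value that lies closer than `th` to the next-smaller value,
# then return the survivors in their original order.
drop_near_sorted <- function(x, th) {
  ord <- order(x)                 # indexes of the sort order
  gap <- c(Inf, diff(x[ord]))     # distance to the next-smaller value
  keep_idx <- ord[gap >= th]      # original positions of the survivors
  x[sort(keep_idx)]               # restore the original order
}

drop_near_sorted(c(1, 20, 2, 21), th = 2)
# [1]  1 20
```

For the example in the question this matches the requested result, but only because the first element of each group also happens to be the smallest.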
1 day later
Dear Bart,

"hclust" might be useful for this as well:

dat = c(1,20,2,21)
hc = hclust(dist(dat))
thresh = 2
ct = cutree(hc, h=thresh)
clusteredNumbers = split(dat, ct)
firstOne = dat[!duplicated(ct)]

> clusteredNumbers
$`1`
[1] 1 2

$`2`
[1] 20 21

> firstOne
[1]  1 20

Best wishes
Wolfgang
I have an integer which is extracted from a dataframe, which is sorted by another column of the dataframe.
Now I would like to remove some elements of the integer, which are near to others by their value. For example: integer: c(1,20,2,21) should be c(1,20).
I tried to write a function, but for some reason something doesn't work:
x <- 1:20
near <- function(x,th) {
  nr <- NROW(x)
  for (i in 1:(nr-1)){
    for (j in (i+1):nr){
      if (j > nr) break
      t=0
      if (abs(x[i] - x[j]) < th) t = 1
      if (t== 1) x <- x[-j]
      if (t== 1) nr <- nr-1
      if (t== 1) j <- (j-1)
      cat (" i",i," j",j,"\n")
    }}
  x
}
near(x,10)
This gives 1 3 7 13 17, while I was expecting 1, 20 as the outcome.
If you look at the intermediate output of the cat instruction, you can see that after the function removes an element, it skips the next one.
Sorting the integer is not an option, the order is important.
I used the integers 1:20 as an example; x <- sample((1:20),20) would be more representative of our data, but its output is not reproducible.
Maybe there is already an R function that does this? Otherwise, what is wrong with my code?
thanks a lot for your time
Bart
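On the skipping behaviour described above: in R, a for loop takes its values from a sequence that is evaluated once, so assigning j <- (j-1) inside the body has no effect on the next iteration, and the bounds 1:(nr-1) and (i+1):nr do not shrink when nr is reduced. One possible order-preserving rewrite is sketched below. The name near_keep_first is hypothetical, and the rule (keep a value only if no previously kept value is closer than th) is an assumption about what the original loops were meant to do.

```r
# Hypothetical rewrite, not from the thread: walk through x in its original
# order and keep a value only if no already-kept value is closer than `th`.
near_keep_first <- function(x, th) {
  keep <- logical(length(x))      # all FALSE to start with
  for (i in seq_along(x)) {
    # x[keep] holds the values kept so far (empty on the first pass)
    keep[i] <- !any(abs(x[keep] - x[i]) < th)
  }
  x[keep]
}

near_keep_first(c(1, 20, 2, 21), th = 2)
# [1]  1 20

near_keep_first(1:20, 10)
# [1]  1 11

# A shuffled input can be made reproducible by fixing the seed first
# (the seed value 1 is arbitrary).
set.seed(1)
x <- sample(1:20, 20)
near_keep_first(x, 10)
```

Note that with the strict < comparison used in the original function, 1:20 with th = 10 reduces to 1 and 11 rather than 1 and 20, because abs(1 - 11) is exactly 10 and so does not count as near.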
______________________________________________ R-help a stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
------------------------------------------------------------------ Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber