complex search between dataframes
On Jun 2, 2011, at 1:42 PM, Filippo Beleggia wrote:
Hi! I am very new to R, and I hope someone can help me. I have two dataframes:

data1 <- data.frame(from=c(1,12,16,40,55,81,101), to=c(10,13,23,45,67,99,123))
data2 <- data.frame(name=c(1,2,3,4,5,6,7,8,9), position=c(2,14,20,50,150,2000,2001,2002,85))

I want to know which of the entries in "position" of data2 fall between any "from" and the corresponding "to" of data1. So in this case I would need to somehow extract 2, 20 and 85, corresponding to the "name"s 1, 3 and 9.

Thank you very much!
Filippo

See ?findInterval

Transposing data1 with t() coerces it to a matrix in which each column holds one from/to pair. Because R stores matrices column by column, findInterval() then sees the interval boundaries as a single increasing vector (e.g. c(1, 10, 12, 13, ...)):

t(data1)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
from    1   12   16   40   55   81  101
to     10   13   23   45   67   99  123

findInterval() returns, for each data2$position value, the index of the interval it falls into within the sorted boundaries. Since your actual intervals are discontinuous, you only want the values that fall in the odd-numbered intervals (the even-numbered ones are the gaps between your ranges, plus everything beyond the last boundary), which is where the use of %in% seq(1, 13, 2) comes in. Prior to that, findInterval() returns:

findInterval(data2$position, t(data1))
[1]  1  4  5  8 14 14 14 14 11

With it:

findInterval(data2$position, t(data1)) %in% seq(1, 13, 2)
[1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE

Now you can use the TRUE values to index data2$name:

data2$name[findInterval(data2$position, t(data1)) %in% seq(1, 13, 2)]
[1] 1 3 9

or data2$position:

data2$position[findInterval(data2$position, t(data1)) %in% seq(1, 13, 2)]
[1]  2 20 85

HTH,

Marc Schwartz
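
A note on the endpoints: by default findInterval() treats each interval as closed on the left and open on the right, so a position exactly equal to a "to" value (10, say) would land in an even-numbered gap and be dropped. If the "to" values are meant to count as included, a direct comparison is one alternative. The following is only a sketch under that assumption; in_any_interval is a hypothetical helper name, not a base R function, and c(rbind(from, to)) is used here merely as another way to build the same interleaved boundary vector as t(data1):

```r
data1 <- data.frame(from = c(1, 12, 16, 40, 55, 81, 101),
                    to   = c(10, 13, 23, 45, 67, 99, 123))
data2 <- data.frame(name     = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                    position = c(2, 14, 20, 50, 150, 2000, 2001, 2002, 85))

# Hypothetical helper: TRUE where a position lies inside at least one
# [from, to] range, with BOTH endpoints treated as inclusive (an assumption).
in_any_interval <- function(pos, from, to) {
  vapply(pos, function(p) any(p >= from & p <= to), logical(1))
}

hit <- in_any_interval(data2$position, data1$from, data1$to)
data2$name[hit]      # 1 3 9
data2$position[hit]  # 2 20 85

# Boundary case: a position equal to a "to" value.
breaks <- c(rbind(data1$from, data1$to))        # 1, 10, 12, 13, 16, ...
findInterval(10, breaks)                        # 2, an even index, so excluded
in_any_interval(10, data1$from, data1$to)       # TRUE
```

This makes one pass over data1 for every position, so for large inputs findInterval() remains the faster choice wherever the half-open behaviour is acceptable.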