
Efficient subsetting

2 messages · R A F, Jerome Asselin

#
Hi, I run into this problem quite often, so it seems worthwhile
to ask what the most efficient solution is.

I have two vectors, x (values ordered) and y.  I have ranges
x < x0, x0 <= x < x1, x1 <= x < x2, ..., x >= xn
and want to construct a subvector yprime of y which consists
of the first (or last) value of y whose x value falls in each range.

For example,

x   y
1   2
1   3
2   3
3   4
4   5
5   6

and let's say the ranges are 1 <= x < 3 and 3 <= x < 5.  I
should produce yprime as c( 2, 4 ) (if I ask for the first value
of y whose x is in the range).  [If there are no x values within
a given range, the output for that range should be NA.]

Obviously I can do a loop and use which, etc., but it seems
like there should be a better way.

Thanks very much.

A general solution would be nice, but if it helps to make the
algorithm efficient, I'm happy to assume

(a) x values are ordered
(b) the ranges are always evenly spaced:  for example, x in
0 to 10, 10 to 20, 20 to 30, etc.
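
One way assumption (b) could be exploited, as a rough sketch only: with evenly spaced ranges, the range index of each x can be computed by arithmetic, and match() then finds the first occurrence of each index (NA if a range is empty). The data and the names x0, width and nbins below are hypothetical, chosen just for illustration.

```r
# Hypothetical data; ranges are [0,10), [10,20), [20,30), [30,40)
x <- c(3, 7, 12, 18, 35)
y <- c(10, 20, 30, 40, 50)
x0 <- 0       # left edge of the first range (assumed)
width <- 10   # common width of the ranges (assumed)
nbins <- 4    # number of ranges (assumed)

# Range index of each x value: 1 1 2 2 4
bin <- floor((x - x0) / width) + 1

# First value of y in each range; match() gives NA for an empty range
first_idx <- match(seq_len(nbins), bin)
y_first <- y[first_idx]    # 10 30 NA 50

# Last value of y in each range: search the reversed index vector,
# then map the position back to the original order
last_idx <- length(bin) + 1 - match(seq_len(nbins), rev(bin))
y_last <- y[last_idx]      # 20 40 NA 50
```

This does not rely on x being ordered, and values of x outside the ranges are simply never matched.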
#
Here is a general solution. x need not be ordered and the ranges need not
be equally spaced.

x <- c(1,1,2,3,4,5)
y <- c(2,3,3,4,5,6)
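# right=FALSE gives left-closed intervals [1,3) and [3,5);
# x values outside all intervals (here x = 5) become NA
xcut <- cut(x, breaks=c(1,3,5), right=FALSE)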

#If you want the FIRST value of y whose x are in the range
wh <- !duplicated(xcut) & !is.na(xcut)
y[wh]         #   [1] 2 4

#If you want the LAST value of y whose x are in the range
revxcut <- rev(xcut)
wh <- rev(!duplicated(revxcut) & !is.na(revxcut))
y[wh]         #   [1] 3 5
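
A possible variant, offered only as a sketch: the logical index above drops empty ranges and returns values in data order rather than range order. If an NA is wanted for each empty range, one option is to look up every level of the factor with match(). The extra breaks 7 and 9 are an assumption added here so that one range, [7,9), is empty.

```r
x <- c(1, 1, 2, 3, 4, 5)
y <- c(2, 3, 3, 4, 5, 6)

# Breaks extended to 7 and 9 (illustrative assumption) so that [7,9) is empty
xcut <- cut(x, breaks = c(1, 3, 5, 7, 9), right = FALSE)

# match() returns the position of the first occurrence of each level,
# or NA when no x value falls in that range
first_idx <- match(levels(xcut), xcut)
y_first <- y[first_idx]    # 2 4 6 NA

# For the LAST value, search the reversed factor and map the position back
last_idx <- length(xcut) + 1 - match(levels(xcut), rev(xcut))
y_last <- y[last_idx]      # 3 5 6 NA
```

The result has one entry per range, in the order of the breaks, whether or not x is sorted.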

HTH,
Jerome
On May 16, 2003 11:16 am, R A F wrote: