Hi,

I'm facing this problem quite a lot, so it seems worthwhile to check to see what the most efficient solution is.

I've two vectors x (values ordered) and y. I've ranges

x < x0, x0 <= x < x1, x1 <= x < x2, x2 <= x < x3, x > xn

and want to construct a subvector yprime of y which consists of the first/last value of y whose x values are in the range.

For example,

x y
1 2
1 3
2 3
3 4
4 5
5 6

and let's say the ranges are 1 <= x < 3 and 3 <= x < 5. I should produce yprime as c( 2, 4 ) (if I ask for the first value of y whose x is in the range). [If there're no x values within a given range, output an NA.]

Obviously I can do a loop and use which, etc., but it seems like there should be a better way.

Thanks very much.

A general solution would be nice, but if it helps to make the algorithm efficient, I'm happy to assume
(a) x values are ordered
(b) the ranges are always evenly spaced: for example, x in 0 to 10, 10 to 20, 20 to 30, etc.
Efficient subsetting
2 messages · R A F, Jerome Asselin
Here I have a general solution. x need not be ordered and ranges need not be equally spaced.

x <- c(1,1,2,3,4,5)
y <- c(2,3,3,4,5,6)
xcut <- cut(x, breaks=c(1,3,5), right=FALSE)

# If you want the FIRST value of y whose x are in the range
wh <- !duplicated(xcut) & !is.na(xcut)
y[wh]
# [1] 2 4

# If you want the LAST value of y whose x are in the range
revxcut <- rev(xcut)
wh <- rev(!duplicated(revxcut) & !is.na(revxcut))
y[wh]
# [1] 3 5

HTH,
Jerome
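The original question also asks for an NA when a range contains no x values, which the duplicated() approach does not produce (empty ranges are simply skipped). One possible way to cover that case is to look up each level of the cut() factor with match(). This is only a sketch: the function name pick_by_range and its pick argument are made up for illustration and are not part of the thread, and the breaks c(1,3,5,7,9) are an assumed extension of the example so that one range is empty.

```r
x <- c(1, 1, 2, 3, 4, 5)
y <- c(2, 3, 3, 4, 5, 6)

# Hypothetical helper (not from the thread): one value of y per range,
# NA for ranges that contain no x values.
pick_by_range <- function(x, y, breaks, pick = c("first", "last")) {
  pick <- match.arg(pick)
  # Left-closed ranges [b1,b2), [b2,b3), ... as in the question
  bin <- cut(x, breaks = breaks, right = FALSE)
  idx <- seq_along(x)
  # For "last", scan the data back to front
  if (pick == "last") idx <- rev(idx)
  # match() gives the first position of each level, or NA if the level is absent
  pos <- match(levels(bin), as.character(bin[idx]))
  y[idx[pos]]
}

pick_by_range(x, y, c(1, 3, 5), "first")        # 2 4
pick_by_range(x, y, c(1, 3, 5), "last")         # 3 5
pick_by_range(x, y, c(1, 3, 5, 7, 9), "first")  # 2 4 6 NA
```

Like the code above, this does not require x to be ordered or the ranges to be evenly spaced. Values of x outside all ranges fall into no bin and are ignored.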
On May 16, 2003 11:16 am, R A F wrote:
> Hi,
>
> I'm facing this problem quite a lot, so it seems worthwhile to check to see what the most efficient solution is.
>
> I've two vectors x (values ordered) and y. I've ranges
>
> x < x0, x0 <= x < x1, x1 <= x < x2, x2 <= x < x3, x > xn
>
> and want to construct a subvector yprime of y which consists of the first/last value of y whose x values are in the range.
>
> For example,
>
> x y
> 1 2
> 1 3
> 2 3
> 3 4
> 4 5
> 5 6
>
> and let's say the ranges are 1 <= x < 3 and 3 <= x < 5. I should produce yprime as c( 2, 4 ) (if I ask for the first value of y whose x is in the range). [If there're no x values within a given range, output an NA.]
>
> Obviously I can do a loop and use which, etc., but it seems like there should be a better way.
>
> Thanks very much.
>
> A general solution would be nice, but if it helps to make the algorithm efficient, I'm happy to assume
> (a) x values are ordered
> (b) the ranges are always evenly spaced: for example, x in 0 to 10, 10 to 20, 20 to 30, etc.
Jerome Asselin (Jérôme), Statistical Analyst
British Columbia Centre for Excellence in HIV/AIDS
St. Paul's Hospital, 608 - 1081 Burrard Street
Vancouver, British Columbia, CANADA V6Z 1Y6
Email: jerome at hivnet.ubc.ca
Phone: 604 806-9112 Fax: 604 806-9044