
find closest value in a vector based on another vector values

That method could be written as the following function:
f0 <- function (a, b, unique = TRUE) 
{
    ret <- a[sapply(b, function(x) which.min(abs(x - a)))]
    if (unique) { 
        ret <- unique(ret)
    }
    ret
}

If 'a' is in sorted order then I think the following function, based on
findInterval, does the same thing in less time, especially when 'b' is longish.

f1 <- function (a, b, unique = TRUE) 
{
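    # index of the last element of 'a' that is <= b (0 if b < a[1])
    leftI <- findInterval(b, a)
    rightI <- leftI + 1
    # clamp both neighbours into the valid index range 1..length(a)
    leftI[leftI == 0] <- 1
    rightI[rightI > length(a)] <- length(a)
    # '<=' sends ties to the left neighbour, as which.min does in f0
    ret <- ifelse(abs(b - a[leftI]) <= abs(b - a[rightI]), a[leftI],  a[rightI])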
    if (unique) { 
        ret <- unique(ret)
    }
    ret
}
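
To see why only two neighbours need to be compared, here is a small sketch of the
bracketing step on a toy input. The vectors and the names 'leftRaw' and 'closest'
are made up for illustration and are not part of the function above.

```r
a <- c(1, 4, 9, 16)        # sorted reference values
b <- c(0, 5, 12, 20)       # query values, two of them outside range(a)

# findInterval gives the index of the last a[i] <= b, or 0 if b < a[1]
leftRaw <- findInterval(b, a)
leftRaw
# [1] 0 2 3 4

# clamp the left and right neighbours into 1..length(a)
leftI  <- pmax(leftRaw, 1L)
rightI <- pmin(leftRaw + 1L, length(a))

# keep whichever neighbour is nearer to each element of 'b'
closest <- ifelse(abs(b - a[leftI]) <= abs(b - a[rightI]), a[leftI], a[rightI])
closest
# [1]  1  4  9 16
```

Queries below a[1] or above a[length(a)] end up with both neighbours pointing at the
same end element, so no special case is needed for them.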

E.g.,

R> a <- sort(rnorm(1e6))
R> b <- sort(rnorm(1000))
R> system.time(r0 <- f0(a, b))
   user  system elapsed 
   4.88    3.48    8.36 
R> system.time(r1 <- f1(a, b))
   user  system elapsed 
      0       0       0 
R> identical(r0, r1)
[1] TRUE

If 'a' might be unsorted then add
    if (is.unsorted(a))  a <- sort(a)
at the beginning of f1.  If the output must be in the same order as the
original 'a' then use order(a) instead and subscript 'a' and 'ret' with its
output.
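
One possible reading of the order(a) remark is sketched below: sort through a
permutation, do the lookup on the sorted copy, then map the result back to
positions in the original 'a'. The helper name 'closest_index' and the toy data
are hypothetical, and returning positions rather than values is an assumption
about what is wanted.

```r
# Hypothetical helper (not from the original post): for each element of 'b',
# return the position in the ORIGINAL, possibly unsorted, 'a' of the closest
# value. Assumes 'a' is non-empty and contains no NAs.
closest_index <- function(a, b) {
    o <- order(a)                      # permutation that sorts 'a'
    aSorted <- a[o]                    # sorted copy used for the lookup
    leftI <- findInterval(b, aSorted)
    rightI <- leftI + 1L
    leftI[leftI == 0L] <- 1L
    rightI[rightI > length(aSorted)] <- length(aSorted)
    useLeft <- abs(b - aSorted[leftI]) <= abs(b - aSorted[rightI])
    o[ifelse(useLeft, leftI, rightI)]  # map sorted positions back to 'a'
}

a <- c(9, 1, 16, 4)
b <- c(5, 12, 0)
idx <- closest_index(a, b)
idx       # positions in the original 'a'
# [1] 4 1 2
a[idx]    # the closest values themselves
# [1] 4 9 1
```

Working with positions keeps the link to the original 'a', so the values can be
recovered with a[idx] and de-duplicated with unique() if needed.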

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com