
Spatial join - optimizing code

Hi Monica,

I think the key to speeding this up is, for every point in 'track', to
compute the distance to all points in 'classif' simultaneously, using
vectorized calculations. Here's my function. On my laptop it's about
160 times faster than the original for the case I looked at (10,000
observations in track and 500 in classif). The example with 30,000
observations in track and 4,000 in classif takes around 18 seconds
(2 GHz processor running Linux).
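
A minimal illustration of that idea for a single point, using made-up
coordinates (the objects p and ymat are hypothetical, not from the
original data). Because ymat stores one point per column, subtracting
the length-2 vector p recycles it down every column, so one colSums()
call gives the squared distance to all points at once. The explicit
loop is included only to show that both approaches agree.

```r
## hypothetical data: one query point and three candidate points,
## stored one point per column (row 1 = east, row 2 = north)
p <- c(0, 0)
ymat <- rbind(east = c(3, 1, 5), north = c(4, 1, 0))

## loop version: one distance at a time
d.loop <- numeric(ncol(ymat))
for (j in seq_len(ncol(ymat))) d.loop[j] <- sqrt(sum((p - ymat[, j])^2))

## vectorized version: p is recycled down each column of ymat,
## so all three distances come out of a single expression
d.vec <- sqrt(colSums((p - ymat)^2))

all.equal(d.loop, d.vec)   # TRUE
which.min(d.vec)           # 2, the point at (1, 1)
```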

Dan

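dist.merge2 <- function(x, y, xeast, xnorth, yeast, ynorth) {
    ## Return a data frame whose row i holds the coordinates of x[i,],
    ## the distance to the closest point in y, and that point's other
    ## (non-coordinate) columns.
    xpos <- as.matrix(x[, c(xeast, xnorth)])
    ## one coordinate pair per list element, for use with sapply()
    xposl <- lapply(seq.int(nrow(x)), function(i) xpos[i, ])
    ## 2 x nrow(y) matrix: each column is one point of y
    ypos <- t(as.matrix(y[, c(yeast, ynorth)]))
    ## drop = FALSE keeps a data frame even if only one column is left
    yinfo <- y[, !colnames(y) %in% c(yeast, ynorth), drop = FALSE]

    get.match.and.dist <- function(point) {
        ## squared distances from 'point' to every point in y at once;
        ## the length-2 'point' is recycled down each column of ypos
        sqdists <- colSums((point - ypos)^2)
        ind <- which.min(sqdists)
        c(ind, sqrt(sqdists[ind]))
    }
    ## 2 x nrow(x) matrix: row 1 = index of nearest y point, row 2 = distance
    nearest <- sapply(xposl, get.match.and.dist)
    cbind(xpos, mindist = nearest[2, ], yinfo[nearest[1, ], , drop = FALSE])
}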
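
Subsetting y needs some care when only one column besides the
coordinates is left: by default `[` then drops the result to a plain
vector, and indexing that vector with two subscripts fails with
"incorrect number of dimensions". Passing drop = FALSE keeps a data
frame. A small sketch, with a hypothetical three-point y and a
hypothetical index vector idx standing in for the first row of the
sapply() result; repeated indices simply repeat rows.

```r
## hypothetical 'classif'-like data with one non-coordinate column
y <- data.frame(east = c(0, 10, 20), north = c(0, 0, 0),
                class = c("a", "b", "c"), stringsAsFactors = FALSE)
keep <- !colnames(y) %in% c("east", "north")

## without drop = FALSE the single remaining column becomes a vector
is.data.frame(y[, keep])                  # FALSE
## with drop = FALSE it stays a one-column data frame
yinfo <- y[, keep, drop = FALSE]

## nearest-point indices (hypothetical); each x point gets one row
idx <- c(2, 2, 1, 3)
matched <- yinfo[idx, , drop = FALSE]
nrow(matched)                             # 4
matched$class                             # "b" "b" "a" "c"
```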

Converting xpos to a list and using sapply, as I do here, is marginally
faster than leaving it as a matrix and using apply to get the matches.

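
A sketch of the two variants side by side, on random coordinates (the
sizes, the seed and the object names are arbitrary choices for
illustration, not the original track/classif data). Both produce the
same 2-row matrix of indices and distances; timings vary with the
machine and R version, so none are claimed here.

```r
set.seed(1)
## arbitrary sizes: 1000 points in x, 200 points in y
xpos <- matrix(runif(2000), ncol = 2)
ypos <- t(matrix(runif(400), ncol = 2))   # 2 x 200, one point per column

## index of, and distance to, the nearest column of ypos
get.match.and.dist <- function(point) {
    sqdists <- colSums((point - ypos)^2)
    ind <- which.min(sqdists)
    c(ind, sqrt(sqdists[ind]))
}

## variant 1: split xpos into a list of rows, then sapply()
xposl <- lapply(seq.int(nrow(xpos)), function(i) xpos[i, ])
t1 <- system.time(m1 <- sapply(xposl, get.match.and.dist))

## variant 2: keep xpos as a matrix and apply() over its rows
t2 <- system.time(m2 <- apply(xpos, 1, get.match.and.dist))

all.equal(m1, m2)   # TRUE
c(sapply = t1[["elapsed"]], apply = t2[["elapsed"]])
```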
On Tue, Sep 16, 2008 at 04:23:33PM +0000, Monica Pisica wrote: