Skip to content
Prev 44946 / 398528 Next

Distance and Aggregate Data - Again...

dsheuman at rogers.com wrote:

            
Well, there are ways to speed this up.
But you don't need to subset d[, 2:3] every time, e.g.  Also, my
experience is that writing a single line appended to a file every time
around the loop is very inefficient.
Well, that part is easy, essentially, you have to save the indices, then
extract the relevant rows of pop, dwell and age. So I would modify your
code as follows:

library(fields)
d <- as.matrix( read.csv("filein.csv") )

lonlat2 <- d[,2:3]
results <- matrix(0, nrow=nrow(d), ncol=4)
for(i in 1:nrow(d)){
        lonlat1 <- d[i, 2:3]
        whichdist <- which(rdist.earth(t(as.matrix(lonlat1)),
                as.matrix(lonlat2), miles=F) < 2)
        distval <- d[, 1][whichdist]
        sumpop <- sum(data[, "pop"][whichdist])
        sumdwell <- sum(data[, "dwell"][whichdist])
        meanage <- mean(data[, "age"][whichdist])
        results[i, ] <- c(d[i, "recid"], sumpop, sumdwell, meanage)
}
write(t(results), file="C:\\outfile.out", ncol=ncol(results))

Then there is a trick you can use to speed up the process.  Essentially
you reduce the 'inner loop' which is inside rdist.earth().  This is
achieved by initially computing a single distance vector with reference
to a fixed point [(0, 0) seems reasonable since it is in the Atlantic
Ocean].  Then you select a subset of your points that are the same
distance is your circle centre from the fixed point plus or minus the
radius you want (and a bit of tolerance).  This will generate an
annulus of points rather than a circle, but in particular all the
points within the circle of interest will also be within this annulus.
Then you apply rdist.earth() to find the distance of each of these
points from your circle centre.

This is what I have used to process a 52000 length matrix in about 40
minutes (and ~50MB memory):

DistAgg <-
function(data, dist=2) {
  results <- matrix(0, nrow=nrow(data), ncol=4)
  colnames(results) <- c("recid", "sumpop", "sumdwell", "meanage")
  lonlat2 <- data[, 2:3]
  basedist <- rdist.earth(t(as.matrix(c(0, 0))), as.matrix(lonlat2))
  for(i in 1:nrow(data)){
    lonlat1 <- data[i, 2:3]
    approxval <- which(abs(basedist - basedist[i]) < dist*1.001)
    if (length(approxval) > 1) {
      whichval <- approxval[which(rdist.earth(t(as.matrix(lonlat1)),
        as.matrix(lonlat2[approxval, ]), miles=F) < dist)]
    } else {
      whichval <- approxval
    }
    sumpop <- sum(data[, "pop"][whichval])
    sumdwell <- sum(data[, "dwell"][whichval])
    meanage <- mean(data[, "age"][whichval])
    results[i, ] <- c(data[i, "recid"], sumpop, sumdwell, meanage)
  }
  write(t(results), file="C:\\outfile.out", ncol=ncol(results))
}

The data I used was based on yours, but randomised latitude and
longitude (in restricted ranges).

Hope this helps,
Ray Brownrigg