Aggregating points based on distance

I would like to create averages of all the variables in a
SpatialPointsDataFrame when points are within a specified distance of each
other. I have a method for doing this but it seems like a silly way to
approach the problem. Any ideas for doing this using modern syntax
(especially of the tidy variety) would be appreciated.

To start, I have a SpatialPointsDataFrame with several variables measured
for each point. I'd like to get an average value for each variable for
points within a specified distance. E.g., getting average cadmium values
from the meuse data for points within 100 m of each other:

    library(sf)
    library(sp)
    data(meuse)
    pts <- st_as_sf(meuse, coords = c("x", "y"), remove=FALSE)
    pts100 <- st_is_within_distance(pts, dist = 100)
    # can use sapply to get mean of a variable. E.g., cadmium
    sapply(pts100, function(x){ mean(pts$cadmium[x]) })

If this is the method you call "silly", then I don't see anything silly
here at all, only efficient, well-written use of base R constructs. The
problem with "modern" syntax is that it's subject to rapid change and is
often slower than base R, which has had years to stabilise and optimise.

If you want to iterate this over variables then nest your sapplys:

    items <- c("cadmium", "copper", "lead")
    sapply(items, function(item){
      sapply(pts100, function(x){ mean(pts[[item]][x]) })
    })

gets you:

         cadmium    copper      lead
  [1,] 10.150000  83.00000 288.00000
  [2,] 10.150000  83.00000 288.00000
  [3,]  6.500000  68.00000 199.00000
  [4,]  2.600000  81.00000 116.00000
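
The nested sapply above can also be collapsed into a single pass over the neighbour list. The following is only a sketch under a few assumptions: the names `vars`, `vals` and `res` are made up for illustration, and `colMeans()` replaces the per-variable `mean()` calls. It rebuilds `pts` and `pts100` exactly as in the question so that it runs on its own.

```r
library(sf)
library(sp)   # only needed for the meuse example data
data(meuse)

pts    <- st_as_sf(meuse, coords = c("x", "y"), remove = FALSE)
pts100 <- st_is_within_distance(pts, dist = 100)

# Columns to average (illustrative choice; x and y give the mean position)
vars <- c("x", "y", "cadmium", "copper", "lead")
vals <- st_drop_geometry(pts)[, vars]

# For each point, average every column over its neighbours in one call.
# sapply() returns one column per point, so transpose to one row per point.
res <- as.data.frame(
  t(sapply(pts100, function(idx) colMeans(vals[idx, , drop = FALSE])))
)
head(res)
```

Each row of `res` corresponds to one input point, so points that share the same set of neighbours still produce identical rows.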

Barry

Above, I've figured out how to use sapply to do this variable by variable.
So I could, if I wanted, calculate the mean for each variable, generate a
centroid for each point and then a SpatialPointsDataFrame of the unique
values. E.g., for the first few variables:

    res <- data.frame(id=1:length(pts100),
                      x=NA, y=NA,
                      cadmium=NA, copper=NA, lead=NA)
    res$x <- sapply(pts100, function(p){ mean(pts$x[p]) })
    res$y <- sapply(pts100, function(p){ mean(pts$y[p]) })
    res$cadmium <- sapply(pts100, function(p){ mean(pts$cadmium[p]) })
    res$copper <- sapply(pts100, function(p){ mean(pts$copper[p]) })
    res$lead <- sapply(pts100, function(p){ mean(pts$lead[p]) })
    # keep the first row for each distinct mean, dropping the repeats
    res2 <- res[!duplicated(res$cadmium),]
    coordinates(res2) <- c("x","y")
    bubble(res2,"cadmium")

This works but seems cumbersome, and I suspect there must be a more
efficient way.
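
One possible way to avoid the de-duplication step altogether is to assign every point to exactly one group first and then average within groups. The sketch below is an alternative technique, not the method used above: it uses single-linkage clustering from base R (`hclust()` and `cutree()`), so two points end up in the same group whenever a chain of hops of at most 100 m connects them, which is not the same as every pair being within 100 m. The names `vars`, `grp` and `res` are made up for illustration.

```r
library(sp)   # only needed for the meuse example data
data(meuse)

# Columns to average (illustrative choice; x and y give the group centroid)
vars <- c("x", "y", "cadmium", "copper", "lead")

# Single-linkage tree on the coordinates, cut at 100 m:
# points linked by a chain of steps <= 100 m share a group label.
hc  <- hclust(dist(meuse[, c("x", "y")]), method = "single")
grp <- cutree(hc, h = 100)

# One row per group: mean position and mean of each variable.
res <- aggregate(meuse[, vars], by = list(group = grp), FUN = mean)
res$n <- as.vector(table(grp))   # number of points in each group

head(res)
```

Because the groups are disjoint, `res` already has one row per cluster, and `coordinates(res) <- c("x", "y")` could then be applied as in the code above.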

Thanks for any help, Andy

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
