Aggregating points based on distance
On Wed, Mar 13, 2019 at 6:14 PM Andy Bunn <bunna at wwu.edu> wrote:
I would like to create averages of all the variables in a
SpatialPointsDataFrame when points are within a specified distance of each
other. I have a method for doing this but it seems like a silly way to
approach the problem. Any ideas for doing this using modern syntax
(especially of the tidy variety) would be appreciated.
To start, I have a SpatialPointsDataFrame with several variables measured
for each point. I'd like to get an average value for each variable for
points within a specified distance. E.g., getting average cadmium values
from the meuse data for points within 100 m of each other:
library(sf)
library(sp)
data(meuse)
pts <- st_as_sf(meuse, coords = c("x", "y"), remove=FALSE)
pts100 <- st_is_within_distance(pts, dist = 100)
# can use sapply to get mean of a variable. E.g., cadmium
sapply(pts100, function(x){ mean(pts$cadmium[x]) })
If this is the method you call "silly" then I don't see anything silly at
all here, only efficient, well-written use of base R constructs. The problem
with "modern" syntax is that it's subject to rapid change and is often slower
than using base R, which has had years to stabilise and optimise.
If you want to iterate this over variables then nest your sapplys:
items <- c("cadmium", "copper", "lead")
sapply(items, function(item){
  sapply(pts100, function(x){ mean(pts[[item]][x]) })
})
gets you:
       cadmium   copper      lead
[1,] 10.150000 83.00000 288.00000
[2,] 10.150000 83.00000 288.00000
[3,]  6.500000 68.00000 199.00000
[4,]  2.600000 81.00000 116.00000
Barry
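The nested sapply above generalises to every numeric column without listing the names by hand. What follows is only a sketch, not something posted in the thread: the names vals, num_cols and nbr_means are made up for illustration, and na.rm = TRUE is an assumption added because some meuse columns (om, for one) contain missing values.

```r
library(sf)
library(sp)   # provides the meuse data set

data(meuse)
pts <- st_as_sf(meuse, coords = c("x", "y"), remove = FALSE)
pts100 <- st_is_within_distance(pts, dist = 100)

# Attribute table without the geometry column, restricted to numeric variables
vals <- st_drop_geometry(pts)
num_cols <- names(vals)[vapply(vals, is.numeric, logical(1))]

# For each point, average every numeric column over that point's neighbours.
# vapply returns variables in rows, so transpose to get one row per point.
nbr_means <- t(vapply(
  pts100,
  function(idx) colMeans(vals[idx, num_cols, drop = FALSE], na.rm = TRUE),
  numeric(length(num_cols))
))

head(nbr_means[, c("cadmium", "copper", "lead")], 4)
```

Each neighbour set from st_is_within_distance includes the point itself, so an isolated point simply keeps its own values.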
Above, I've figured out how to use sapply to do this variable by variable.
So I could, if I wanted, calculate the mean for each variable, generate a
centroid for each point and then a SpatialPointsDataFrame of the unique
values. E.g., for the first few variables:
res <- data.frame(id=seq_along(pts100),
                  x=NA, y=NA,
                  cadmium=NA, copper=NA, lead=NA)
res$x <- sapply(pts100, function(p){ mean(pts$x[p]) })
res$y <- sapply(pts100, function(p){ mean(pts$y[p]) })
res$cadmium <- sapply(pts100, function(p){ mean(pts$cadmium[p]) })
res$copper <- sapply(pts100, function(p){ mean(pts$copper[p]) })
res$lead <- sapply(pts100, function(p){ mean(pts$lead[p]) })
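# keep the first row for each distinct mean; duplicated() without the
# negation would keep only the repeated rows
res2 <- res[!duplicated(res$cadmium),]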
coordinates(res2) <- c("x","y")
bubble(res2,"cadmium")
This works but seems cumbersome and like there must be a more efficient
way.
Thanks for any help, Andy
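One possible way to shorten the steps above is to compute all the means in a single pass and to drop repeated rows by neighbour-set membership rather than by the cadmium value. This is a sketch under assumptions, not a method from the thread: vars, vals, key and res2_sf are illustrative names, and building the result with st_as_sf instead of coordinates() is a substitution.

```r
library(sf)
library(sp)   # provides the meuse data set

data(meuse)
pts <- st_as_sf(meuse, coords = c("x", "y"), remove = FALSE)
pts100 <- st_is_within_distance(pts, dist = 100)

vars <- c("x", "y", "cadmium", "copper", "lead")
vals <- st_drop_geometry(pts)[, vars]

# Mean of every variable (including x and y, i.e. the centroid) per neighbour set
res <- as.data.frame(
  t(sapply(pts100, function(p) colMeans(vals[p, , drop = FALSE])))
)

# Label each neighbour set by its member indices; points that share exactly
# the same set collapse to a single row
key <- vapply(pts100, function(p) paste(sort(p), collapse = "-"), character(1))
res2 <- res[!duplicated(key), ]

# Back to a spatial object, with the averaged x/y as the point location
res2_sf <- st_as_sf(res2, coords = c("x", "y"), remove = FALSE)
```

Neighbour sets are not transitive: if A is within 100 m of B and B of C, but A is not within 100 m of C, the three points have three different sets and each keeps its own row. A true clustering (for example connected components of the neighbour graph) would be needed to merge such chains.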
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo