Hi,
First of all thanks to everybody who came with suggestions and solutions, but Simon came with perfect code ;-)
We work with data that usually is projected in locat UTM coordinates, so i've changed the code as following and it works like a charm:
## code begins
## Euclidian distance
##this has advantage that i can use a z coordinate if i have it and change the dist function accordingly
dist <- function(x1, y1, x2, y2) {
((x1-x2)^2 + (y1-y2)^2)^0.5
}
dist.merge <- function(x, y, xeast, xnorth, yeast, ynorth){
tmp <- t(apply(x[,c(xeast, xnorth)], 1, function(x, y){
dists <- apply(y, 1, function(x, y) dist(x[2],
x[1], y[2], y[1]), x)
cbind(1:nrow(y), dists)[dists == min(dists),,drop=F][1,]
}
, y[,c(yeast, ynorth)]))
tmp <- cbind(x, min.dist=tmp[,2], y[tmp[,1],-match(c(yeast,
ynorth), names(y))])
row.names(tmp) <- NULL
tmp
}
## code ends
#demo
track <- data.frame(xt=runif(10,0,360), yt=rnorm(10,-90, 90))
classif <- data.frame(xc=runif(10,0,360), yc=rnorm(10,-90, 90), v1=letters[1:20], v2=1:20)
dist.merge(track, classif, 'xt', 'yt', 'xc', 'yc')
Again,
Thanks for all the help,
Monica
Date: Sun, 14 Sep 2008 23:06:22 +1000
From: sleepingwell at gmail.com
To: pisicandru at hotmail.com
Subject: Re: [R] Join data by minimum distance
CC: r-help at r-project.org
I am wondering if there is a function which will do a join between 2 data.frames by minimum distance, as it is done in ArcGIS for example. For people who are not familiar with ArcGIS here it is an explanation:
Suppose you have a data.frame with x, y, coordinates called track, and a second data frame with different x, y coordinates and some other attributes called classif. The track data.frame has a different number of rows than classif. I want to join the rows from classif to track in such a way that for each row in track I add only the row from classif that has coordinates closest to the coordinates in the track row (and hence minimum distance in between the 2 rows), and also add a new column which will record this minimum distance. Even if the coordinates in the 2 data.frames have same name, the values are not identical between the data.frames, so a merge by column is not what I am after.
#-----------------------------------------------------------------------
# get the distance between two points on the globe.
#
# args:
# lat1 - latitude of first point.
# long1 - longitude of first point.
# lat2 - latitude of first point.
# long2 - longitude of first point.
# radius - average radius of the earth in km
#
# see: http://en.wikipedia.org/wiki/Great_circle_distance
#-----------------------------------------------------------------------
greatCircleDistance <- function(lat1, long1, lat2, long2, radius=6372.795){
sf <- pi/180
lat1 <- lat1*sf
lat2 <- lat2*sf
long1 <- long1*sf
long2 <- long2*sf
lod <- abs(long1-long2)
radius * atan2(
sqrt((cos(lat1)*sin(lod))**2 +
(cos(lat2)*sin(lat1)-sin(lat2)*cos(lat1)*cos(lod))**2),
sin(lat2)*sin(lat1)+cos(lat2)*cos(lat1)*cos(lod)
)
}
#-----------------------------------------------------------------------
# Calculate the nearest point using latitude and longitude.
# and attach the other args and nearest distance from the
# other data.frame.
#
# args:
# x as you describe 'track'
# y as you describe 'classif'
# xlongnme name of longitude variable in x
# xlatnme name of latitude location variable in x
# ylongnme name of longitude location variable on y
# ylatnme name of latitude location variable on y
#-----------------------------------------------------------------------
dist.merge <- function(x, y, xlongnme, xlatnme, ylongnme, ylatnme){
tmp <- t(apply(x[,c(xlongnme, xlatnme)], 1, function(x, y){
dists <- apply(y, 1, function(x, y) greatCircleDistance(x[2],
x[1], y[2], y[1]), x)
cbind(1:nrow(y), dists)[dists == min(dists),,drop=F][1,]
}
, y[,c(ylongnme, ylatnme)]))
tmp <- cbind(x, min.dist=tmp[,2], y[tmp[,1],-match(c(ylongnme,
ylatnme), names(y))])
row.names(tmp) <- NULL
tmp
}
# demo
track <- data.frame(xt=runif(10,0,360), yt=rnorm(10,-90, 90))
classif <- data.frame(xc=runif(10,0,360), yc=rnorm(10,-90, 90),
v1=letters[1:20], v2=1:20)
dist.merge(track, classif, 'xt', 'yt', 'xc', 'yc')