Several of the methods I use for analyzing large data sets, such as
WinGamma: determining the level of noise in data
Relief-F: estimating the influence of variables
depend on finding the k nearest neighbors of a point in a data frame or
matrix efficiently. (For large data sets it is not feasible to compute
the 'dist' matrix anyway.)
Seeing the proposed solution to "[R] distance between two matrices"
last month for finding _one_ nearest neighbor I came up with a solution
'nearest(A, n, k)' as appended.
Still, this is very clumsy and slow if you have to find the 3 nearest
neighbors for 1000 points in a data frame with 100000 entries at least
-- about 2 secs per data point on my computer or half an hour for an
application from real life.
Are there better/faster ways to perform this task using R functions?
Even better, is there a free implementation of kd-trees that I could
utilize (the one I found did not conform to the ANSI C standard)?
Someome pointed to 'spdep' of the R-Sig-Geo project, but 'knearneigh'
only accepts 2D data points.
Hans Werner Borchers
ABB Corporate Research, Germany
________________________________________________________________________
require (class)
nearest <- function (X, n, k=3)
## Find k nearest neighbors of X[n, ] in the data frame
## or matrix X, utilizing function knn1 k-times.
{
N <- nrow(X)
# inds contains the indices of nearest neighbors
inds <- c(n); i <- 0
while (i < k) {
# use knn1 for one index...
j <- as.integer(knn1(X [-inds, ], X[n, ], 1:(N-length(inds))))
# ...and change to true index of neighbor
inds <- c(inds, setdiff(1:N, inds)[j])
i <- i+1
}
# return nearest neighbor indices (without n, of course)
return(inds[-1])
}
Nearest Neighbor Algorithm in R -- again.
2 messages · Hans W Borchers, Brian Ripley
Why not modify the C code of knn? (Not knn1)
On Mon, 2 Feb 2004, Hans W Borchers wrote:
Several of the methods I use for analyzing large data sets, such as WinGamma: determining the level of noise in data Relief-F: estimating the influence of variables depend on finding the k nearest neighbors of a point in a data frame or matrix efficiently. (For large data sets it is not feasible to compute the 'dist' matrix anyway.) Seeing the proposed solution to "[R] distance between two matrices" last month for finding _one_ nearest neighbor I came up with a solution 'nearest(A, n, k)' as appended. Still, this is very clumsy and slow if you have to find the 3 nearest neighbors for 1000 points in a data frame with 100000 entries at least -- about 2 secs per data point on my computer or half an hour for an application from real life. Are there better/faster ways to perform this task using R functions? Even better, is there a free implementation of kd-trees that I could utilize (the one I found did not conform to the ANSI C standard)? Someome pointed to 'spdep' of the R-Sig-Geo project, but 'knearneigh' only accepts 2D data points. Hans Werner Borchers ABB Corporate Research, Germany
________________________________________________________________________
require (class)
nearest <- function (X, n, k=3)
## Find k nearest neighbors of X[n, ] in the data frame
## or matrix X, utilizing function knn1 k-times.
{
N <- nrow(X)
# inds contains the indices of nearest neighbors
inds <- c(n); i <- 0
while (i < k) {
# use knn1 for one index...
j <- as.integer(knn1(X [-inds, ], X[n, ], 1:(N-length(inds))))
# ...and change to true index of neighbor
inds <- c(inds, setdiff(1:N, inds)[j])
i <- i+1
}
# return nearest neighbor indices (without n, of course)
return(inds[-1])
}
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595