General binary search?
-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Stavros Macrakis Sent: Monday, April 04, 2011 1:15 PM To: r-help Subject: [R] General binary search? Is there a generic binary search routine in a standard library which a) works for character vectors b) runs in O(log(N)) time? I'm aware of findInterval(x,vec), but it is restricted to numeric vectors.
xtfrm(x) will convert a character (or other) vector to
a numeric vector with the same ordering. findInterval
can work on that. E.g.,
> f0 <- function(x, vec) {
tmp <- xtfrm(c(x, vec))
findInterval(tmp[seq_along(x)], tmp[-seq_along(x)])
}
> f0(c("Baby", "Aunt", "Dog"), LETTERS)
[1] 2 1 4
I've never looked at its speed.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
I'm also aware of various hashing solutions (e.g. new.env(hash=TRUE) and fastmatch), but I need the greatest-lower-bound match in my application. findInterval is also slow for large N=length(vec) because of the O(N) checking it does, as Duncan Murdoch has pointed out<https://stat.ethz.ch/pipermail/r-help/2008-September/174584.html>: though its documentation says it runs in O(n * log(N)), it actually runs in O(n * log(N) + N), which is quite noticeable for largish N. But that is easy enough to work around by writing a variant of findInterval which calls find_interv_vec without checking. -s PS Yes, binary search is a one-liner in R, but I always prefer to use standard, fast native libraries when possible.... binarysearch <- function(val,tab,L,H) {while (H>=L) { M=L+(H-L) %/% 2; if (tab[M]>val) H<-M-1 else if (tab[M]<val) L<-M+1 else return(M)}; return(L-1)} [[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.