Match .3 in a sequence
-----Original Message----- From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch Sent: Tuesday, March 17, 2009 12:15 PM To: Daniel Murphy Cc: r-devel at r-project.org Subject: Re: [Rd] Match .3 in a sequence On 3/17/2009 11:26 AM, Daniel Murphy wrote:
Is this a reasonably fast way to do an approximate match of
a vector x to
values in a list?
match.approx <- function(x,list,tol=.0001)
sapply(apply(abs(outer(list,x,"-"))<tol,2,which),"[",1)
If you are willing to assume that the list values are all multiples of 2*tol, then it's easy: just divide both x and list by 2*tol, round to nearest integer, and use the regular match function. If not, it becomes harder; I'd probably use a solution like yours. Duncan Murdoch
Here are 2 other implentations of that match.approx function
which use much less memory (and are faster) when the length
of 'x' and 'list' are long (>100, say). The first uses
approx(method="const") to figure out which entries in the
list are just below and above each entry in x and the second
uses sorting tricks to do the same thing. Then you only have
to figure out if the closest of those 2 entries is close enough.
The original one above fails when tol>min(diff(sort(list))).
match.approx2 <-
function(x,list,tol=.0001) {
o1 <- rep.int(c(FALSE,TRUE),
c(length(x),length(list)))[order(c(x,list))]
o2 <- rep.int(c(FALSE,TRUE),
c(length(x),length(list)))[order(c(x,list))]
below <- approx(list, list, xout=x, method="constant", f=0)$y
above <- approx(list, list, xout=x, method="constant", f=1)$y
stopifnot(all(below<=x, na.rm=TRUE), all(above>=x, na.rm=TRUE))
closestInList <- ifelse(x-below < above-x, below, above)
closestInList[x<min(list)] <- min(list)
closestInList[x>max(list)] <- max(list)
closestInList[abs(x-closestInList)>tol] <- NA
match(closestInList, list)
}
match.approx3 <-
function(x, list, tol=.0001){
stopifnot(length(list)>0, !any(is.na(x)), !any(is.na(list)))
oox <- order(order(x)) # essentially rank(x)
i <- rep(c(FALSE,TRUE), c(length(x),length(list)))[order(c(x,
list))]
i <- cumsum(i)[!i] + 1L
i[i > length(list)] <- NA
i <- order(list)[i]
leastUpperBound <- i[oox]
i <- rep(c(TRUE,FALSE), c(length(list),length(x)))[order(c(list,
x))]
i <- cumsum(i)[!i]
i[i < 1L] <- NA
i <- order(list)[i]
greatestLowerBound <- i[oox]
closestInList <-
ifelse(is.na(greatestLowerBound),
leastUpperBound, # above max(list)
ifelse(is.na(leastUpperBound),
greatestLowerBound, # below min(list)
ifelse(x-list[greatestLowerBound]<list[leastUpperBound]-x,
greatestLowerBound,
leastUpperBound)))
if (tol<Inf)
closestInList[abs(x - list[closestInList])>tol] <- NA
closestInList
}
Thanks. -Dan On Mon, Mar 16, 2009 at 8:24 AM, Stavros Macrakis
<macrakis at alum.mit.edu>wrote:
Well, first of all, seq(from=.2,to=.3) gives c(0.2), so I
assume you
really mean something like seq(from=.2,to=.3,by=.1), which gives c(0.2, 0.3). %in% tests for exact equality, which is almost never a
good idea with
floating-point numbers.
You need to define what exactly you mean by "in" for floating-point
numbers. What sort of tolerance are you willing to allow?
Some possibilities would be for example:
approxin <- function(x,list,tol) any(abs(list-x)<tol) # absolute
tolerance
rapproxin <- function(x,list,tol) (x==0 && 0 %in% list) ||
any(abs((list-x)/x)<=tol,na.rm=TRUE)
# relative tolerance; only exact 0 will match 0
Hope this helps,
-s
On Mon, Mar 16, 2009 at 9:36 AM, Daniel Murphy
<chiefmurphy at gmail.com>
wrote:
Hello:I am trying to match the value 0.3 in the sequence
seq(.2,.3). I
get
0.3 %in% seq(from=.2,to=.3)
[1] FALSE Yet
0.3 %in% c(.2,.3)
[1] TRUE For arbitrary sequences, this "invisible .3" has been
problematic. What
is
the best way to work around this?
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel