a==0 vs as.integer(a)==0 vs all.equal(a,0)
On Tue, 8 Mar 2005 09:03:43 +0000, Robin Hankin <r.hankin at soc.soton.ac.uk> wrote:
hi
?integer says:
Note that on almost all implementations of R the range of
representable integers is restricted to about +/-2*10^9: 'double's
can hold much larger integers exactly.
I am getting very confused as to when to use integers and when not to.
In my line
I need exact comparisons of large integer-valued arrays, so I often use
as.integer(),
but the above seems to tell me that doubles might be better.
Consider the following R idiom of Euclid's algorithm for the highest
common factor
of two positive integers:
gcd <- function(a, b){
  if (b == 0){ return(a)}
  return(Recall(b, a%%b))
}
If I call this with gcd(10,12), for example, then a%%b is not an
integer, so the first
line of the function, testing b for being zero, isn't legitimate.
When you say it isn't legitimate, you mean that it violates the advice never to use exact comparison on floating point values? I think that's just advice, it's not a hard and fast rule. If you happen to know that the values being compared have been calculated and stored exactly, then "==" is valid. In your function, when a and b are integers that are within some range (I'm not sure what it is, but it approaches +/- 2^53), the %% operator should return exact results. (Does it do so on all platforms? I'm not sure, but I'd call it a bug if it didn't unless a and/or b were very close to the upper limit of exactly representable integers.) Do you know of examples where a and b are integers stored in floating point, and a %% b returns a value that is different from as.integer(a) %% as.integer(b)?
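A quick way to probe the claim above is sketched below. It assumes IEEE 754 double precision (which R uses on essentially every platform); the seed and sample size are arbitrary choices, and a random sample can only fail to find a counterexample, not prove that none exists.

```r
# Assumption: doubles are IEEE 754 binary64, so integers are exact up to 2^53.
two53 <- 2^53
below <- two53 - 1        # representable, distinct from 2^53
above <- two53 + 1        # not representable, rounds back to 2^53
two53 == below            # FALSE
two53 == above            # TRUE

# Compare %% on integer storage with %% on the same values stored as doubles.
set.seed(1)                                   # arbitrary seed, for reproducibility
a <- sample.int(.Machine$integer.max, 1000)   # integer vectors
b <- sample.int(.Machine$integer.max, 1000)
same <- all(as.double(a) %% as.double(b) == a %% b)
same
```

If `same` ever came out FALSE for integer-valued inputs in this range, that would be the kind of example asked for above.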
OK, so I have some options:
(1) put "a <- as.integer(a); b <- as.integer(b)" into the
function: then a%%b *will* be an
integer and the "==" test is appropriate
(2) use some test like abs(b) < TOL for some suitable TOL (0.5?)
(3) use identical(all.equal(b,0),TRUE) like it says in identical.Rd
(4) use identical(all.equal(b,as.integer(0)),TRUE)
I'd suggest
(5) Use your gcd function almost as above, but modified to work on
vectors:
gcd <- function(a, b){
  result <- a
  nonzero <- b != 0
  if (any(nonzero))
    result[nonzero] <- Recall(b[nonzero], a[nonzero] %% b[nonzero])
  return(result)
}
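A short usage sketch of the vectorized version, with the function repeated so the block runs on its own. The input values are made up for illustration, and the sketch assumes a and b have the same length, since the function does no recycling.

```r
gcd <- function(a, b) {
  result <- a                     # where b is already 0, the answer is a
  nonzero <- b != 0
  if (any(nonzero))               # recurse only on the pairs that still need work
    result[nonzero] <- Recall(b[nonzero], a[nonzero] %% b[nonzero])
  result
}

# Illustrative inputs: three pairs, the last one with b == 0.
res <- gcd(c(10, 21, 7), c(12, 14, 0))
res   # 2 7 7
```

Each recursive call works on a shorter vector, so pairs that finish early drop out while the rest continue.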
How does the List deal with this kind of problem? Also, gcd() as written returns a non-integer. Would the List recommend rewriting the last line as return(as.integer(Recall(b,a%%b))) or not?
I'd say not. Your original function returns integer when both a and b are stored as integers, and double when at least one of them is not. That seems like reasonable behaviour to me.

Duncan Murdoch
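The storage-mode behaviour described above can be checked with typeof(); this is a minimal sketch using the original scalar gcd and the example values from the question.

```r
gcd <- function(a, b) {
  if (b == 0) return(a)
  Recall(b, a %% b)
}

t_int   <- typeof(gcd(10L, 12L))   # both arguments stored as integer
t_dbl   <- typeof(gcd(10, 12))     # both stored as double
t_mixed <- typeof(gcd(10L, 12))    # one of each

t_int     # "integer"
t_dbl     # "double"
t_mixed   # "double"

# The values agree under ==, even though the storage modes differ.
gcd(10L, 12L) == gcd(10, 12)       # TRUE
```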