a==0 vs as.integer(a)==0 vs all.equal(a,0) - R-help

Tue, Mar 8, 2005 1:03 AM #

hi


?integer says:

      Note that on almost all implementations of R the range of
      representable integers is restricted to about +/-2*10^9: 'double's
      can hold much larger integers exactly.


I am getting very confused as to when to use integers and when not to.
In my line of work I need exact comparisons of large integer-valued
arrays, so I often use as.integer(), but the above seems to tell me
that doubles might be better.

Consider the following R idiom of Euclid's algorithm for the highest
common factor of two positive integers:

   gcd <- function(a, b){
     if (b == 0){ return(a)}
     return(Recall(b, a%%b))
   }

If I call this with gcd(10,12), for example, then a%%b is not an
integer, so the first line of the function, testing b for being zero,
isn't legitimate.
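
A quick check of the storage types involved may help here. This is only a sketch: it shows that `%%` on two double literals returns a double, and that for these small whole-number values the remainder is still exact.

```r
# Literals such as 10 and 12 are doubles, so the remainder is a double too.
r <- 10 %% 12
r_type <- typeof(r)          # "double"
r_exact <- r == 10           # TRUE: the remainder is exact for these values

# With the L suffix both operands are integers, and so is the result.
ri_type <- typeof(10L %% 12L)   # "integer"
```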

OK, so I have some options:

(1) stick "a <- as.integer(a),  b <- as.integer(b)" into the function:
    then a%%b *will* be an integer and the "==" test is appropriate
(2) use some test like abs(b) < TOL for some suitable TOL (0.5?)
(3) use identical(all.equal(b,0),TRUE) like it says in identical.Rd
(4) use identical(all.equal(b,as.integer(0)),TRUE)
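
For comparison, the candidate tests can be run side by side on a remainder that is exactly zero. This is a sketch only; the value of TOL (0.5) is the one floated above, and the variable names are illustrative.

```r
b <- 12 %% 4   # a double that holds exactly 0

checks <- c(
  as_written = b == 0,
  option1    = as.integer(b) == 0L,
  option2    = abs(b) < 0.5,
  option3    = identical(all.equal(b, 0), TRUE),
  option4    = identical(all.equal(b, as.integer(0)), TRUE)
)
checks   # every entry is TRUE for this input

# Options (3) and (4) are tolerance-based, so they also accept
# tiny nonzero values under the default tolerance.
tiny <- identical(all.equal(1e-10, 0), TRUE)   # TRUE
```

All five agree when b is exactly zero; they differ only in how they treat values that are close to zero but not equal to it.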

How does the List deal with this kind of problem?

Also, gcd() as written returns a non-integer.  Would the List recommend 
rewriting the last
line as

return(as.integer(Recall(b,a%%b)))

or not?


--
Robin Hankin
Uncertainty Analyst
Southampton Oceanography Centre
European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743

Peter Dalgaard

Tue, Mar 8, 2005 1:45 AM #

Robin Hankin <r.hankin at soc.soton.ac.uk> writes:

Not if you want things to work in the large-integer domain...
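
To illustrate the large-integer concern, here is a small sketch (the value 2^40 is just an example): it is a whole number that a double holds exactly, but it lies outside the integer range, so as.integer() turns it into NA.

```r
big <- 2^40                                  # whole number, exact as a double

as_int <- suppressWarnings(as.integer(big))  # coercion warns and yields NA
lost <- is.na(as_int)                        # TRUE: outside the integer range

rem_ok <- big %% 3 == 1                      # TRUE: 2^40 mod 3 is 1, done in doubles
```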

You're in somewhat murky waters here because it all has to do with
whether you can rely on the floating point arithmetic being exact for
integers up to 2^53. *If* that works, then there's really no reason to
distrust "==" in this context and the gcd() works as originally
written. You might consider wrapping it in a function that checks
that a and b are both (1) in range and (2) integers in the sense
that round(x)==x. (Failing 2, you likely get an infinite
recursion).
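
One way such a wrapper might look, as a sketch only: the name safe_gcd, the helper ok(), and the choice of 2^53 as the limit are assumptions for illustration, and the inputs are assumed to be scalars.

```r
# The gcd() from the original post.
gcd <- function(a, b) {
  if (b == 0) return(a)
  Recall(b, a %% b)
}

# Hypothetical wrapper: reject inputs that are out of range or not whole numbers.
safe_gcd <- function(a, b) {
  limit <- 2^53   # assumed bound on exactly representable integers
  ok <- function(x) is.finite(x) && abs(x) < limit && round(x) == x
  if (!ok(a) || !ok(b))
    stop("a and b must be whole numbers with magnitude below 2^53")
  gcd(a, b)
}

g <- safe_gcd(10, 12)   # 2
```

A call such as safe_gcd(10.5, 3) then stops with an error instead of recursing.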

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

Duncan Murdoch

Tue, Mar 8, 2005 1:56 AM #

On Tue, 8 Mar 2005 09:03:43 +0000, Robin Hankin
<r.hankin at soc.soton.ac.uk> wrote :

When you say it isn't legitimate, you mean that it violates the advice
never to use exact comparison on floating point values?

I think that's just advice, it's not a hard and fast rule.  If you
happen to know that the values being compared have been calculated and
stored exactly, then "==" is valid.  In your function, when a and b
are integers that are within some range (I'm not sure what it is, but
it approaches +/- 2^53), the %% operator should return exact results.
(Does it do so on all platforms?  I'm not sure, but I'd call it a bug
if it didn't unless a and/or b were very close to the upper limit of
exactly representable integers.)

Do you know of examples where a and b are integers stored in floating
point, and a %% b returns a value that is different from as.integer(a)
%% as.integer(b)?

I'd suggest

(5) Use your gcd function almost as above, but modified to work on
vectors:

   gcd <- function(a, b){
     result <- a
     nonzero <- b != 0
     if (any(nonzero))
       result[nonzero] <- Recall(b[nonzero], a[nonzero] %% b[nonzero])
     return(result)
   }
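
A usage sketch for the vectorised version, with the function repeated so the example runs on its own; the input vectors are illustrative.

```r
gcd <- function(a, b){
  result <- a
  nonzero <- b != 0
  if (any(nonzero))
    result[nonzero] <- Recall(b[nonzero], a[nonzero] %% b[nonzero])
  return(result)
}

# Element-wise: gcd(10, 12), gcd(9, 6), gcd(7, 0)
g <- gcd(c(10, 9, 7), c(12, 6, 0))
g   # 2 3 7
```

Elements whose b is already zero are left as a, and the recursion continues only on the remaining positions.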

I'd say not.  Your original function returns integer when both a and b
are stored as integers, and double when at least one of them is not.
That seems like reasonable behaviour to me.
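
The return types can be checked directly. This sketch repeats the original scalar gcd() so that it runs on its own.

```r
gcd <- function(a, b){
  if (b == 0){ return(a)}
  return(Recall(b, a%%b))
}

t_int   <- typeof(gcd(10L, 12L))   # "integer": both arguments stored as integers
t_mixed <- typeof(gcd(10, 12L))    # "double": one argument is a double
t_dbl   <- typeof(gcd(10, 12))     # "double"
```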

Duncan Murdoch

Brian Ripley

Tue, Mar 8, 2005 2:42 AM #

On Tue, 8 Mar 2005, Duncan Murdoch wrote:

It is supposed to do so up to (but not including)
.Machine$double.base ^ .Machine$double.digits,
normally 2^53, irrespective of sign.  (These are computed at run-time, 
so one can be pretty confident about them, at least if your FPU is 
bug-free.)
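
The limit can be inspected from within R. A small sketch, assuming the usual IEEE 754 doubles (base 2, 53 significand digits):

```r
limit <- .Machine$double.base ^ .Machine$double.digits

is_2_53   <- limit == 2^53              # TRUE on IEEE 754 platforms
below_ok  <- (limit - 1) %% 2 == 1      # TRUE: 2^53 - 1 is odd, remainder exact
above_bad <- limit + 1 == limit         # TRUE: 2^53 + 1 is not representable

int_max <- .Machine$integer.max         # 2147483647, the "about 2*10^9" limit
```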

Yes (see the NEWS for R-devel), but only for large integers where the 
second is NA.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595