Skip to content

Integer division

12 messages · Göran Broström, Bert Gunter, Jeff Newmiller +3 more

#
I have a long vector x with five-digit codes where the first digit of 
each is of special interest, so I extracted them through

 > y <- x %/% 10000

but to my surprise y contained the value -1 in some places. It turned 
out that x contains -1 as a symbol for 'missing value' so in effect I 
found that

 > -1 %/% 10000 == -1

Had to check the help page for "%/%", and the first relevant comment I 
found was:

"Users are sometimes surprised by the value returned".

No surprise there. Further down:

?%%? indicates ?x mod y? (?x modulo y?) and ?%/%? indicates
      integer division.  It is guaranteed that

      ? x == (x %% y) + y * (x %/% y) ? (up to rounding error)

I did expect  (a %/% b) to return round(a / b), like gfortran and gcc, 
but instead I get floor(a / b) in R. What is the reason for these 
different definitions? And shouldn't R's definition be documented?

Thanks, G?ran
#
> I have a long vector x with five-digit codes where the
    > first digit of each is of special interest, so I extracted
    > them through

    >> y <- x %/% 10000

    > but to my surprise y contained the value -1 in some
    > places. It turned out that x contains -1 as a symbol for
    > 'missing value' so in effect I found that

    >> -1 %/% 10000 == -1

    > Had to check the help page for "%/%", and the first
    > relevant comment I found was:

    > "Users are sometimes surprised by the value returned".

    > No surprise there. Further down:

    > ?%%? indicates ?x mod y? (?x modulo y?) and ?%/%?
    > indicates integer division.  It is guaranteed that

    >       ? x == (x %% y) + y * (x %/% y) ? (up to rounding
    > error)

    > I did expect (a %/% b) to return round(a / b), like
    > gfortran and gcc, 

What???  I cannot believe you.

No time for checking now, but I bet that 
8 / 3  gives 2 and not 3  in C and Fortran
(and hence gcc, etc) 


    > but instead I get floor(a / b) in
    > R. What is the reason for these different definitions? And
    > shouldn't R's definition be documented?



    > Thanks, G?ran

    > ______________________________________________
    > R-help at r-project.org mailing list -- To UNSUBSCRIBE and
    > more, see https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide
    > http://www.R-project.org/posting-guide.html and provide
    > commented, minimal, self-contained, reproducible code.
#
Den 2022-12-19 kl. 15:41, skrev Martin Maechler:
Well, you shouldn't, I generalized too far.
But compare -8 %/% 3 in R and -8 / 3 in C/Fortran.

G,
#
If R does exactly what it says it does, why are you surprised, whether or
not that is what other languages do?  (Please excuse my fractured English).
(Note: -8 = 1 +  3*(-3)  =  (-8 %% 3)  + 3 * (-8 %/% 3 ), exactly as the
Help excerpt you cited says)

Of course, one may always question the wisdom of whatever policy R chooses,
but that gets into the bowels of what standards to follow, corner cases,
and all that, about which I have nothing to say. My only  point is the
above.



-- Bert
On Mon, Dec 19, 2022 at 7:15 AM G?ran Brostr?m <gb at ehar.se> wrote:

            

  
  
#
See https://en.m.wikipedia.org/wiki/Modulo_operation, Variants of the definition, esp the point that Knuth recommended the floor definition. The behavior of %/% follows from the definition of %% given the documented relation in ?Arithmetic.

R is not obligated to repeat the mistakes of C or Fortran.
On December 19, 2022 7:15:01 AM PST, "G?ran Brostr?m" <gb at ehar.se> wrote:

  
    
#
> See https://en.m.wikipedia.org/wiki/Modulo_operation,
    > Variants of the definition, esp the point that Knuth
    > recommended the floor definition. The behavior of %/%
    > follows from the definition of %% given the documented
    > relation in ?Arithmetic.

    > R is not obligated to repeat the mistakes of C or Fortran.

Fortune nomination!  ==> BCC:  maintainer("fortunes")

The Wikipedia page is indeed revealing amazing facts about how
differently this has been implemented in computer languages.

Then, after all,  G?ran still got a point to make here, given that
not anymore are all R users equipped with a Ph.D in math or equivalent..:

It would probably be helpful to add a short paragraph to  ?Arithmetic
about the fact that R's %% uses the "floored" version, as
recommended by Donald Knuth and as documented on the above
Wikipedia page.

Martin

    > On December 19, 2022 7:15:01 AM PST, "G?ran Brostr?m"
> <gb at ehar.se> wrote:
>> 
    >> 
    >> Den 2022-12-19 kl. 15:41, skrev Martin Maechler:
    >>>>>>>> G?ran Brostr?m on Mon, 19 Dec 2022 14:22:00 +0100
    >>>>>>>> writes:
    >>> 
    >>> > I have a long vector x with five-digit codes where the
    >>> > first digit of each is of special interest, so I
    >>> extracted > them through
    >>> 
    >>> >> y <- x %/% 10000
    >>> 
    >>> > but to my surprise y contained the value -1 in some >
    >>> places. It turned out that x contains -1 as a symbol for
    >>> > 'missing value' so in effect I found that
    >>> 
    >>> >> -1 %/% 10000 == -1
    >>> 
    >>> > Had to check the help page for "%/%", and the first >
    >>> relevant comment I found was:
    >>> 
    >>> > "Users are sometimes surprised by the value returned".
    >>> 
    >>> > No surprise there. Further down:
    >>> 
    >>> > ?%%? indicates ?x mod y? (?x modulo y?) and ?%/%? >
    >>> indicates integer division.  It is guaranteed that
    >>> 
    >>> > ? x == (x %% y) + y * (x %/% y) ? (up to rounding >
    >>> error)
    >>> 
    >>> > I did expect (a %/% b) to return round(a / b), like >
    >>> gfortran and gcc,
    >>> 
    >>> What???  I cannot believe you.
    >> 
    >> Well, you shouldn't, I generalized too far.
    >>> 
    >>> No time for checking now, but I bet that 8 / 3 gives 2
    >>> and not 3 in C and Fortran (and hence gcc, etc)
    >> 
    >> But compare -8 %/% 3 in R and -8 / 3 in C/Fortran.
    >> 
    >> G,
    >> 
    >>> 
    >>> 
    >>> > but instead I get floor(a / b) in > R. What is the
    >>> reason for these different definitions? And > shouldn't
    >>> R's definition be documented?
    >>> 
    >>> 
    >>> 
    >>> > Thanks, G?ran
    >>> 
    >>> > ______________________________________________ >
    >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
    >>> > more, see https://stat.ethz.ch/mailman/listinfo/r-help
    >>> > PLEASE do read the posting guide >
    >>> http://www.R-project.org/posting-guide.html and provide
    >>> > commented, minimal, self-contained, reproducible code.
    >> 
    >> ______________________________________________
    >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
    >> more, see https://stat.ethz.ch/mailman/listinfo/r-help
    >> PLEASE do read the posting guide
    >> http://www.R-project.org/posting-guide.html and provide
    >> commented, minimal, self-contained, reproducible code.

    > -- 
    > Sent from my phone. Please excuse my brevity.
#
"It would probably be helpful to add a short paragraph to  ?Arithmetic
about the fact that R's %% uses the "floored" version, as
recommended by Donald Knuth and as documented on the above
Wikipedia page."

Agreed (who am I to disagree?!).

But perhaps something simpler like:

"a %% b must always be between 0 (inclusive) and b(exclusive)."

ergo, for examples,
[1] 2  ## 5 = 1*3 + 2
[1] -1 ## 5 = 2*(-3) + (-1)
[1] 1  ## -5 = (-2) * 3 + 1
[1] -2  ## -5 = 1* (-3) + (-2)

My point is that I believe it is easier for most non-math folks to groc the
relationship through a simple rule and such examples than via an algorithm.

Feel free to disagree, as this is just my opinion.

Cheers,
Bert




On Mon, Dec 19, 2022 at 9:30 AM Martin Maechler <maechler at stat.math.ethz.ch>
wrote:

  
  
#
The Fortran '08 standard says <<
One operand of type integer may be divided by another operand of type
integer. Although the mathematical
quotient of two integers is not necessarily an integer, Table 7.2 specifies
that an expression involving the division
operator with two operands of type integer is interpreted as an expression
of type integer. The result of such an
operation is the integer closest to the mathematical quotient and between
zero and the mathematical quotient
inclusively. >>
Another way to say this is that integer division in
Fortran TRUNCATES towards zero.  It does not round and
never has.

C carefully left the behaviour of integer division (/)
unspecified, but introduced the div(,) function with the
same effect as Fortran (/).  Later versions of the C
standard tightened this up, and the C17 standard reads <<
The result of the / operator is the quotient from the division of the first
operand by the second; the
result of the % operator is the remainder. In both operations, if the value
of the second operand is
zero, the behavior is undefined.
When integers are divided, the result of the / operator is the algebraic
quotient with any fractional
part discarded. 107) If the quotient a/b is representable, the expression
(a/b)*b + a%b shall equal a ;
otherwise, the behavior of both a/b and a%b is undefined.>>

That is, C17 TRUNCATES the result of division towards
zero.  I don't know of any C compiler that rounds,
certainly gcc does not.


The Java 15 Language Specification says
<< Integer division rounds toward 0. >>
which also specified truncating division.


The help for ?"%/%" does not say what the result is.
Or if it does, I can't find it.  Either way, this is
a defect in the documentation.  It needs to be spelled
out very clearly.
R version 4.2.2 Patched (2022-11-10 r83330) -- "Innocent and Trusting"
[1] -3  2
so we deduce that R *neither* rounds *not* truncates,
but returns the floor of the quotient.
It is widely argued that flooring division is more
generally useful than rounding or truncating division,
but it is admittedly surprising.
On Tue, 20 Dec 2022 at 02:51, G?ran Brostr?m <gb at ehar.se> wrote:

            

  
  
#
Thanks Richard,

the "rounding claim" was my mistake (as I replied to Martin), I should 
said "truncates toward zero" as you explain.

However, my point was that these two mathematical functions should be 
defined in the documentation, as you also say. And I was surprised that 
there is no consensus regarding the definition of such elementary functions.

G?ran
On 2022-12-20 03:01, Richard O'Keefe wrote:
#
Documentation specifics aside, and I am not convinced that is an issue here, there is a responsibility on programmers on how to use routines like this by testing small samples and seeing if the results match expectations.

Since negative numbers were possible, that would have been part of such tests.

And there are many ways to do things and the method chosen does not strike me as a particularly great method of finding out about the first digit unless you are guaranteed to have exactly five digits. It may be efficient but it can likely fail in many cases where the data is not as expected such as more than five digits or not containing a number.

So many programmers would first filter the data to check for various conditions. Some might simply try to convert the numbers into character strings (perhaps the absolute value of the number instead) and look at the first character instead, or handle it differently if it is a minus sign. 

Many programming languages contain families of functions for some tasks when there are many possible ways to do something that can get somewhat different results. There is absolutely NO reason to assume that any one member of a family of functions will do what you expect and you may need to either explore others that are similar or make your own.

Something as simple as this might give you what you want:

first_of_five <- function(numb) abs(numb) %/% 10000





-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of G?ran Brostr?m
Sent: Tuesday, December 20, 2022 1:53 AM
To: Richard O'Keefe <raoknz at gmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] Integer division

Thanks Richard,

the "rounding claim" was my mistake (as I replied to Martin), I should said "truncates toward zero" as you explain.

However, my point was that these two mathematical functions should be defined in the documentation, as you also say. And I was surprised that there is no consensus regarding the definition of such elementary functions.

G?ran
On 2022-12-20 03:01, Richard O'Keefe wrote:
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Lack of consensus:
  I should mention Python's // operator, which does
  flooring division.
  I should mention Common Lisp, where (floor - -),
  (ceiling - -), (round - -), and (truncate - -)
  all return a quotient and appropriate remainder.
  I should mention Smalltalk, where // and \\ are
  flooring quotient and remainder and quo: and rem:
  are truncating quotient and remainder.
  I should give dishonourable mention to certain
  programming languages where the quotient and remainder
  operators do not actually fit together.

Why the lack of consensus:
  It starts with the fact that there wasn't an agreed
  *mathematical* definition.  Number theorists, as a
  rule, don't care about negative numbers all that much.
  To the extent that they do care, x mod y has to go
  around in neat cycles, which flooring division does
  satisfy and truncating division does not.

  It then goes on to early computers which used sign-and-
  magnitude or ones-complement representation.  In those
  computers, truncating division was the *obvious* thing
  to do.  It also had the nice property that
  n / (2**k) was the same thing as an arithmetic right
  shift by k bits.  And then twos-complement became
  popular.  And not only is the twos-complement range
  asymmetric (so that x might be representable but -x not)
  but arithmetic right shifts aren't the same as truncating
  division any more.  Whoops!

  And then, although flooring division still made sense for
  twos-complement but truncating division didn't really,
  new programming languages kept on specifying truncating
  division because the programming languages of the 1960s
  for the hardware of the 1960s did so.  So new hardware
  designers supported the new programming languages without
  supporting the *reasons* why truncating division had been
  used.
[1] 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3

Think about for example histogramming a collection of
integers by their remainders and what would happen with
truncating remainder.
On Tue, 20 Dec 2022 at 19:53, G?ran Brostr?m <gb at ehar.se> wrote:

            

  
  
#
> Lack of consensus: I should mention Python's // operator,
    > which does flooring division.  I should mention Common
    > Lisp, where (floor - -), (ceiling - -), (round - -), and
    > (truncate - -) all return a quotient and appropriate
    > remainder.  I should mention Smalltalk, where // and \\
    > are flooring quotient and remainder and quo: and rem: are
    > truncating quotient and remainder.  I should give
    > dishonourable mention to certain programming languages
    > where the quotient and remainder operators do not actually
    > fit together.

    > Why the lack of consensus: It starts with the fact that
    > there wasn't an agreed *mathematical* definition.  Number
    > theorists, as a rule, don't care about negative numbers
    > all that much.  To the extent that they do care, x mod y
    > has to go around in neat cycles, which flooring division
    > does satisfy and truncating division does not.

    >   It then goes on to early computers which used sign-and-
    > magnitude or ones-complement representation.  In those
    > computers, truncating division was the *obvious* thing to
    > do.  It also had the nice property that n / (2**k) was the
    > same thing as an arithmetic right shift by k bits.  And
    > then twos-complement became popular.  And not only is the
    > twos-complement range asymmetric (so that x might be
    > representable but -x not) but arithmetic right shifts
    > aren't the same as truncating division any more.  Whoops!

    >   And then, although flooring division still made sense
    > for twos-complement but truncating division didn't really,
    > new programming languages kept on specifying truncating
    > division because the programming languages of the 1960s
    > for the hardware of the 1960s did so.  So new hardware
    > designers supported the new programming languages without
    > supporting the *reasons* why truncating division had been
    > used.

    >> (-8:7)%%4
    >  [1] 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3

    > Think about for example histogramming a collection of
    > integers by their remainders and what would happen with
    > truncating remainder.

    > On Tue, 20 Dec 2022 at 19:53, G?ran Brostr?m <gb at ehar.se>
> wrote:
>> Thanks Richard,
    >> 
    >> the "rounding claim" was my mistake (as I replied to
    >> Martin), I should said "truncates toward zero" as you
    >> explain.
    >> 
    >> However, my point was that these two mathematical
    >> functions should be defined in the documentation, as you
    >> also say. And I was surprised that there is no consensus
    >> regarding the definition of such elementary functions.
    >> 
    >> G?ran

    [...........]

Thank you all for your contributions, notably Richard's last one
providing really interesting historical context (of "why this mess?").

Note that the Wikipedia page
  https://en.m.wikipedia.org/wiki/Modulo_operation

also does mention "Euclidean division" which does have possibly
even nicer mathematical properties than the floored division
R (and quite a few other good softwares) use.

Still, be assured that we won't change R here.  Mathematical
(Algebra) related R packages can easily introduce corresponding
versions of div() and mod() functions, I'd say,  and I'd guess
these would already exist somewhere.

Yesterday, I've updated the  ?Arithmetic help page which now
does mention (more clearly if it was really already derivable
from the previous doc) what happens, also mentioning Knuth and
the Wikipedia page.

--> https://stat.ethz.ch/R-manual/R-devel/library/base/html/Arithmetic.html

(search for  "R-devel R-manual ETH" in your browser, then
 -> 'Packages' -> 'base' ..)

Martin

--
Martin Maechler
ETH Zurich  and  R Core team