I have a long vector x with five-digit codes where the first digit of
each is of special interest, so I extracted them through
> y <- x %/% 10000
but to my surprise y contained the value -1 in some places. It turned
out that x contains -1 as a symbol for 'missing value' so in effect I
found that
> -1 %/% 10000 == -1
Had to check the help page for "%/%", and the first relevant comment I
found was:
"Users are sometimes surprised by the value returned".
No surprise there. Further down:
?%%? indicates ?x mod y? (?x modulo y?) and ?%/%? indicates
integer division. It is guaranteed that
? x == (x %% y) + y * (x %/% y) ? (up to rounding error)
I did expect (a %/% b) to return round(a / b), like gfortran and gcc,
but instead I get floor(a / b) in R. What is the reason for these
different definitions? And shouldn't R's definition be documented?
Thanks, G?ran
Integer division
12 messages · Göran Broström, Bert Gunter, Jeff Newmiller +3 more
G?ran Brostr?m
on Mon, 19 Dec 2022 14:22:00 +0100 writes:
> I have a long vector x with five-digit codes where the
> first digit of each is of special interest, so I extracted
> them through
>> y <- x %/% 10000
> but to my surprise y contained the value -1 in some
> places. It turned out that x contains -1 as a symbol for
> 'missing value' so in effect I found that
>> -1 %/% 10000 == -1
> Had to check the help page for "%/%", and the first
> relevant comment I found was:
> "Users are sometimes surprised by the value returned".
> No surprise there. Further down:
> ?%%? indicates ?x mod y? (?x modulo y?) and ?%/%?
> indicates integer division. It is guaranteed that
> ? x == (x %% y) + y * (x %/% y) ? (up to rounding
> error)
> I did expect (a %/% b) to return round(a / b), like
> gfortran and gcc,
What??? I cannot believe you.
No time for checking now, but I bet that
8 / 3 gives 2 and not 3 in C and Fortran
(and hence gcc, etc)
> but instead I get floor(a / b) in
> R. What is the reason for these different definitions? And
> shouldn't R's definition be documented?
> Thanks, G?ran
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
> more, see https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide
> commented, minimal, self-contained, reproducible code.
Den 2022-12-19 kl. 15:41, skrev Martin Maechler:
G?ran Brostr?m
on Mon, 19 Dec 2022 14:22:00 +0100 writes:
> I have a long vector x with five-digit codes where the
> first digit of each is of special interest, so I extracted
> them through
>> y <- x %/% 10000
> but to my surprise y contained the value -1 in some
> places. It turned out that x contains -1 as a symbol for
> 'missing value' so in effect I found that
>> -1 %/% 10000 == -1
> Had to check the help page for "%/%", and the first
> relevant comment I found was:
> "Users are sometimes surprised by the value returned".
> No surprise there. Further down:
> ?%%? indicates ?x mod y? (?x modulo y?) and ?%/%?
> indicates integer division. It is guaranteed that
> ? x == (x %% y) + y * (x %/% y) ? (up to rounding
> error)
> I did expect (a %/% b) to return round(a / b), like
> gfortran and gcc,
What??? I cannot believe you.
Well, you shouldn't, I generalized too far.
No time for checking now, but I bet that 8 / 3 gives 2 and not 3 in C and Fortran (and hence gcc, etc)
But compare -8 %/% 3 in R and -8 / 3 in C/Fortran. G,
> but instead I get floor(a / b) in
> R. What is the reason for these different definitions? And
> shouldn't R's definition be documented?
> Thanks, G?ran
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
> more, see https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide
> commented, minimal, self-contained, reproducible code.
If R does exactly what it says it does, why are you surprised, whether or not that is what other languages do? (Please excuse my fractured English). (Note: -8 = 1 + 3*(-3) = (-8 %% 3) + 3 * (-8 %/% 3 ), exactly as the Help excerpt you cited says) Of course, one may always question the wisdom of whatever policy R chooses, but that gets into the bowels of what standards to follow, corner cases, and all that, about which I have nothing to say. My only point is the above. -- Bert
On Mon, Dec 19, 2022 at 7:15 AM G?ran Brostr?m <gb at ehar.se> wrote:
Den 2022-12-19 kl. 15:41, skrev Martin Maechler:
G?ran Brostr?m
on Mon, 19 Dec 2022 14:22:00 +0100 writes:
> I have a long vector x with five-digit codes where the
> first digit of each is of special interest, so I extracted
> them through
>> y <- x %/% 10000
> but to my surprise y contained the value -1 in some
> places. It turned out that x contains -1 as a symbol for
> 'missing value' so in effect I found that
>> -1 %/% 10000 == -1
> Had to check the help page for "%/%", and the first
> relevant comment I found was:
> "Users are sometimes surprised by the value returned".
> No surprise there. Further down:
> ?%%? indicates ?x mod y? (?x modulo y?) and ?%/%?
> indicates integer division. It is guaranteed that
> ? x == (x %% y) + y * (x %/% y) ? (up to rounding
> error)
> I did expect (a %/% b) to return round(a / b), like
> gfortran and gcc,
What??? I cannot believe you.
Well, you shouldn't, I generalized too far.
No time for checking now, but I bet that 8 / 3 gives 2 and not 3 in C and Fortran (and hence gcc, etc)
But compare -8 %/% 3 in R and -8 / 3 in C/Fortran. G,
> but instead I get floor(a / b) in
> R. What is the reason for these different definitions? And
> shouldn't R's definition be documented?
> Thanks, G?ran
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
> more, see https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide
> commented, minimal, self-contained, reproducible code.
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
See https://en.m.wikipedia.org/wiki/Modulo_operation, Variants of the definition, esp the point that Knuth recommended the floor definition. The behavior of %/% follows from the definition of %% given the documented relation in ?Arithmetic. R is not obligated to repeat the mistakes of C or Fortran.
On December 19, 2022 7:15:01 AM PST, "G?ran Brostr?m" <gb at ehar.se> wrote:
Den 2022-12-19 kl. 15:41, skrev Martin Maechler:
G?ran Brostr?m
on Mon, 19 Dec 2022 14:22:00 +0100 writes:
> I have a long vector x with five-digit codes where the
> first digit of each is of special interest, so I extracted
> them through
>> y <- x %/% 10000
> but to my surprise y contained the value -1 in some
> places. It turned out that x contains -1 as a symbol for
> 'missing value' so in effect I found that
>> -1 %/% 10000 == -1
> Had to check the help page for "%/%", and the first
> relevant comment I found was:
> "Users are sometimes surprised by the value returned".
> No surprise there. Further down:
> ?%%? indicates ?x mod y? (?x modulo y?) and ?%/%?
> indicates integer division. It is guaranteed that
> ? x == (x %% y) + y * (x %/% y) ? (up to rounding
> error)
> I did expect (a %/% b) to return round(a / b), like
> gfortran and gcc,
What??? I cannot believe you.
Well, you shouldn't, I generalized too far.
No time for checking now, but I bet that 8 / 3 gives 2 and not 3 in C and Fortran (and hence gcc, etc)
But compare -8 %/% 3 in R and -8 / 3 in C/Fortran. G,
> but instead I get floor(a / b) in
> R. What is the reason for these different definitions? And
> shouldn't R's definition be documented?
> Thanks, G?ran
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
> more, see https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide
> commented, minimal, self-contained, reproducible code.
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Sent from my phone. Please excuse my brevity.
Jeff Newmiller
on Mon, 19 Dec 2022 08:37:32 -0800 writes:
> See https://en.m.wikipedia.org/wiki/Modulo_operation, > Variants of the definition, esp the point that Knuth > recommended the floor definition. The behavior of %/% > follows from the definition of %% given the documented > relation in ?Arithmetic. > R is not obligated to repeat the mistakes of C or Fortran. Fortune nomination! ==> BCC: maintainer("fortunes") The Wikipedia page is indeed revealing amazing facts about how differently this has been implemented in computer languages. Then, after all, G?ran still got a point to make here, given that not anymore are all R users equipped with a Ph.D in math or equivalent..: It would probably be helpful to add a short paragraph to ?Arithmetic about the fact that R's %% uses the "floored" version, as recommended by Donald Knuth and as documented on the above Wikipedia page. Martin > On December 19, 2022 7:15:01 AM PST, "G?ran Brostr?m"
> <gb at ehar.se> wrote:
>>
>>
>> Den 2022-12-19 kl. 15:41, skrev Martin Maechler:
>>>>>>>> G?ran Brostr?m on Mon, 19 Dec 2022 14:22:00 +0100
>>>>>>>> writes:
>>>
>>> > I have a long vector x with five-digit codes where the
>>> > first digit of each is of special interest, so I
>>> extracted > them through
>>>
>>> >> y <- x %/% 10000
>>>
>>> > but to my surprise y contained the value -1 in some >
>>> places. It turned out that x contains -1 as a symbol for
>>> > 'missing value' so in effect I found that
>>>
>>> >> -1 %/% 10000 == -1
>>>
>>> > Had to check the help page for "%/%", and the first >
>>> relevant comment I found was:
>>>
>>> > "Users are sometimes surprised by the value returned".
>>>
>>> > No surprise there. Further down:
>>>
>>> > ?%%? indicates ?x mod y? (?x modulo y?) and ?%/%? >
>>> indicates integer division. It is guaranteed that
>>>
>>> > ? x == (x %% y) + y * (x %/% y) ? (up to rounding >
>>> error)
>>>
>>> > I did expect (a %/% b) to return round(a / b), like >
>>> gfortran and gcc,
>>>
>>> What??? I cannot believe you.
>>
>> Well, you shouldn't, I generalized too far.
>>>
>>> No time for checking now, but I bet that 8 / 3 gives 2
>>> and not 3 in C and Fortran (and hence gcc, etc)
>>
>> But compare -8 %/% 3 in R and -8 / 3 in C/Fortran.
>>
>> G,
>>
>>>
>>>
>>> > but instead I get floor(a / b) in > R. What is the
>>> reason for these different definitions? And > shouldn't
>>> R's definition be documented?
>>>
>>>
>>>
>>> > Thanks, G?ran
>>>
>>> > ______________________________________________ >
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
>>> > more, see https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide >
>>> http://www.R-project.org/posting-guide.html and provide
>>> > commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
>> more, see https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html and provide
>> commented, minimal, self-contained, reproducible code.
> --
> Sent from my phone. Please excuse my brevity.
"It would probably be helpful to add a short paragraph to ?Arithmetic about the fact that R's %% uses the "floored" version, as recommended by Donald Knuth and as documented on the above Wikipedia page." Agreed (who am I to disagree?!). But perhaps something simpler like: "a %% b must always be between 0 (inclusive) and b(exclusive)." ergo, for examples,
5 %% 3
[1] 2 ## 5 = 1*3 + 2
5 %% -3
[1] -1 ## 5 = 2*(-3) + (-1)
-5 %% 3
[1] 1 ## -5 = (-2) * 3 + 1
-5 %% -3
[1] -2 ## -5 = 1* (-3) + (-2) My point is that I believe it is easier for most non-math folks to groc the relationship through a simple rule and such examples than via an algorithm. Feel free to disagree, as this is just my opinion. Cheers, Bert On Mon, Dec 19, 2022 at 9:30 AM Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Jeff Newmiller
on Mon, 19 Dec 2022 08:37:32 -0800 writes:
> See https://en.m.wikipedia.org/wiki/Modulo_operation, > Variants of the definition, esp the point that Knuth > recommended the floor definition. The behavior of %/% > follows from the definition of %% given the documented > relation in ?Arithmetic.
> R is not obligated to repeat the mistakes of C or Fortran.
Fortune nomination! ==> BCC: maintainer("fortunes")
The Wikipedia page is indeed revealing amazing facts about how
differently this has been implemented in computer languages.
Then, after all, G?ran still got a point to make here, given that
not anymore are all R users equipped with a Ph.D in math or equivalent..:
It would probably be helpful to add a short paragraph to ?Arithmetic
about the fact that R's %% uses the "floored" version, as
recommended by Donald Knuth and as documented on the above
Wikipedia page.
Martin
> On December 19, 2022 7:15:01 AM PST, "G?ran Brostr?m"
> <gb at ehar.se> wrote:
>>
>>
>> Den 2022-12-19 kl. 15:41, skrev Martin Maechler:
>>>>>>>> G?ran Brostr?m on Mon, 19 Dec 2022 14:22:00 +0100
>>>>>>>> writes:
>>>
>>> > I have a long vector x with five-digit codes where the
>>> > first digit of each is of special interest, so I
>>> extracted > them through
>>>
>>> >> y <- x %/% 10000
>>>
>>> > but to my surprise y contained the value -1 in some >
>>> places. It turned out that x contains -1 as a symbol for
>>> > 'missing value' so in effect I found that
>>>
>>> >> -1 %/% 10000 == -1
>>>
>>> > Had to check the help page for "%/%", and the first >
>>> relevant comment I found was:
>>>
>>> > "Users are sometimes surprised by the value returned".
>>>
>>> > No surprise there. Further down:
>>>
>>> > ?%%? indicates ?x mod y? (?x modulo y?) and ?%/%? >
>>> indicates integer division. It is guaranteed that
>>>
>>> > ? x == (x %% y) + y * (x %/% y) ? (up to rounding >
>>> error)
>>>
>>> > I did expect (a %/% b) to return round(a / b), like >
>>> gfortran and gcc,
>>>
>>> What??? I cannot believe you.
>>
>> Well, you shouldn't, I generalized too far.
>>>
>>> No time for checking now, but I bet that 8 / 3 gives 2
>>> and not 3 in C and Fortran (and hence gcc, etc)
>>
>> But compare -8 %/% 3 in R and -8 / 3 in C/Fortran.
>>
>> G,
>>
>>>
>>>
>>> > but instead I get floor(a / b) in > R. What is the
>>> reason for these different definitions? And > shouldn't
>>> R's definition be documented?
>>>
>>>
>>>
>>> > Thanks, G?ran
>>>
>>> > ______________________________________________ >
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
>>> > more, see https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide >
>>> http://www.R-project.org/posting-guide.html and provide
>>> > commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
>> more, see https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html and provide
>> commented, minimal, self-contained, reproducible code.
> --
> Sent from my phone. Please excuse my brevity.
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
The Fortran '08 standard says << One operand of type integer may be divided by another operand of type integer. Although the mathematical quotient of two integers is not necessarily an integer, Table 7.2 specifies that an expression involving the division operator with two operands of type integer is interpreted as an expression of type integer. The result of such an operation is the integer closest to the mathematical quotient and between zero and the mathematical quotient inclusively. >> Another way to say this is that integer division in Fortran TRUNCATES towards zero. It does not round and never has. C carefully left the behaviour of integer division (/) unspecified, but introduced the div(,) function with the same effect as Fortran (/). Later versions of the C standard tightened this up, and the C17 standard reads << The result of the / operator is the quotient from the division of the first operand by the second; the result of the % operator is the remainder. In both operations, if the value of the second operand is zero, the behavior is undefined. When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded. 107) If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a ; otherwise, the behavior of both a/b and a%b is undefined.>> That is, C17 TRUNCATES the result of division towards zero. I don't know of any C compiler that rounds, certainly gcc does not. The Java 15 Language Specification says << Integer division rounds toward 0. >> which also specified truncating division. The help for ?"%/%" does not say what the result is. Or if it does, I can't find it. Either way, this is a defect in the documentation. It needs to be spelled out very clearly. R version 4.2.2 Patched (2022-11-10 r83330) -- "Innocent and Trusting"
c(-8,8) %/% 3
[1] -3 2 so we deduce that R *neither* rounds *not* truncates, but returns the floor of the quotient. It is widely argued that flooring division is more generally useful than rounding or truncating division, but it is admittedly surprising.
On Tue, 20 Dec 2022 at 02:51, G?ran Brostr?m <gb at ehar.se> wrote:
I have a long vector x with five-digit codes where the first digit of each is of special interest, so I extracted them through
> y <- x %/% 10000
but to my surprise y contained the value -1 in some places. It turned out that x contains -1 as a symbol for 'missing value' so in effect I found that
> -1 %/% 10000 == -1
Had to check the help page for "%/%", and the first relevant comment I
found was:
"Users are sometimes surprised by the value returned".
No surprise there. Further down:
?%%? indicates ?x mod y? (?x modulo y?) and ?%/%? indicates
integer division. It is guaranteed that
? x == (x %% y) + y * (x %/% y) ? (up to rounding error)
I did expect (a %/% b) to return round(a / b), like gfortran and gcc,
but instead I get floor(a / b) in R. What is the reason for these
different definitions? And shouldn't R's definition be documented?
Thanks, G?ran
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thanks Richard, the "rounding claim" was my mistake (as I replied to Martin), I should said "truncates toward zero" as you explain. However, my point was that these two mathematical functions should be defined in the documentation, as you also say. And I was surprised that there is no consensus regarding the definition of such elementary functions. G?ran
On 2022-12-20 03:01, Richard O'Keefe wrote:
The Fortran '08 standard says << One operand of type integer may be divided by another operand of type integer. Although the mathematical quotient of two integers is not necessarily an integer, Table 7.2 specifies that an expression involving the division operator with two operands of type integer is interpreted as an expression of type integer. The result of such an operation is the integer closest to the mathematical quotient and between zero and the mathematical quotient inclusively. >> Another way to say this is that integer division in Fortran TRUNCATES towards zero.? It does not round and never has. C carefully left the behaviour of integer division (/) unspecified, but introduced the div(,) function with the same effect as Fortran (/).? Later versions of the C standard tightened this up, and the C17 standard reads << The result of the / operator is the quotient from the division of the first operand by the second; the result of the % operator is the remainder. In both operations, if the value of the second operand is zero, the behavior is undefined. When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded. 107) If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a ; otherwise, the behavior of both a/b and a%b is undefined.>> That is, C17 TRUNCATES the result of division towards zero.? I don't know of any C compiler that rounds, certainly gcc does not. The Java 15 Language Specification says << Integer division rounds toward 0. >> which also specified truncating division. The help for ?"%/%" does not say what the result is. Or if it does, I can't find it.? Either way, this is a defect in the documentation.? It needs to be spelled out very clearly. R version 4.2.2 Patched (2022-11-10 r83330) -- "Innocent and Trusting"
> c(-8,8) %/% 3
[1] -3 ?2
so we deduce that R *neither* rounds *not* truncates,
but returns the floor of the quotient.
It is widely argued that flooring division is more
generally useful than rounding or truncating division,
but it is admittedly surprising.
On Tue, 20 Dec 2022 at 02:51, G?ran Brostr?m <gb at ehar.se
<mailto:gb at ehar.se>> wrote:
I have a long vector x with five-digit codes where the first digit of
each is of special interest, so I extracted them through
?> y <- x %/% 10000
but to my surprise y contained the value -1 in some places. It turned
out that x contains -1 as a symbol for 'missing value' so in effect I
found that
?> -1 %/% 10000 == -1
Had to check the help page for "%/%", and the first relevant comment I
found was:
"Users are sometimes surprised by the value returned".
No surprise there. Further down:
?%%? indicates ?x mod y? (?x modulo y?) and ?%/%? indicates
? ? ? integer division.? It is guaranteed that
? ? ? ? x == (x %% y) + y * (x %/% y) ? (up to rounding error)
I did expect? (a %/% b) to return round(a / b), like gfortran and gcc,
but instead I get floor(a / b) in R. What is the reason for these
different definitions? And shouldn't R's definition be documented?
Thanks, G?ran
______________________________________________
R-help at r-project.org <mailto:R-help at r-project.org> mailing list --
To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
<http://www.R-project.org/posting-guide.html>
and provide commented, minimal, self-contained, reproducible code.
Documentation specifics aside, and I am not convinced that is an issue here, there is a responsibility on programmers on how to use routines like this by testing small samples and seeing if the results match expectations. Since negative numbers were possible, that would have been part of such tests. And there are many ways to do things and the method chosen does not strike me as a particularly great method of finding out about the first digit unless you are guaranteed to have exactly five digits. It may be efficient but it can likely fail in many cases where the data is not as expected such as more than five digits or not containing a number. So many programmers would first filter the data to check for various conditions. Some might simply try to convert the numbers into character strings (perhaps the absolute value of the number instead) and look at the first character instead, or handle it differently if it is a minus sign. Many programming languages contain families of functions for some tasks when there are many possible ways to do something that can get somewhat different results. There is absolutely NO reason to assume that any one member of a family of functions will do what you expect and you may need to either explore others that are similar or make your own. Something as simple as this might give you what you want: first_of_five <- function(numb) abs(numb) %/% 10000 -----Original Message----- From: R-help <r-help-bounces at r-project.org> On Behalf Of G?ran Brostr?m Sent: Tuesday, December 20, 2022 1:53 AM To: Richard O'Keefe <raoknz at gmail.com> Cc: r-help at r-project.org Subject: Re: [R] Integer division Thanks Richard, the "rounding claim" was my mistake (as I replied to Martin), I should said "truncates toward zero" as you explain. However, my point was that these two mathematical functions should be defined in the documentation, as you also say. And I was surprised that there is no consensus regarding the definition of such elementary functions. G?ran
On 2022-12-20 03:01, Richard O'Keefe wrote:
The Fortran '08 standard says << One operand of type integer may be divided by another operand of type integer. Although the mathematical quotient of two integers is not necessarily an integer, Table 7.2 specifies that an expression involving the division operator with two operands of type integer is interpreted as an expression of type integer. The result of such an operation is the integer closest to the mathematical quotient and between zero and the mathematical quotient inclusively. >> Another way to say this is that integer division in Fortran TRUNCATES towards zero. It does not round and never has. C carefully left the behaviour of integer division (/) unspecified, but introduced the div(,) function with the same effect as Fortran (/). Later versions of the C standard tightened this up, and the C17 standard reads << The result of the / operator is the quotient from the division of the first operand by the second; the result of the % operator is the remainder. In both operations, if the value of the second operand is zero, the behavior is undefined. When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded. 107) If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a ; otherwise, the behavior of both a/b and a%b is undefined.>> That is, C17 TRUNCATES the result of division towards zero. I don't know of any C compiler that rounds, certainly gcc does not. The Java 15 Language Specification says << Integer division rounds toward 0. >> which also specified truncating division. The help for ?"%/%" does not say what the result is. Or if it does, I can't find it. Either way, this is a defect in the documentation. It needs to be spelled out very clearly. R version 4.2.2 Patched (2022-11-10 r83330) -- "Innocent and Trusting"
> c(-8,8) %/% 3
[1] -3 2
so we deduce that R *neither* rounds *not* truncates, but returns the
floor of the quotient.
It is widely argued that flooring division is more generally useful
than rounding or truncating division, but it is admittedly surprising.
On Tue, 20 Dec 2022 at 02:51, G?ran Brostr?m <gb at ehar.se
<mailto:gb at ehar.se>> wrote:
I have a long vector x with five-digit codes where the first digit of
each is of special interest, so I extracted them through
> y <- x %/% 10000
but to my surprise y contained the value -1 in some places. It turned
out that x contains -1 as a symbol for 'missing value' so in effect I
found that
> -1 %/% 10000 == -1
Had to check the help page for "%/%", and the first relevant comment I
found was:
"Users are sometimes surprised by the value returned".
No surprise there. Further down:
?%%? indicates ?x mod y? (?x modulo y?) and ?%/%? indicates
integer division. It is guaranteed that
? x == (x %% y) + y * (x %/% y) ? (up to rounding error)
I did expect (a %/% b) to return round(a / b), like gfortran and gcc,
but instead I get floor(a / b) in R. What is the reason for these
different definitions? And shouldn't R's definition be documented?
Thanks, G?ran
______________________________________________
R-help at r-project.org <mailto:R-help at r-project.org> mailing list --
To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
<http://www.R-project.org/posting-guide.html>
and provide commented, minimal, self-contained, reproducible code.
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Lack of consensus: I should mention Python's // operator, which does flooring division. I should mention Common Lisp, where (floor - -), (ceiling - -), (round - -), and (truncate - -) all return a quotient and appropriate remainder. I should mention Smalltalk, where // and \\ are flooring quotient and remainder and quo: and rem: are truncating quotient and remainder. I should give dishonourable mention to certain programming languages where the quotient and remainder operators do not actually fit together. Why the lack of consensus: It starts with the fact that there wasn't an agreed *mathematical* definition. Number theorists, as a rule, don't care about negative numbers all that much. To the extent that they do care, x mod y has to go around in neat cycles, which flooring division does satisfy and truncating division does not. It then goes on to early computers which used sign-and- magnitude or ones-complement representation. In those computers, truncating division was the *obvious* thing to do. It also had the nice property that n / (2**k) was the same thing as an arithmetic right shift by k bits. And then twos-complement became popular. And not only is the twos-complement range asymmetric (so that x might be representable but -x not) but arithmetic right shifts aren't the same as truncating division any more. Whoops! And then, although flooring division still made sense for twos-complement but truncating division didn't really, new programming languages kept on specifying truncating division because the programming languages of the 1960s for the hardware of the 1960s did so. So new hardware designers supported the new programming languages without supporting the *reasons* why truncating division had been used.
(-8:7)%%4
[1] 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 Think about for example histogramming a collection of integers by their remainders and what would happen with truncating remainder.
On Tue, 20 Dec 2022 at 19:53, G?ran Brostr?m <gb at ehar.se> wrote:
Thanks Richard, the "rounding claim" was my mistake (as I replied to Martin), I should said "truncates toward zero" as you explain. However, my point was that these two mathematical functions should be defined in the documentation, as you also say. And I was surprised that there is no consensus regarding the definition of such elementary functions. G?ran On 2022-12-20 03:01, Richard O'Keefe wrote:
The Fortran '08 standard says << One operand of type integer may be divided by another operand of type integer. Although the mathematical quotient of two integers is not necessarily an integer, Table 7.2 specifies that an expression involving the division operator with two operands of type integer is interpreted as an expression of type integer. The result of such an operation is the integer closest to the mathematical quotient and between zero and the mathematical quotient inclusively. >> Another way to say this is that integer division in Fortran TRUNCATES towards zero. It does not round and never has. C carefully left the behaviour of integer division (/) unspecified, but introduced the div(,) function with the same effect as Fortran (/). Later versions of the C standard tightened this up, and the C17 standard reads << The result of the / operator is the quotient from the division of the first operand by the second; the result of the % operator is the remainder. In both operations, if the value of the second operand is zero, the behavior is undefined. When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded. 107) If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a ; otherwise, the behavior of both a/b and a%b is undefined.>> That is, C17 TRUNCATES the result of division towards zero. I don't know of any C compiler that rounds, certainly gcc does not. The Java 15 Language Specification says << Integer division rounds toward 0. >> which also specified truncating division. The help for ?"%/%" does not say what the result is. Or if it does, I can't find it. Either way, this is a defect in the documentation. It needs to be spelled out very clearly. R version 4.2.2 Patched (2022-11-10 r83330) -- "Innocent and Trusting"
> c(-8,8) %/% 3
[1] -3 2
so we deduce that R *neither* rounds *not* truncates,
but returns the floor of the quotient.
It is widely argued that flooring division is more
generally useful than rounding or truncating division,
but it is admittedly surprising.
On Tue, 20 Dec 2022 at 02:51, G?ran Brostr?m <gb at ehar.se
<mailto:gb at ehar.se>> wrote:
I have a long vector x with five-digit codes where the first digit of
each is of special interest, so I extracted them through
> y <- x %/% 10000
but to my surprise y contained the value -1 in some places. It turned
out that x contains -1 as a symbol for 'missing value' so in effect I
found that
> -1 %/% 10000 == -1
Had to check the help page for "%/%", and the first relevant comment
I
found was:
"Users are sometimes surprised by the value returned".
No surprise there. Further down:
?%%? indicates ?x mod y? (?x modulo y?) and ?%/%? indicates
integer division. It is guaranteed that
? x == (x %% y) + y * (x %/% y) ? (up to rounding error)
I did expect (a %/% b) to return round(a / b), like gfortran and
gcc,
but instead I get floor(a / b) in R. What is the reason for these
different definitions? And shouldn't R's definition be documented?
Thanks, G?ran
______________________________________________
R-help at r-project.org <mailto:R-help at r-project.org> mailing list --
To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
<http://www.R-project.org/posting-guide.html>
and provide commented, minimal, self-contained, reproducible code.
Richard O'Keefe
on Wed, 21 Dec 2022 16:44:51 +1300 writes:
> Lack of consensus: I should mention Python's // operator,
> which does flooring division. I should mention Common
> Lisp, where (floor - -), (ceiling - -), (round - -), and
> (truncate - -) all return a quotient and appropriate
> remainder. I should mention Smalltalk, where // and \\
> are flooring quotient and remainder and quo: and rem: are
> truncating quotient and remainder. I should give
> dishonourable mention to certain programming languages
> where the quotient and remainder operators do not actually
> fit together.
> Why the lack of consensus: It starts with the fact that
> there wasn't an agreed *mathematical* definition. Number
> theorists, as a rule, don't care about negative numbers
> all that much. To the extent that they do care, x mod y
> has to go around in neat cycles, which flooring division
> does satisfy and truncating division does not.
> It then goes on to early computers which used sign-and-
> magnitude or ones-complement representation. In those
> computers, truncating division was the *obvious* thing to
> do. It also had the nice property that n / (2**k) was the
> same thing as an arithmetic right shift by k bits. And
> then twos-complement became popular. And not only is the
> twos-complement range asymmetric (so that x might be
> representable but -x not) but arithmetic right shifts
> aren't the same as truncating division any more. Whoops!
> And then, although flooring division still made sense
> for twos-complement but truncating division didn't really,
> new programming languages kept on specifying truncating
> division because the programming languages of the 1960s
> for the hardware of the 1960s did so. So new hardware
> designers supported the new programming languages without
> supporting the *reasons* why truncating division had been
> used.
>> (-8:7)%%4
> [1] 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
> Think about for example histogramming a collection of
> integers by their remainders and what would happen with
> truncating remainder.
> On Tue, 20 Dec 2022 at 19:53, G?ran Brostr?m <gb at ehar.se>
> wrote:
>> Thanks Richard,
>>
>> the "rounding claim" was my mistake (as I replied to
>> Martin), I should said "truncates toward zero" as you
>> explain.
>>
>> However, my point was that these two mathematical
>> functions should be defined in the documentation, as you
>> also say. And I was surprised that there is no consensus
>> regarding the definition of such elementary functions.
>>
>> G?ran
[...........]
Thank you all for your contributions, notably Richard's last one
providing really interesting historical context (of "why this mess?").
Note that the Wikipedia page
https://en.m.wikipedia.org/wiki/Modulo_operation
also does mention "Euclidean division" which does have possibly
even nicer mathematical properties than the floored division
R (and quite a few other good softwares) use.
Still, be assured that we won't change R here. Mathematical
(Algebra) related R packages can easily introduce corresponding
versions of div() and mod() functions, I'd say, and I'd guess
these would already exist somewhere.
Yesterday, I've updated the ?Arithmetic help page which now
does mention (more clearly if it was really already derivable
from the previous doc) what happens, also mentioning Knuth and
the Wikipedia page.
--> https://stat.ethz.ch/R-manual/R-devel/library/base/html/Arithmetic.html
(search for "R-devel R-manual ETH" in your browser, then
-> 'Packages' -> 'base' ..)
Martin
--
Martin Maechler
ETH Zurich and R Core team