identical(0, -0)
21 messages · John C Nash, Gabor Grothendieck, Simon Urbanek +5 more

I'll save space and not include previous messages.

My 2 cents: At the very least the documentation needs a fix. If it is easy to do, then Ted Harding's suggestion of a switch (default OFF) to check for sign difference would be sensible.

I would urge inclusion in the documentation of the +0, -0 example(s) if there is NOT a way in R to distinguish these. There are occasions where it is useful to be able to detect things like this (and NaN, Inf, -Inf, etc.). They are usually not of interest to users, but sometimes are needed for developers to check edge effects. For those cases it may be time to consider a package FPIEEE754, or some similar name, to allow testing and possibly setting of flags for some of the fancier features. Likely used by just a few of us in extreme situations.

Unfortunately, some of the nice exception handling that was suggested in the standard is apparently rarely implemented in compilers. For info, I was actually a voting member of IEEE 754 because I found a nice "feature" in the IBM Fortran G arithmetic, though I never sat in on any of the meetings.

JN
On Sat, Aug 8, 2009 at 10:39 AM, Prof. John C Nash <nashjc at uottawa.ca> wrote:
I would urge inclusion in the documentation of the +0, -0 example(s) if there is NOT a way in R to distinguish these. There are occasions where it
For single numbers try this:

x <- +0
y <- -0
identical(x, y) && identical(1/x, 1/y)  # FALSE
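Extending that idea to vectors, a small helper (hypothetical; the name negzero is not part of base R) can flag negative zeros elementwise, using the IEEE 754 rule that 1/(-0) is -Inf:

```r
# Hypothetical helper: TRUE exactly for elements that are negative zero.
negzero <- function(x) x == 0 & 1/x == -Inf

negzero(c(0, -0, 1, -1, -2 * 0))
## [1] FALSE  TRUE FALSE FALSE  TRUE
```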
1 day later
On Sat, Aug 08, 2009 at 10:39:04AM -0400, Prof. John C Nash wrote:
I'll save space and not include previous messages. My 2 cents: At the very least the documentation needs a fix. If it is easy to do, then Ted Harding's suggestion of a switch (default OFF) to check for sign difference would be sensible. I would urge inclusion in the documentation of the +0, -0 example(s) if there is NOT a way in R to distinguish these.
It is possible to distinguish 0 and -0 in R, since 1/0 == Inf and 1/(-0) == -Inf. I do not know whether there are also other such situations. In particular:

(0)^(-1) == (-0)^(-1)  # [1] TRUE
log(0) == log(-0)      # [1] TRUE
There are occasions where it is useful to be able to detect things like this (and NaN and Inf and -Inf etc.). They are usually not of interest to users, but sometimes are needed for developers to check edge effects. For those cases it may be time to consider a package FPIEEE754 or some similar name to allow testing and possibly setting of flags for some of the fancier features. Likely used by just a few of us in extreme situations.
I think that distinguishing 0 and -0 may be useful even for nonexpert users for debugging purposes, mainly because x == y does not imply that x and y behave equally, as demonstrated above or by:

x <- 0
y <- -0
x == y      # [1] TRUE
1/x == 1/y  # [1] FALSE

I would like to recall the suggestion
On Sat, Aug 08, 2009 at 03:04:07PM +0200, Martin Maechler wrote:
> Maybe we should introduce a function that's basically
> isTRUE(all.equal(..., tol=0)) {but faster}, or
> do you want a 3rd argument to identical, say 'method'
> with default c("oneNaN", "use.==", "strict")
>
> oneNaN: my proposal of using memcmp() on doubles as its used for
> other types already (and hence distinguishing +0 and -0;
> otherwise keeping the feature that there's just one NaN
> which differs from 'NA' (and there's just one 'NA').
>
> use.==: the previous R behaviour, using '==' on doubles
> (and the "oneNaN" behavior)
>
> strict: be even stricter than oneNaN: Use memcmp()
> unconditionally for doubles. This would be the fastest
> version of all three.
In my opinion, for debugging purposes, the option identical(x, y, method="strict"),
which implies that x and y behave equally, could be useful if it were available
in base R.

At the R interactive level, negative zero as the value of -0 could possibly
be avoided. However, negative zero may also occur in numerical calculations,
since it may be obtained as x * 0, where x is negative. So, I think, negative
zero cannot be eliminated from consideration as something too infrequent.
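To illustrate that point (a minimal sketch; nothing here is specific to any proposal):

```r
z <- -3 * 0  # negative zero produced by ordinary arithmetic
z            # prints 0
z == 0       # TRUE: '==' does not see the sign
1/z          # -Inf: the reciprocal reveals that z is -0
```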
Petr.
Petr Savicky wrote:
On Sat, Aug 08, 2009 at 10:39:04AM -0400, Prof. John C Nash wrote:
I'll save space and not include previous messages.
My 2 cents: At the very least the documentation needs a fix. If it is
easy to do, then Ted Harding's suggestion of a switch (default OFF) to
check for sign difference would be sensible.
I would urge inclusion in the documentation of the +0, -0 example(s) if
there is NOT a way in R to distinguish these.
It is possible to distinguish 0 and -0 in R, since 1/0 == Inf and 1/(-0) == -Inf. I do not know whether there are also other such situations. In particular:

(0)^(-1) == (-0)^(-1)  # [1] TRUE
log(0) == log(-0)      # [1] TRUE
There are occasions where
it is useful to be able to detect things like this (and NaN and Inf and
-Inf etc.). They are usually not of interest to users, but sometimes are
needed for developers to check edge effects. For those cases it may be
time to consider a package FPIEEE754 or some similar name to allow
testing and possibly setting of flags for some of the fancier features.
Likely used by just a few of us in extreme situations.
I think that distinguishing 0 and -0 may be useful even for nonexpert users for debugging purposes, mainly because x == y does not imply that x and y behave equally, as demonstrated above or by:

x <- 0
y <- -0
x == y      # [1] TRUE
1/x == 1/y  # [1] FALSE

I would like to recall the suggestion
On Sat, Aug 08, 2009 at 03:04:07PM +0200, Martin Maechler wrote:
> Maybe we should introduce a function that's basically
> isTRUE(all.equal(..., tol=0)) {but faster}, or
> do you want a 3rd argument to identical, say 'method'
> with default c("oneNaN", "use.==", "strict")
>
> oneNaN: my proposal of using memcmp() on doubles as its used for
> other types already (and hence distinguishing +0 and -0;
> otherwise keeping the feature that there's just one NaN
> which differs from 'NA' (and there's just one 'NA').
>
> use.==: the previous R behaviour, using '==' on doubles
> (and the "oneNaN" behavior)
>
> strict: be even stricter than oneNaN: Use memcmp()
> unconditionally for doubles. This would be the fastest
> version of all three.
In my opinion, for debugging purposes, the option identical(x, y, method="strict"), which implies that x and y behave equally, could be useful if it were available in base R. At the R interactive level, negative zero as the value of -0 could possibly be avoided. However, negative zero may also occur in numerical calculations, since it may be obtained as x * 0, where x is negative. So, I think, negative zero cannot be eliminated from consideration as something too infrequent.
I wouldn't mind a "strict" option. It would compare bit patterns, so would distinguish +0 from -0, and different NaN values. But having the value of identical(x-y, -(y-x)) depend on whether x and y are equal or not would just lead to confusion.

Duncan Murdoch
On Aug 10, 2009, at 5:47 , Duncan Murdoch wrote:
Petr Savicky wrote:
On Sat, Aug 08, 2009 at 10:39:04AM -0400, Prof. John C Nash wrote:
I'll save space and not include previous messages. My 2 cents: At the very least the documentation needs a fix. If it is easy to do, then Ted Harding's suggestion of a switch (default OFF) to check for sign difference would be sensible. I would urge inclusion in the documentation of the +0, -0 example(s) if there is NOT a way in R to distinguish these.
It is possible to distinguish 0 and -0 in R, since 1/0 == Inf and 1/(-0) == -Inf. I do not know whether there are also other such situations. In particular:

(0)^(-1) == (-0)^(-1)  # [1] TRUE
log(0) == log(-0)      # [1] TRUE
There are occasions where it is useful to be able to detect things like this (and NaN and Inf and -Inf etc.). They are usually not of interest to users, but sometimes are needed for developers to check edge effects. For those cases it may be time to consider a package FPIEEE754 or some similar name to allow testing and possibly setting of flags for some of the fancier features. Likely used by just a few of us in extreme situations.
I think that distinguishing 0 and -0 may be useful even for nonexpert users for debugging purposes, mainly because x == y does not imply that x and y behave equally, as demonstrated above or by:

x <- 0
y <- -0
x == y      # [1] TRUE
1/x == 1/y  # [1] FALSE

I would like to recall the suggestion
On Sat, Aug 08, 2009 at 03:04:07PM +0200, Martin Maechler wrote:
> Maybe we should introduce a function that's basically
> isTRUE(all.equal(..., tol=0)) {but faster}, or
> do you want a 3rd argument to identical, say 'method'
> with default c("oneNaN", "use.==", "strict")
> oneNaN: my proposal of using memcmp() on doubles as its used for
> other types already (and hence distinguishing +0 and -0;
> otherwise keeping the feature that there's just one NaN
> which differs from 'NA' (and there's just one 'NA').
>
> use.==: the previous R behaviour, using '==' on doubles
> (and the "oneNaN" behavior)
>
> strict: be even stricter than oneNaN: Use memcmp()
> unconditionally for doubles. This would be the fastest
> version of all three.
In my opinion, for debugging purposes, the option identical(x, y, method="strict"), which implies that x and y behave equally, could be useful if it were available in base R. At the R interactive level, negative zero as the value of -0 could possibly be avoided. However, negative zero may also occur in numerical calculations, since it may be obtained as x * 0, where x is negative. So, I think, negative zero cannot be eliminated from consideration as something too infrequent.
I wouldn't mind a "strict" option. It would compare bit patterns, so would distinguish +0 from -0, and different NaN values. But having the value of identical(x-y, -(y-x)) depend on whether x and y are equal or not would just lead to confusion.
... but so do other things routinely, such as floating point arithmetic, so I don't think this is a strong argument here. IMHO identical(0, -0) should return FALSE, because they are simply not the same objects, and that's what identical is supposed to test for. If you want to test equality of elements, there are other means you should be using that were mentioned in this thread.

Cheers,
Simon
On 8/10/2009 9:55 AM, Simon Urbanek wrote:
On Aug 10, 2009, at 5:47 , Duncan Murdoch wrote:
Petr Savicky wrote:
On Sat, Aug 08, 2009 at 10:39:04AM -0400, Prof. John C Nash wrote:
I'll save space and not include previous messages. My 2 cents: At the very least the documentation needs a fix. If it is easy to do, then Ted Harding's suggestion of a switch (default OFF) to check for sign difference would be sensible. I would urge inclusion in the documentation of the +0, -0 example(s) if there is NOT a way in R to distinguish these.
It is possible to distinguish 0 and -0 in R, since 1/0 == Inf and 1/(-0) == -Inf. I do not know whether there are also other such situations. In particular:

(0)^(-1) == (-0)^(-1)  # [1] TRUE
log(0) == log(-0)      # [1] TRUE
There are occasions where it is useful to be able to detect things like this (and NaN and Inf and -Inf etc.). They are usually not of interest to users, but sometimes are needed for developers to check edge effects. For those cases it may be time to consider a package FPIEEE754 or some similar name to allow testing and possibly setting of flags for some of the fancier features. Likely used by just a few of us in extreme situations.
I think that distinguishing 0 and -0 may be useful even for nonexpert users for debugging purposes, mainly because x == y does not imply that x and y behave equally, as demonstrated above or by:

x <- 0
y <- -0
x == y      # [1] TRUE
1/x == 1/y  # [1] FALSE

I would like to recall the suggestion
On Sat, Aug 08, 2009 at 03:04:07PM +0200, Martin Maechler wrote:
> Maybe we should introduce a function that's basically
> isTRUE(all.equal(..., tol=0)) {but faster}, or
> do you want a 3rd argument to identical, say 'method'
> with default c("oneNaN", "use.==", "strict")
> oneNaN: my proposal of using memcmp() on doubles as its used for
> other types already (and hence distinguishing +0 and -0;
> otherwise keeping the feature that there's just one NaN
> which differs from 'NA' (and there's just one 'NA').
>
> use.==: the previous R behaviour, using '==' on doubles
> (and the "oneNaN" behavior)
>
> strict: be even stricter than oneNaN: Use memcmp()
> unconditionally for doubles. This would be the fastest
> version of all three.
In my opinion, for debugging purposes, the option identical(x, y, method="strict"), which implies that x and y behave equally, could be useful if it were available in base R. At the R interactive level, negative zero as the value of -0 could possibly be avoided. However, negative zero may also occur in numerical calculations, since it may be obtained as x * 0, where x is negative. So, I think, negative zero cannot be eliminated from consideration as something too infrequent.
I wouldn't mind a "strict" option. It would compare bit patterns, so would distinguish +0 from -0, and different NaN values. But having the value of identical(x-y, -(y-x)) depend on whether x and y are equal or not would just lead to confusion.
... but so do other things routinely, such as floating point arithmetic, so I don't think this is a strong argument here. IMHO identical(0, -0) should return FALSE, because they are simply not the same objects, and that's what identical is supposed to test for. If you want to test equality of elements, there are other means you should be using that were mentioned in this thread.
+0 and -0 are exactly equal, which is what identical is documented to be testing. They are not indistinguishable, and not identical in the English meaning of the word, but they are identical in the sense of what the identical() function is documented to test.

The cases where you want to distinguish between them are rare. They should not be distinguished in the default identical() test, any more than different values of NaN should be distinguished (and identical() is explicitly documented *not* to distinguish those).

Of the 1600 uses of identical() in R base plus the recommended packages, there are lots of cases where equality of elements is clearly the intention. There are almost no uses of the all.equal(..., tol=0) idiom in base R, and among the recommended packages, only Matrix uses it (but uses identical() for values as well, I think).

Distinguishing between different NaN values might be harmless, because we probably only generate one. (I'm not sure about that; the literal NaN might be different from sqrt(-1) or 0/0. But I'd guess only one comes up in normal usage.) But we definitely generate both +0 and -0 all the time, and distinguishing between them would mean identical() would be useless for value-based comparison. Do you want to evaluate all 1600 uses in the base and recommended packages, and who knows how many on CRAN, to figure out which ones should be changed to all.equal(..., tol=0)? I don't.

Duncan Murdoch
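To make the x - y versus -(y - x) point concrete, here is a small sketch: when x and y are equal, the two expressions yield zeros of opposite sign, so a bitwise comparison would call them different.

```r
x <- 1
y <- 1
(x - y) == -(y - x)  # TRUE: '==' ignores the sign of zero
1/(x - y)            # Inf
1/-(y - x)           # -Inf, so a bit-pattern identical() would return FALSE
```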
For people who want to play with these, here are some functions that let
you get or set the "payload" value in a NaN. NaN and NA, Inf and -Inf
are stored quite similarly; these functions don't distinguish which of
those you're working with. Regular finite values give NA for the
payload value, and elements of x are unchanged if you try to set their
payload to NA.
By the way, this also shows that R *can* distinguish different NaN
values, but you need some byte-level manipulations.
Duncan Murdoch
showBytes <- function(x) {
# Write x in binary and return the raw bytes (native byte order;
# the payload functions below read them assuming little-endian).
bytes <- rawConnection(raw(0), "w")
on.exit(close(bytes))
writeBin(x, bytes)
rawConnectionValue(bytes)
}
NaNpayload <- function(x) {
if (typeof(x) != "double") stop("Can only handle doubles")
bytes <- as.integer(showBytes(x))
base <- 1 + (seq_along(x)-1)*8
S <- bytes[base + 7] %/% 128
E <- (bytes[base + 7] %% 128)*16 + bytes[base + 6] %/% 16
F <- bytes[base + 6] %% 16
for (i in 5:0) {
F <- F*256 + bytes[base + i]
}
nan <- E == 2047 # Add " & F != 0 " if you don't want to include infinities
ifelse(nan, (1-2*S)*F/2^52, NA)
}
"NaNpayload<-" <- function(x, value) {
x <- as.double(x)
payload <- value
new <- payload[!is.na(payload)]
if (any( new <= -1 | new >= 1 )) stop("The payload values must be between -1 and 1")
payload <- rep(payload, len=max(length(x), length(payload)))
x <- rep(x, len=length(payload))
bytes <- as.integer(showBytes(x))
base <- 1 + (seq_along(x)-1)*8
base[is.na(payload)] <- NA
F <- trunc(abs(payload)*2^52)
for (i in 0:5) {
bytes[base + i] <- F %% 256
F <- F %/% 256
}
bytes[base + 6] <- F + 0xF0
bytes[base + 7] <- (payload < 0)*128 + 0x7F
con <- rawConnection(as.raw(bytes), "r")
on.exit(close(con))
readBin(con, "double", length(x))
}
Example:
> x <- c(NA, NaN, 0, 1, Inf)
> NaNpayload(x)
[1] 0.5 -0.5 NA NA 0.0
> NaNpayload(x) <- -0.4
> x
[1] NaN NaN NaN NaN NaN
> y <- x
> NaNpayload(y) <- 0.6
> y
[1] NaN NaN NaN NaN NaN
> NaNpayload(x)
[1] -0.4 -0.4 -0.4 -0.4 -0.4
> NaNpayload(y)
[1] 0.6 0.6 0.6 0.6 0.6
> identical(x, y)
[1] TRUE
On Mon, Aug 10, 2009 at 05:47:57AM -0400, Duncan Murdoch wrote:
I wouldn't mind a "strict" option. It would compare bit patterns, so would distinguish +0 from -0, and different NaN values.
I think that a logical option "strict" in the above meaning could be useful for debugging. The default may be FALSE.
On Mon, Aug 10, 2009 at 10:20:39AM -0400, Duncan Murdoch wrote:
+0 and -0 are exactly equal, which is what identical is documented to be testing. They are not indistinguishable, and not identical in the English meaning of the word, but they are identical in the sense of what the identical() function is documented to test. The cases where you want to distinguish between them are rare. They should not be distinguished in the default identical() test, any more than different values of NaN should be distinguished (and identical() is explicitly documented *not* to distinguish those).
[...] The question whether 0 and -0 are equal or not is not clear, since they have different reciprocals. However, I agree that distinguishing the signs of zero is rarely useful. From this point of view, the default FALSE seems to be acceptable.

For completeness, let me also add an argument that it would not be too harmful if the default were TRUE. I think that it is quite rare to have two larger numerical structures which match up to the last bits in all numbers but have a different sign of some zero. Matching all bits almost requires that the two structures are obtained using the same expressions for all components. Then the signs of zeros will match as well. However, I may be wrong.

Petr.
"DM" == Duncan Murdoch <murdoch at stats.uwo.ca>
on Mon, 10 Aug 2009 11:51:53 -0400 writes:
DM> For people who want to play with these, here are some functions that let
DM> you get or set the "payload" value in a NaN. NaN and NA, Inf and -Inf
DM> are stored quite similarly; these functions don't distinguish which of
DM> those you're working with. Regular finite values give NA for the
DM> payload value, and elements of x are unchanged if you try to set their
DM> payload to NA.
DM> By the way, this also shows that R *can* distinguish different NaN
DM> values, but you need some byte-level manipulations.
yes; very nice code, indeed!
I propose a version of the showBytes() utility should be added
either as an example e.g. in writeBin() or even an exported
function in package 'utils'
[.........]
> Example:
>> x <- c(NA, NaN, 0, 1, Inf)
>> NaNpayload(x)
> [1] 0.5 -0.5 NA NA 0.0
Interestingly, on 64-bit, I get a slightly different answer above,
(when all the following code gives exactly the same results,
and of course, that was your main point !), namely
4.338752e-13 instead of 0.5 for 'NA',
see below.
.. and your nice tools also let me detect an even simpler way
to get *two* versions of NA, and NaN, each :
Conclusion: Both NaN and NA (well NA_real_) have a sign, too !
NaNpayload(NA_real_)
##[1] 4.338752e-13
NaNpayload(-NA_real_)
##[1] -4.338752e-13 ## !! different
str(NApm <- c(1[2], -1[2]))
t(sapply(NApm, showBytes))
## [1,] a2 07 00 00 00 00 f0* 7f
## [2,] a2 07 00 00 00 00 f0* ff
## Or, "in summary" -- Duncan's original example slightly extended:
x <- c(NaN, -NaN, NA, -NA_real_, 0, 0.1, Inf, -Inf)
x
names(x) <- format(x)
sapply(x, showBytes)
## NaN NaN NA NA 0.0 0.1 Inf -Inf
## [1,] 00 00 a2 a2 00 9a 00 00
## [2,] 00 00 07 07 00 99 00 00
## [3,] 00 00 00 00 00 99 00 00
## [4,] 00 00 00 00 00 99 00 00
## [5,] 00 00 00 00 00 99 00 00
## [6,] 00 00 00 00 00 99 00 00
## [7,] f8 f8 f8* f8* 00 b9 f0 f0
## [8,] ff 7f 7f ff 00 3f 7f ff
## (*) NOTE: the 'f0*' or 'f8*' above are
## --- 'f8' on 32-bit, 'f0' on 64-bit
>> NaNpayload(x) <- -0.4
>> x
> [1] NaN NaN NaN NaN NaN
>> y <- x
>> NaNpayload(y) <- 0.6
>> y
> [1] NaN NaN NaN NaN NaN
>> NaNpayload(x)
> [1] -0.4 -0.4 -0.4 -0.4 -0.4
>> NaNpayload(y)
> [1] 0.6 0.6 0.6 0.6 0.6
>> identical(x, y)
> [1] TRUE
On Tue, Aug 11, 2009 at 10:04:20AM +0200, Martin Maechler wrote:
"DM" == Duncan Murdoch <murdoch at stats.uwo.ca>
on Mon, 10 Aug 2009 11:51:53 -0400 writes:
DM> For people who want to play with these, here are some functions that let
DM> you get or set the "payload" value in a NaN. NaN and NA, Inf and -Inf
DM> are stored quite similarly; these functions don't distinguish which of
DM> those you're working with. Regular finite values give NA for the
DM> payload value, and elements of x are unchanged if you try to set their
DM> payload to NA.
DM> By the way, this also shows that R *can* distinguish different NaN
DM> values, but you need some byte-level manipulations.
yes; very nice code, indeed!
I propose a version of the showBytes() utility should be added
either as an example e.g. in writeBin() or even an exported
function in package 'utils'
[.........]
> Example:
>> x <- c(NA, NaN, 0, 1, Inf)
>> NaNpayload(x)
> [1] 0.5 -0.5 NA NA 0.0
Interestingly, on 64-bit, I get a slightly different answer above,
(when all the following code gives exactly the same results,
and of course, that was your main point !), namely
4.338752e-13 instead of 0.5 for 'NA',
see below.

.. and your nice tools also let me detect an even simpler way
to get *two* versions of NA, and NaN, each :

Conclusion: Both NaN and NA (well NA_real_) have a sign, too !

NaNpayload(NA_real_)
##[1] 4.338752e-13
NaNpayload(-NA_real_)
##[1] -4.338752e-13 ## !! different

str(NApm <- c(1[2], -1[2]))
t(sapply(NApm, showBytes))
## [1,] a2 07 00 00 00 00 f0* 7f
## [2,] a2 07 00 00 00 00 f0* ff

## Or, "in summary" -- Duncan's original example slightly extended:
x <- c(NaN, -NaN, NA, -NA_real_, 0, 0.1, Inf, -Inf)
x
names(x) <- format(x)
sapply(x, showBytes)
## NaN NaN NA NA 0.0 0.1 Inf -Inf
## [1,] 00 00 a2 a2 00 9a 00 00
## [2,] 00 00 07 07 00 99 00 00
## [3,] 00 00 00 00 00 99 00 00
## [4,] 00 00 00 00 00 99 00 00
## [5,] 00 00 00 00 00 99 00 00
## [6,] 00 00 00 00 00 99 00 00
## [7,] f8 f8 f8* f8* 00 b9 f0 f0
## [8,] ff 7f 7f ff 00 3f 7f ff
## (*) NOTE: the 'f0*' or 'f8*' above are
## --- 'f8' on 32-bit, 'f0' on 64-bit
>> NaNpayload(x) <- -0.4
>> x
> [1] NaN NaN NaN NaN NaN
>> y <- x
>> NaNpayload(y) <- 0.6
>> y
> [1] NaN NaN NaN NaN NaN
>> NaNpayload(x)
> [1] -0.4 -0.4 -0.4 -0.4 -0.4
>> NaNpayload(y)
> [1] 0.6 0.6 0.6 0.6 0.6
>> identical(x, y)
> [1] TRUE
The above examples convince me that the default behavior of identical() should not be based on bit patterns, since the differences between different NaN's or even different NA's are irrelevant except if we use the bit manipulations explicitly.

Let me suggest the following short description in ?identical:

  The safe and reliable way to test two objects for being equal in
  structure, types of components and their values. It returns 'TRUE'
  in this case, 'FALSE' in every other case.

and replacing the paragraph

  'identical' sees 'NaN' as different from 'NA_real_', but all 'NaN's
  are equal (and all 'NA' of the same type are equal).

in ?identical by

  Comparison of objects of numeric type uses '==' for comparison of
  their components. This means that the values of the components rather
  than their machine representation is compared. In particular, '0' and
  '-0' are considered equal, all 'NA's of the same type are equal and
  all 'NaN's are equal, although their bit patterns may differ in some
  cases. 'NA' and 'NaN' are always different. Note also that 1/0 and
  1/(-0) are different.

Petr.
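As a quick sketch of the NaN/NA behaviour the suggested wording describes (all of this runs in current R):

```r
nan1 <- NaN   # the literal NaN
nan2 <- 0/0   # a computed NaN; its bit pattern may differ
nan1 == nan2             # NA: '==' cannot compare NaN's
identical(nan1, nan2)    # TRUE: all 'NaN's are treated as equal
identical(NaN, NA_real_) # FALSE: 'NA' and 'NaN' are always different
```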
1 day later
Let me add the following to the discussion of identical(0, -0). I would like to suggest to replace the paragraph

  'identical' sees 'NaN' as different from 'NA_real_', but all 'NaN's
  are equal (and all 'NA' of the same type are equal).

in ?identical by the following text, which is a correction of my previous suggestion for the same paragraph:

  Components of numerical objects are compared as follows. For
  non-missing values, "==" is used. In particular, '0' and '-0' are
  considered equal. All 'NA's of the same type are equal and all
  'NaN's are equal, although their bit patterns may differ in some
  cases. 'NA' and 'NaN' are always different. Note also that 1/0 and
  1/(-0) are different.

The suggestion for the default of identical(0, -0) is TRUE, because the negative zero is much less important than NA and NaN and, possibly, distinguishing 0 and -0 could even be deprecated. Moreover, the argument of efficiency of memcmp cannot be used here, since there are different variants of NaN and NA, which should not be distinguished by default.

Petr.
"PS" == Petr Savicky <savicky at cs.cas.cz>
on Wed, 12 Aug 2009 13:50:46 +0200 writes:
PS> Let me add the following to the discussion of identical(0, -0).
PS> I would like to suggest to replace the paragraph
PS> 'identical' sees 'NaN' as different from 'NA_real_', but all
PS> 'NaN's are equal (and all 'NA' of the same type are equal).
PS> in ?identical by the following text, which is a correction of my previous
PS> suggestion for the same paragraph
> Components of numerical objects are compared as follows. For non-missing
> values, "==" is used. In particular, '0' and '-0' are considered equal.
> All 'NA's of the same type are equal and all 'NaN's are equal, although
> their bit patterns may differ in some cases. 'NA' and 'NaN' are always
> different.
> Note also that 1/0 and 1/(-0) are different.
the 'numerical' would have to be qualified ('double', 'complex'
via double), as indeed, memcmp() is used on integers
The last sentence is not necessary and probably even confusing:
Of course, -Inf and Inf are different.
PS> The suggestion for the default of identical(0, -0) is TRUE, because the
PS> negative zero is much less important than NA and NaN and, possibly,
PS> distinguishing 0 and -0 could even be deprecated.
What should that mean?? R *is* using the international floating
point standards, and 0 and -0 exist there and they *are*
different!
If R would start --- with a performance penalty, btw ! ---
to explicitly map all internal '-0' into '+0' we would
explicitly move away from the international FP standards...
no way!
PS> Moreover, the argument
PS> of efficiency of memcmp cannot be used here, since there are different
PS> variants of NaN and NA, which should not be distinguished by default.
your argument is only partly true... as memcmp() can still be
used instead of '==' *after* the NA-treatments {my current
patch does so},
and even more as I have been proposing an option "strict" which
would only use memcmp() {and hence also distinguish different
NA, NaN's}.
Martin
On Wed, Aug 12, 2009 at 04:02:28PM +0200, Martin Maechler wrote:
"PS" == Petr Savicky <savicky at cs.cas.cz>
on Wed, 12 Aug 2009 13:50:46 +0200 writes:
PS> Let me add the following to the discussion of identical(0, -0).
PS> I would like to suggest to replace the paragraph
PS> 'identical' sees 'NaN' as different from 'NA_real_', but all
PS> 'NaN's are equal (and all 'NA' of the same type are equal).
PS> in ?identical by the following text, which is a correction of my previous
PS> suggestion for the same paragraph
> Components of numerical objects are compared as follows. For non-missing
> values, "==" is used. In particular, '0' and '-0' are considered equal.
> All 'NA's of the same type are equal and all 'NaN's are equal, although
> their bit patterns may differ in some cases. 'NA' and 'NaN' are always
> different.
> Note also that 1/0 and 1/(-0) are different.
the 'numerical' would have to be qualified ('double', 'complex'
via double), as indeed, memcmp() is used on integers
The last sentence is not necessary and probably even confusing:
Of course, -Inf and Inf are different.
I agree.
PS> The suggestion for the default of identical(0, -0) is TRUE, because the
PS> negative zero is much less important than NA and NaN and, possibly,
PS> distinguishing 0 and -0 could even be deprecated.
What should that mean?? R *is* using the international floating
point standards, and 0 and -0 exist there and they *are*
different!
I am sorry for being too brief. In my opinion, distinguishing 0 and -0 is not useful enough to make the default behavior of identical() different from the behavior of == in this case.
If R would start --- with a performance penalty, btw ! --- to explicitly map all internal '-0' into '+0' we would explicitly move away from the international FP standards... no way!
Yes, I agree. I did not mean this.
PS> Moreover, the argument
PS> of efficiency of memcmp cannot be used here, since there are different
PS> variants of NaN and NA, which should not be distinguished by default.
your argument is only partly true... as memcmp() can still be
used instead of '==' *after* the NA-treatments {my current
patch does so},
OK. In this case, memcmp() could still be faster than ==, although this is beyond my knowledge.
and even more as I have been proposing an option "strict" which
would only use memcmp() {and hence also distinguish different
NA, NaN's}.
I understand the previous messages in this thread to mean that there is agreement that such an option would be very useful and would lead to faster comparison.

Petr.
9 days later
I have taken up the issue now, and after thinking, studying the source, trying to define a 'method = <string>' argument, came to the conclusion that both the implementation and documentation (and source code "self-explanation") are easiest to program, maintain, and understand, if I introduce explicit binary switches, so I now propose the following R-level interface which keeps the current behavior the default:
Usage:
identical(x, y, num.EQ = TRUE, one.NA = TRUE, attrib.asSet = TRUE)
Arguments:
x, y: any R objects.
num.EQ: logical indicating if ('double' and 'complex' non-'NA')
numbers should be compared using '==', or by bitwise
comparison. The latter (non-default) differentiates between
'-0' and '+0'.
one.NA: logical indicating if there is conceptually just one numeric
'NA' and one 'NaN'; 'one.NA = FALSE' differentiates bit
patterns.
attrib.asSet: logical indicating if 'attributes' of 'x' and 'y' should
be treated as _unordered_ tagged pairlists ("sets"); this
currently also applies to 'slot's of S4 objects. It may well
be too strict to set 'attrib.asSet = FALSE'.
I'm open for better names of arguments, but will not accept "_"
in the argument names {just my taste; no reason for arguing...}.
I've practically finished both C- and R- and Rd-code, but can
still adapt to proposals if there are good reasons for it.
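Since the defaults keep the current behavior, a brief sketch of what they mean in practice may help (the comments name the proposed switches; the calls themselves run under the current defaults):

```r
identical(0, -0)          # TRUE  -- num.EQ = TRUE: values compared with '=='
identical(1/0, 1/-0)      # FALSE -- Inf vs -Inf are different values
identical(NaN, 0/0)       # TRUE  -- one.NA = TRUE: conceptually just one NaN
identical(NaN, NA_real_)  # FALSE -- NA and NaN are always different
```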
Martin Maechler, ETH Zurich
On Sat, Aug 22, 2009 at 12:00:44AM +0200, Martin Maechler wrote:
I have taken up the issue now, and after thinking, studying the source, trying to define a 'method = <string>' argument, came to the conclusion that both the implementation and documentation (and source code "self-explanation") are easiest to program, maintain, and understand, if I introduce explicit binary switches, so I now propose the following R-level interface which keeps the current behavior the default:
Usage:
identical(x, y, num.EQ = TRUE, one.NA = TRUE, attrib.asSet = TRUE)
Arguments:
x, y: any R objects.
num.EQ: logical indicating if ('double' and 'complex' non-'NA')
numbers should be compared using '==', or by bitwise
comparison. The latter (non-default) differentiates between
'-0' and '+0'.
one.NA: logical indicating if there is conceptually just one numeric
'NA' and one 'NaN'; 'one.NA = FALSE' differentiates bit
patterns.
attrib.asSet: logical indicating if 'attributes' of 'x' and 'y' should
be treated as _unordered_ tagged pairlists ("sets"); this
currently also applies to 'slot's of S4 objects. It may well
be too strict to set 'attrib.asSet = FALSE'.
I appreciate having several binary switches. Besides the arguments above, this is also useful for interactive use of identical(), for example, for debugging purposes. If there is a difference between the objects, the switches allow one to get more information about the type of the difference.
I'm open to better names for the arguments, but will not accept "_"
in the argument names {just my taste; no reason for arguing...}.
I would slightly prefer one.NaN instead of one.NA. In IEEE 754 terminology, R's 'NA's are a subset of 'NaN's, so NaN is the more general notion, although in R the sets of 'NA's and 'NaN's are disjoint. Moreover, the name one.NaN specifies more clearly that the issue is important only for numeric types and not, for example, for integer. Petr.
On Sat, Aug 22, 2009 at 1:22 AM, Petr Savicky<savicky at cs.cas.cz> wrote:
On Sat, Aug 22, 2009 at 12:00:44AM +0200, Martin Maechler wrote:
I have taken up the issue now, and after thinking, studying the source, trying to define a 'method = <string>' argument, came to the conclusion that both the implementation and documentation (and source code "self-explanation") are easiest to program, maintain, and understand, if I introduce explicit binary switches, so I now propose the following R-level interface which keeps the current behavior the default:
Usage:
     identical(x, y, num.EQ = TRUE, one.NA = TRUE, attrib.asSet = TRUE)
Arguments:
    x, y: any R objects.
  num.EQ: logical indicating if ('double' and 'complex' non-'NA')
          numbers should be compared using '==', or by bitwise
          comparison.  The latter (non-default) differentiates between
          '-0' and '+0'.
  one.NA: logical indicating if there is conceptually just one numeric
          'NA' and one 'NaN'; 'one.NA = FALSE' differentiates bit
          patterns.
attrib.asSet: logical indicating if 'attributes' of 'x' and 'y' should
          be treated as _unordered_ tagged pairlists ("sets"); this
          currently also applies to 'slot's of S4 objects.  It may well
          be too strict to set 'attrib.asSet = FALSE'.
My only comment is to make the argument notation a bit more consistent:
(num.Eq, one.NA, attrib.as.set)
or
(numEq, oneNA, attribAsSet)
Also, maybe "single" instead of "one". Thanks, Henrik
I appreciate having several binary switches. Besides the arguments above, this is also useful for interactive use of identical(), for example, for debugging purposes. If there is a difference between the objects, the switches allow one to get more information about the type of the difference.
I'm open to better names for the arguments, but will not accept "_"
in the argument names {just my taste; no reason for arguing...}.
I would slightly prefer one.NaN instead of one.NA. In IEEE 754 terminology, R's 'NA's are a subset of 'NaN's, so NaN is the more general notion, although in R the sets of 'NA's and 'NaN's are disjoint. Moreover, the name one.NaN specifies more clearly that the issue is important only for numeric types and not, for example, for integer. Petr.
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
2 days later
"HenrikB" == Henrik Bengtsson <hb at stat.berkeley.edu>
on Sat, 22 Aug 2009 08:34:51 -0700 writes:
HenrikB> On Sat, Aug 22, 2009 at 1:22 AM, Petr Savicky<savicky at cs.cas.cz> wrote:
>> On Sat, Aug 22, 2009 at 12:00:44AM +0200, Martin Maechler wrote:
>>> I have taken up the issue now,
>>> and after thinking, studying the source, trying to define a
>>> 'method = <string>' argument, came to the conclusion that both
>>> the implementation and documentation (and source code "self-explanation")
>>> are easiest to program, maintain, and understand,
>>> if I introduce explicit binary switches,
>>> so I now propose the following R-level interface which keeps
>>> the current behavior the default:
>>>
>>> >> Usage:
>>> >>
>>> >>      identical(x, y, num.EQ = TRUE, one.NA = TRUE, attrib.asSet = TRUE)
>>> >>
>>> >> Arguments:
>>> >>
>>> >>     x, y: any R objects.
>>> >>
>>> >>   num.EQ: logical indicating if ('double' and 'complex' non-'NA')
>>> >>           numbers should be compared using '==', or by bitwise
>>> >>           comparison.  The latter (non-default) differentiates between
>>> >>           '-0' and '+0'.
>>> >>
>>> >>   one.NA: logical indicating if there is conceptually just one numeric
>>> >>           'NA' and one 'NaN';  'one.NA = FALSE' differentiates bit
>>> >>           patterns.
>>> >>
>>> >> attrib.asSet: logical indicating if 'attributes' of 'x' and 'y' should
>>> >>           be treated as _unordered_ tagged pairlists ("sets"); this
>>> >>           currently also applies to 'slot's of S4 objects.  It may well
>>> >>           be too strict to set 'attrib.asSet = FALSE'.
HenrikB> My only comment is to make the argument notation a bit more consistent:
HenrikB> (num.Eq, one.NA, attrib.as.set)
HenrikB> or
HenrikB> (numEq, oneNA, attribAsSet)
thank you. I think I'd prefer the (older style) with "."
{and I had only one "." in all options},
and yes, I agree that these are a bit more consistent.
HenrikB> Also, maybe "single" instead of "one".
yeaah.. that's possibly better...
Other votes (on this part)?
>> I appreciate having several binary switches. Besides the arguments above,
>> this is also useful for interactive use of identical(), for example,
>> for debugging purposes. If there is a difference between objects, then
>> the switches allow one to get more information about the type
>> of the difference.
exactly, thanks..
>>> I'm open to better names for the arguments, but will not accept "_"
>>> in the argument names {just my taste; no reason for arguing...}.
>>
>> I would slightly prefer one.NaN instead of one.NA.
>> In IEEE 754 terminology, R's 'NA's are a subset of
>> 'NaN's, so NaN is the more general notion, although in
>> R, the sets of 'NA's and 'NaN's are disjoint. Moreover,
>> the name one.NaN specifies more clearly that the issue
>> is important only for numeric types and not, for example,
>> for integer.
>>
>> Petr.
You are right of course about IEEE NaN's,
and also the fact that there *are* non-numeric NAs in R.
However, in the R world, 'NA' is much more known to users,
and much more importantly,
> is.na(NaN)
[1] TRUE
> is.nan(NA)
[1] FALSE
so in R, (the) NaN is rather a special case of NA.
Additionally, 'NA' is considerably faster to type than 'NaN'..
Consequently, I'd rather keep that.
Thanks again, Petr and Henrik, for your feedback!
Martin
I have noticed that many cross references in the help files using
the \code{\link{foo}} command in the .Rd file fail to work in
2.10.0. Unfortunately, it fails only some of the time.
As an example, consider the lines in the factanal help file:
Using 2.10.0
See Also
, , , ability.cov, Harman23.cor, Harman74.cor
when done in 2.9.1, this reads
See Also
print.loadings, varimax, princomp, ability.cov, Harman23.cor, Harman74.cor
I do not know if this is just a Mac issue, but it is troublesome.
For 2.10.0 (development) I am using:
R version 2.10.0 Under development (unstable) (2009-08-25 r49420)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] psych_1.0-79
For 2.9.1 I am using
R version 2.9.1 (2009-06-26)
i386-apple-darwin8.11.1
locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] psych_1.0-79
Thanks to all the developers of such a great program.
Bill
William Revelle http://personality-project.org/revelle.html Professor http://personality-project.org/personality.html Department of Psychology http://www.wcas.northwestern.edu/psych/ Northwestern University http://www.northwestern.edu/ Use R for psychology http://personality-project.org/r It is 5 minutes to midnight http://www.thebulletin.org
On 8/25/2009 11:59 AM, William Revelle wrote:
I have noticed that many cross references in the help files using
the \code{\link{foo}} command in the .Rd file fail to work in
2.10.0. Unfortunately, it fails only some of the time.
As an example, consider the lines in the factanal help file:
Using 2.10.0 See Also , , , ability.cov, Harman23.cor, Harman74.cor
Note that you may need to reinstall packages from source in R-devel, as the internal format of help pages has changed recently. If that doesn't help, then I'd like to track it down, but factanal doesn't appear to be on CRAN or Bioconductor, so I can't even start to reproduce the problem. Duncan Murdoch
On 8/25/2009 11:59 AM, William Revelle wrote:
I have noticed that many cross references in the help files using
the \code{\link{foo}} command in the .Rd file fail to work in
2.10.0. Unfortunately, it fails only some of the time.
As an example, consider the lines in the factanal help file:
Using 2.10.0
Sorry, please ignore the "can't find factanal" part of my previous message, you're talking about the factanal man page in stats (as Michael Dewey pointed out to me). But the advice still stands: you may need to rebuild from source to get all the help working. If you're working with a binary build, try again with a newer one when it is available. r49420 isn't very old, but it might be old enough so that something like this has been fixed. Duncan Murdoch
Duncan,
Actually, I had this problem building a package of mine (psych)
from source. It also happens with one of the core installations
(methods in the util package) which I downloaded from the development
site for the Mac.
Using 2.9.1
See Also
S3Methods, class, getS3method.
For S4, showMethods, Methods.
but using
2.10.0
See Also
S3Methods, class, .
For S4, showMethods, Methods.
(note that the getS3method is missing following the comma).
I first noticed this problem when I installed 2.10.0 using the 08-19
build (I am sorry, but I do not have that version number).
I can use the 05-22 build and it is not a problem.
I also note that when compiling from source, in the 5-22 build, the
help files are built for text, html, latex, and examples
00.psych-package text html latex example
Harman text html latex example
ICC text html latex example
...
but, when using the most recent build I get
*** installing help indices
converting help for package 'psych'
finding HTML links ... done
00.psych-package html
Harman html
My sessionInfo() is
R version 2.10.0 Under development (unstable) (2009-08-25 r49420)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] psych_1.0-79
Bill
At 12:44 PM -0400 8/25/09, Duncan Murdoch wrote:
On 8/25/2009 11:59 AM, William Revelle wrote:
I have noticed that many cross references in the help files using
the \code{\link{foo}} command in the .Rd file fail to work in
2.10.0. Unfortunately, it fails only some of the time.
As an example, consider the lines in the factanal help file:
Using 2.10.0
Sorry, please ignore the "can't find factanal" part of my previous message, you're talking about the factanal man page in stats (as Michael Dewey pointed out to me). But the advice still stands: you may need to rebuild from source to get all the help working. If you're working with a binary build, try again with a newer one when it is available. r49420 isn't very old, but it might be old enough so that something like this has been fixed. Duncan Murdoch