Match .3 in a sequence

Tue, Mar 17, 2009 8:21 AM

On Tue, Mar 17, 2009 at 10:04:39AM -0400, Stavros Macrakis wrote:

...

Yes, this is a confusing behavior, since repeated levels are never meaningful.

I think 15 digits is a reasonable choice. A mapping between double precision
numbers and character strings with a given decimal precision is never bijective.
With 15 digits, we can achieve that every character value has a unique double
precision representation, but not vice versa. With 17 digits, we have a unique
character string for each double precision number, but not vice versa.
Which is better?
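
A small sketch of the first half of this trade-off, assuming IEEE double
precision and a C library whose sprintf() rounds correctly. The names x, y,
s15 and s17 are only illustrative. Two adjacent doubles near .3 collapse to
the same string at 15 significant digits, but are told apart at 17:

```r
# Two distinct doubles: 0.3 and its upper neighbour.
# In [0.25, 0.5) consecutive doubles are 2^-54 apart.
x <- 0.3
y <- 0.3 + 2^-54

x == y                              # FALSE: they are different doubles

# 15 significant digits: both values get the same character string.
s15 <- sprintf("%.15g", c(x, y))
s15[1] == s15[2]                    # TRUE

# 17 significant digits: each double gets its own character string.
s17 <- sprintf("%.17g", c(x, y))
s17[1] == s17[2]                    # FALSE
```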

The specification of as.character() says that numbers are represented with
15 significant digits. So I think that if as.factor() applied signif(, digits=15)
to a numeric vector before determining the levels using sort(unique.default(x)),
this could help to eliminate most of the problems without being in conflict
with the existing specification.
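
A minimal sketch of this suggestion, done by hand outside as.factor() (it is
not a patch to as.factor() itself). It uses four values near .3, and the names
nums, rounded and f are only illustrative. After rounding to 15 significant
digits, the four values become one double, so only one level remains:

```r
# Four doubles within a few ulps of 0.3.
nums <- .3 + 2e-16 * c(-2, -1, 1, 2)

# Round to 15 significant digits before the levels are determined.
rounded <- signif(nums, digits = 15)

# All four values now coincide, so unique() sees a single value.
length(unique(rounded))

# The factor built from the rounded values has one level, without repeats.
f <- as.factor(rounded)
levels(f)
```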

I do not exactly understand what you mean by inconsistent. If you do
  nums <- (.3 + 2e-16 * c(-2,-1,1,2))
  options(digits=15)
  for (x in nums) print(x)
  # [1] 0.300000000000000
  # [1] 0.3
  # [1] 0.3
  # [1] 0.300000000000000
  as.character(nums)
  # [1] "0.300000000000000" "0.3"               "0.3"              
  # [4] "0.300000000000000"
then print and as.character are consistent. Printing the whole vector
behaves differently, since it uses the same format for all numbers.

Definitely, using a comparison tolerance is a meaningful approach. Its disadvantage
is that the relation abs(x - y) <= eps is not transitive. So it may also produce
confusing results in some situations. I think that one has to choose the right
solution depending on the application.
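
To illustrate the non-transitivity, here is a small sketch. The helper near()
and the tolerance 1e-8 are assumptions chosen for the example, not anything
taken from R itself. Each neighbouring pair is within the tolerance, but the
two end points are not:

```r
# Hypothetical helper: equality up to an absolute tolerance eps.
near <- function(x, y, eps = 1e-8) abs(x - y) <= eps

x1 <- 0.3
x2 <- x1 + 0.7e-8
x3 <- x1 + 1.4e-8

near(x1, x2)   # TRUE:  about 0.7e-8 apart
near(x2, x3)   # TRUE:  about 0.7e-8 apart
near(x1, x3)   # FALSE: about 1.4e-8 apart, so "equal" does not chain
```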

Petr.
