type.convert and doubles

Tue, Apr 22, 2014 12:42 AM

    > Agreed. Perhaps even a global option would make sense. We
    > already have an option with a similar spirit:
    > 'options("stringsAsFactors"=T/F)'. Perhaps
    > 'options("exactNumericAsString"=T/F)' [or something else]
    > would be desirable, with the option being the default
    > value to the type.convert argument.

No, please, no, not a global option here!

Global options that influence the default behavior of basic
functions go too much against the principle of functional
programming, and my personal opinion has always been that
'stringsAsFactors' has been a mistake (as a global option, not
as an argument).

Note that with such global options, the output of sessionInfo()
would in principle have to contain all (such) global options in
addition to R and package versions in order to diagnose behavior
of R functions.
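
To make the concern concrete, here is a minimal sketch in base R. The option name 'demo.exactAsString' and the helper convert_opt() are invented for this illustration and are not part of R; the point is only that an identical call can return a different class depending on hidden session state.

```r
# Hypothetical helper (not part of R): its behavior depends on a global
# option whose name is made up for this sketch.
convert_opt <- function(x) {
  keep <- isTRUE(getOption("demo.exactAsString", FALSE))
  if (keep) x else as.numeric(x)
}

x <- "72057594037927937"

old <- options(demo.exactAsString = FALSE)
a <- class(convert_opt(x))   # option off: converted to double

options(demo.exactAsString = TRUE)
b <- class(convert_opt(x))   # same call, option on: stays a string

options(old)                 # restore the previous (unset) state
print(c(a, b))
```

Nothing in the call convert_opt(x) reveals which of the two results to expect, which is why such options would have to be reported alongside R and package versions.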

I think we have more or less agreed that we'd like to have
a new function *argument* to type.convert(), which would also be
made available "upstream" in read.table() and, via '...', in the
other read.<foo>() functions that call read.table().
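
As a rough sketch of what passing such an argument "upstream" could look like: all function names and the argument name 'exact_as_string' below are hypothetical stand-ins, not the actual type.convert() or read.table() interface.

```r
# Hypothetical converter with an explicit argument (stand-in for type.convert()).
convert_demo <- function(x, exact_as_string = FALSE) {
  if (exact_as_string) x else as.numeric(x)
}

# Hypothetical reader exposing the same argument (stand-in for read.table()).
read_demo <- function(x, exact_as_string = FALSE) {
  convert_demo(x, exact_as_string = exact_as_string)
}

# Hypothetical wrapper that forwards everything via '...'
# (stand-in for a read.<foo>() that calls read.table()).
read_foo_demo <- function(x, ...) read_demo(x, ...)

x <- "72057594037927937"
default_class <- class(read_foo_demo(x))
exact_class   <- class(read_foo_demo(x, exact_as_string = TRUE))
print(c(default_class, exact_class))
```

With this design the behavior is visible at the call site, and the wrapper needs no changes when the underlying reader gains the argument.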


    > I also like Gabor's idea of a "distinguishing class". R
    > doesn't natively support arbitrary precision numbers
    > (AFAIK), but I think that's what Murray wants. I could
    > imagine some kind of new class emerging here that
    > initially looks just like a character/factor, but may
    > evolve over time to accept arithmetic methods and act more
    > like a number (e.g. knowing that "0.1", ".10" and "1e-1"
    > are the same number, or that "-9"<"-0.2"). A class
    > "bignum" perhaps?

That's another interesting idea. As maintainer of CRAN package
'Rmpfr' and co-maintainer of 'gmp', I'm even biased about this
issue.
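
For illustration only, a small base R sketch of the two sides of this idea (no 'Rmpfr' or 'gmp' involved, and no such "bignum" class exists here): strings distinguish spellings of the same number, while doubles cannot distinguish neighbouring integers beyond 2^53.

```r
# Three spellings of one number: distinct as strings, equal as doubles.
same <- c("0.1", ".10", "1e-1")
n_same_chr <- length(unique(same))
n_same_dbl <- length(unique(as.numeric(same)))

# Two different integers near 2^56: distinct as strings, but both
# convert to the same double because 2^56 + 1 is not representable.
big <- c("72057594037927936", "72057594037927937")
n_big_chr <- length(unique(big))
n_big_dbl <- length(unique(as.numeric(big)))

print(c(n_same_chr, n_same_dbl, n_big_chr, n_big_dbl))
```

A number-aware class would need to treat the first vector as one value and the second as two, which neither plain character nor double does on its own.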

Martin

    > Cheers, Robert


    > On 4/20/14, 3:24 AM, "Murray Stokely" <murray at stokely.org>
    > wrote:

>> Yes, I'm also strongly in favor of having an option for
    >> this.  If there was an option in base R for controlling
    >> this we would just use that and get rid of the separate
    >> RProtoBuf.int64AsString option we use in the RProtoBuf
    >> package on CRAN to control whether 64-bit int types from
    >> C++ are returned to R as numerics or character vectors.
    >> 
    >> I agree that reasonable people can disagree about the
    >> default, but I found my original bug report about this,
    >> so I will counter Robert's example with my favorite
    >> example of what was wrong with the previous behavior :
    >> 
    >> tmp <- data.frame(n=c("72057594037927936", "72057594037927937"),
    >>                   name=c("foo", "bar"))
    >> length(unique(tmp$n)) # 2
    >> write.csv(tmp, "/tmp/foo.csv", quote=FALSE, row.names=FALSE)
    >> data <- read.csv("/tmp/foo.csv")
    >> length(unique(data$n)) # 1
    >> 
    >> - Murray
    >> 
    >> 
    >> On Sat, Apr 19, 2014 at 10:06 AM, Simon Urbanek
    >> <simon.urbanek at r-project.org> wrote:

    >>> On Apr 19, 2014, at 9:00 AM, Martin Maechler
    >>> <maechler at stat.math.ethz.ch> wrote:

>>> 
    >>>>>>>>> McGehee, Robert <Robert.McGehee at geodecapital.com>
    >>>>>>>>> on Thu, 17 Apr 2014 19:15:47 -0400 writes:
    >>>>

>>>> behaves as it

>>>> 
    >>>>> That's only a true statement because the documentation
    >>>>> was changed to reflect the new behavior! The new
    >>>>> feature in type.convert certainly does not behave
    >>>>> according to the documentation as of R 3.0.3. Here's a
    >>>>> snippet:
    >>>> 
    >>>>> The first type that can accept all the non-missing
    >>>>> values is chosen (numeric and complex return values
    >>>>> will represented approximately, of course).
    >>>> 
    >>>>> The key phrase is in parentheses, which reminds the
    >>>>> user to expect a possible loss of precision. That
    >>>>> important parenthetical was removed from the
    >>>>> documentation in R 3.1.0 (among other changes).
    >>>> 
    >>>>> Putting aside the fact that this introduces a large
    >>>>> amount of unnecessary work rewriting SQL / data import
    >>>>> code, SQL packages, my biggest conceptual problem is
    >>>>> that I can no longer rely on a particular function
    >>>>> call returning a particular class. In my example
    >>>>> querying stock prices, about 5% of prices came back as
    >>>>> factors and the remaining 95% as numeric, so we had
    >>>>> random errors popping in throughout the morning.
    >>>> 
    >>>>> Here's a short example showing us how the new behavior
    >>>>> can be unreliable. I pass a character representation
    >>>>> of a uniformly distributed random variable to
    >>>>> type.convert. 90% of the time it is converted to
    >>>>> "numeric" and 10% it is a "factor" (in R 3.1.0). In
    >>>>> the 10% of cases in which type.convert converts to a
    >>>>> factor the leading non-zero digit is always a 9. So if
    >>>>> you were expecting a numeric value, then 1 in 10 times
    >>>>> you may have a bug in your code that didn't exist
    >>>>> before.
    >>>>

>>>>>> class(type.convert(format(runif(1))))

    >>>>> cl
    >>>>>  factor numeric
    >>>>>     990    9010
    >>>> 
    >>>> Yes.
    >>>> 
    >>>> Murray's point is valid, too.
    >>>> 
    >>>> But in my view, with the reasoning we have seen here,
    >>>> *and* with the well known software design principle of
    >>>> "least surprise" in mind, I also do think that the
    >>>> default for type.convert() should be what it has been
    >>>> for > 10 years now.
    >>>> 
    >>> 
    >>> I think there should be two separate discussions:
    >>> 
    >>> a) have an option (argument to type.convert and possibly
    >>> read.table) to enable/disable this behavior. I'm
    >>> strongly in favor of this.
    >>> 
    >>> b) decide what the default for a) will be. I have no
    >>> strong opinion, I can see arguments in both directions
    >>> 
    >>> But most importantly I think a) is better than the
    >>> status quo - even if the discussion about b) drags out.
    >>> 
    >>> Cheers, Simon
    >>> 
    >>> 
    >>>
