RFC: hexadecimal constants and decimal points
On Sun, 17 Apr 2005, Jan T. Kim wrote:
On Sun, Apr 17, 2005 at 12:38:10PM +0100, Prof Brian Ripley wrote:
These are some points stimulated by reading about C history (and related in their implementation).

1) On some platforms
as.integer("0xA")
[1] 10
but not all (not on Solaris nor Windows). We do not define what is
allowed, and rely on the OS's implementation of strtod (yes, not
strtol).
It seems that glibc does allow hex: C99 mandates it but C89 seems not to
allow it.
I think that was a mistake, and strtol should have been used. Then C89
does mandate the handling of hex constants and also octal ones. So
changing to strtol would change the meaning of as.integer("011").
I think interpretation of a leading "0" as a prefix indicating an octal representation should indeed be avoided. People not familiar with C will have a hard time understanding and getting used to this concept, and in addition, it happens way too often that numeric data are provided left-padded with zeros.
I agree with this: 011 should be 11, it should not be 9.
Proposal: we handle this ourselves and define what values are acceptable, namely for as.integer:

[+|-][0-9]+
NA
0[x|X][0-9A-Fa-f]+
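To make the proposed grammar concrete, here is a minimal sketch in C of a parser that accepts only those three forms. The names parse_int_string, parse_status and all_match are hypothetical and are not R internals. I have also assumed that the sign is optional and applies only to the decimal form, and the sketch returns a C long without checking R's own integer range.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes; the names are illustrative, not R internals. */
enum parse_status { PARSE_OK, PARSE_NA, PARSE_INVALID };

/* Return 1 if s is non-empty and every character satisfies is_digit. */
static int all_match(const char *s, int (*is_digit)(int))
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!is_digit((unsigned char)*s))
            return 0;
    return 1;
}

/* Accept [+|-][0-9]+ (sign assumed optional), NA, or 0[x|X][0-9A-Fa-f]+.
 * Anything else, including trailing junk, is rejected. */
enum parse_status parse_int_string(const char *s, long *out)
{
    const char *digits = s;
    int base = 10;

    assert(s != NULL && out != NULL);

    if (strcmp(s, "NA") == 0)
        return PARSE_NA;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        /* Hex form: at least one hex digit must follow the prefix. */
        digits = s + 2;
        base = 16;
        if (!all_match(digits, isxdigit))
            return PARSE_INVALID;
    } else {
        /* Decimal form: leading zeros are plain digits, never octal. */
        if (*digits == '+' || *digits == '-')
            digits++;
        if (!all_match(digits, isdigit))
            return PARSE_INVALID;
    }

    /* The string is already validated, so strtol only does the arithmetic;
     * an explicit base of 16 still accepts the 0x prefix. */
    errno = 0;
    *out = strtol(s, NULL, base);
    if (errno == ERANGE)
        return PARSE_INVALID;
    return PARSE_OK;
}
```

With this sketch "011" parses as 11, "0xA" as 10, and strings such as "0x" or "12abc" are rejected rather than partially converted.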
It can be a somewhat mixed blessing if the string representation of numeric values contains information about their base, in the form of the 0x prefix in this case. The base argument (#3) of C's strtol function can be set to a base explicitly or to 0, which gives the prefix-based "auto-selection" behaviour. On the R level, such a base argument (to as.integer) could be included and a default could be set.
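For reference, the effect of strtol's base argument can be sketched as follows. The wrapper parse_with_base is a hypothetical helper for illustration, not anything in R. It only adds a check that the whole string was consumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not part of R): convert s with strtol in the given
 * base and report success through *ok. A base of 0 means "infer the base
 * from the prefix" (0x for hex, leading 0 for octal, otherwise decimal). */
long parse_with_base(const char *s, int base, int *ok)
{
    char *end;
    long value;

    assert(s != NULL && ok != NULL);

    errno = 0;
    value = strtol(s, &end, base);

    /* Succeed only if at least one character was converted, nothing
     * trails the number, and the value did not overflow. */
    *ok = (end != s && *end == '\0' && errno != ERANGE);
    return value;
}
```

With base 0, "011" converts to 9 and "0xA" to 10. With base 10, "011" converts to 11, while "0xA" stops after the leading "0" and is reported as a failure. With base 16 the "0x" prefix is optional, so "0xA" again gives 10.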
A lot of this is internal, not at R level.
Personally, I would be equally happy with the default being 0 (auto-select) or 10. Considering the perhaps limited spread of familiarity with C's "0x" idiom, I somewhat favour a consistent and "stubborn" decimal behaviour (base defaults to 10), though.
Some people already rely on it, and those who don't know about it are unlikely to ever enter what they think is an illegal value, surely?
As long as we document it, I think the 0x prefix is fine. We should provide a way to use other bases on input and output. This could be through format specifiers, but it would be enough to have a pair of dedicated functions to do the conversions.

Duncan Murdoch
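As a rough illustration of what such a pair of conversion functions might look like at the C level, here is a sketch. The names from_base and to_base, their signatures, and the choice of lower-case digits are all assumptions made for the example. Nothing here is an existing R or C library interface apart from strtol itself.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical input side: parse s as an integer in the given base (2..36).
 * Validation of trailing characters is left out to keep the sketch short. */
long from_base(const char *s, int base)
{
    assert(s != NULL && base >= 2 && base <= 36);
    return strtol(s, NULL, base);
}

/* Hypothetical output side: write value in the given base (2..36) into buf.
 * Returns buf, or NULL if size is too small for the digits, the sign and
 * the terminating null character. */
char *to_base(long value, int base, char *buf, size_t size)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char tmp[sizeof(long) * CHAR_BIT + 2];   /* binary digits plus sign */
    size_t n = 0, i;
    unsigned long mag;

    assert(buf != NULL && base >= 2 && base <= 36);

    /* Work on the magnitude as unsigned so that LONG_MIN is handled. */
    mag = value < 0 ? 0UL - (unsigned long)value : (unsigned long)value;

    /* Produce digits least significant first. */
    do {
        tmp[n++] = digits[mag % (unsigned long)base];
        mag /= (unsigned long)base;
    } while (mag != 0);
    if (value < 0)
        tmp[n++] = '-';

    if (n + 1 > size)
        return NULL;

    /* Reverse into the caller's buffer. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}
```

For example, to_base(255, 16, buf, sizeof buf) fills buf with "ff", and from_base("11", 8) returns 9, so octal remains available to those who ask for it explicitly without changing what a plain "011" means.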