RFC: hexadecimal constants and decimal points
On Sun, Apr 17, 2005 at 12:38:10PM +0100, Prof Brian Ripley wrote:
These are some points stimulated by reading about C history (and related in their implementation).
1) On some platforms
as.integer("0xA")
[1] 10
but not all (not on Solaris nor Windows). We do not define what is
allowed, and rely on the OS's implementation of strtod (yes, not strtol).
It seems that glibc does allow hex: C99 mandates it but C89 seems not to
allow it.
I think that was a mistake, and strtol should have been used. Then C89
does mandate the handling of hex constants and also octal ones. So
changing to strtol would change the meaning of as.integer("011").
I think interpretation of a leading "0" as a prefix indicating an octal representation should indeed be avoided. People not familiar with C will have a hard time understanding and getting used to this concept, and in addition, it happens way too often that numeric data are provided left-padded with zeros.
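To illustrate the concern about zero-padded data, here is a minimal C sketch of how strtol treats the same strings under base 0 (prefix-based) and base 10. The helper name parse_with_base is hypothetical and only wraps the standard strtol call; it is not anything R uses internally.

```c
#include <stdlib.h>

/* Hypothetical helper: convert s with the given strtol base.
 * base 0  -> the prefix decides: "0x"/"0X" is hex, a leading "0" is octal.
 * base 10 -> always decimal; parsing stops at the first non-decimal character. */
long parse_with_base(const char *s, int base)
{
    return strtol(s, NULL, base);
}
```

With base 0, "011" yields 9 and "0042" yields 34, whereas base 10 gives 11 and 42. Conversely, "0xA" yields 10 with base 0 but 0 with base 10, because the conversion stops at the "x".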
Proposal: we handle this ourselves and define what values are acceptable, namely for as.integer:
[+|-][0-9]+
NA
0[x|X][0-9A-Fa-f]+
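A rough C sketch of a parser for these three forms might look as follows. It is only an illustration under stated assumptions, not R's actual code: the names parse_status and parse_integer are made up, the sign is taken to be optional, no surrounding whitespace is accepted, and out-of-range values are simply reported as invalid.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum parse_status { PARSE_OK, PARSE_NA, PARSE_INVALID };

/* Hypothetical helper: accepts an optionally signed decimal, the literal
 * "NA", or an unsigned 0x/0X-prefixed hexadecimal, and nothing else. */
enum parse_status parse_integer(const char *s, int *out)
{
    const char *digits = s;
    int base = 10;
    int negative = 0;
    size_t i;
    long v;

    if (strcmp(s, "NA") == 0)
        return PARSE_NA;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        base = 16;
        digits = s + 2;
    } else if (s[0] == '+' || s[0] == '-') {
        negative = (s[0] == '-');
        digits = s + 1;
    }

    /* At least one digit is required, and every character must be a digit
     * of the chosen base; this also rejects input such as "0x0x1". */
    if (digits[0] == '\0')
        return PARSE_INVALID;
    for (i = 0; digits[i] != '\0'; i++) {
        unsigned char c = (unsigned char)digits[i];
        if (base == 16 ? !isxdigit(c) : !isdigit(c))
            return PARSE_INVALID;
    }

    /* The digits are already validated, so strtol only does the arithmetic;
     * a leading "0" is plain decimal here because the base is explicit. */
    errno = 0;
    v = strtol(digits, NULL, base);
    if (errno == ERANGE || v > INT_MAX)
        return PARSE_INVALID;

    *out = negative ? (int)-v : (int)v;
    return PARSE_OK;
}
```

Under these assumptions "0xA" gives 10 and "011" gives 11 on every platform, while strings such as "0x" or "1e3" are rejected.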
It can be a somewhat mixed blessing if the string representation of numeric values contains information about their base, in the form of the 0x prefix in this case. The base argument (#3) of C's strtol function can be set to a base explicitly or to 0, which gives the prefix-based "auto-selection" behaviour.

On the R level, such a base argument (to as.integer) could be included and a default could be set. Personally, I would be equally happy with the default being 0 (auto-select) or 10. Considering the perhaps limited spread of familiarity with C's "0x" idiom, I somewhat favour a consistent and "stubborn" decimal behaviour (base defaults to 10), though.

Best regards, Jan
+- Jan T. Kim -------------------------------------------------------+
| *NEW* email: jtk@cmp.uea.ac.uk                                     |
| *NEW* WWW:   http://www.cmp.uea.ac.uk/people/jtk                   |
*-----=< hierarchical systems are for files, not for humans >=-----*