Reading 64-bit integers
Bill, thanks. I like that idea of the output parameter better, especially if we ever add different scalar vector types. Admittedly, what=integer() is the most useful case. What I was worried about is things like what=double(), output=integer() which could be legal, but are more conveniently dealt with via as.integer(readBin()) instead. I won't have more time today, but I'll have a look tomorrow. Thanks, Simon
On Mar 30, 2011, at 1:38 PM, William Dunlap wrote:
-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Simon Urbanek
Sent: Tuesday, March 29, 2011 6:49 PM
To: Duncan Murdoch
Cc: r-devel at r-project.org
Subject: Re: [Rd] Reading 64-bit integers

On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:
On 29/03/2011 7:01 PM, Jon Clayden wrote:
Dear Simon,
On 29 March 2011 22:40, Simon Urbanek <simon.urbanek at r-project.org> wrote:
Jon, On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
Dear Simon, Thank you for the response.
On 29 March 2011 15:06, Simon Urbanek <simon.urbanek at r-project.org> wrote:
On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
Dear all,
I see from some previous threads that support for 64-bit integers in R may be an aim for future versions, but in the meantime I'm wondering whether it is possible to read in integers of greater than 32 bits at all. Judging from ?readBin, it should be possible to read 8-byte integers to some degree, but it is clearly limited in practice by R's internally 32-bit integer type:
x <- as.raw(c(0,0,0,0,1,0,0,0))
(readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
[1] 16777216
x <- as.raw(c(0,0,0,1,0,0,0,0))
(readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
[1] 0
For values that fit into 32 bits it works fine, but for larger values it fails. (I'm a bit surprised by the zero - should the value not be NA if it is out of range?
No, it's not out of range - int is only 4 bytes, so only the first 4 bytes (respecting endianness order, hence the least-significant ones) are used.
The fact remains that I ask for the value of an 8-byte integer and don't get it.
I think you're misinterpreting the documentation:
If 'size' is specified and not the natural size of the object, each element of the vector is coerced to an appropriate type before being written or as it is read.
The "integer" object type is defined as signed 32-bit in R, so if you ask for "8 bytes into object type integer", you get a coercion into that object type -- 32-bit signed integer -- as documented. I think the issue may come from the confusion of the object type "integer" with general "integer number" in the mathematical sense that has no representation restrictions. (FWIW in C the "integer" type is "int" and it is 32-bit on all modern OSes regardless of platform - that's where the limitation comes from, it's not something R has made up).
OK, but it still seems like there is a case for raising a warning. As it is, there is no way to tell when reading an 8-byte integer from a file whether its value is really 0, or if it merely has 0 in its least-significant 4 bytes. If 99% of such stored numbers are below 2^31, one is going to need some extra logic to catch the other 1% where you (silently) get the wrong value. In essence, unless you're certain that you will never come across a number that actually uses the upper 4 bytes, you will always have to read it as two 4-byte numbers and check that the high-order one (which is endianness dependent, of course) is zero. A C-level sanity check seems more efficient and more helpful to me.
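A minimal sketch of the manual check described above, assuming the data are already in a raw vector and hold unsigned 8-byte values. `high_word_nonzero` is a hypothetical helper name, not part of R; it only reports which values would be silently truncated by a 32-bit read.

```r
# Hypothetical helper: treat each 8-byte value as two 4-byte words and flag
# the values whose high-order word is not zero (i.e. that need > 32 bits).
high_word_nonzero <- function(x, endian = "big") {
  stopifnot(is.raw(x), length(x) %% 8L == 0L)
  words <- readBin(x, "integer", n = length(x) / 4L, size = 4L, endian = endian)
  first <- seq(1L, by = 2L, length.out = length(words) / 2L)
  # The high-order word comes first in big-endian data, second in little-endian.
  high <- if (endian == "big") words[first] else words[first + 1L]
  # The bit pattern 0x80000000 is read as NA_integer_, which is also non-zero.
  is.na(high) | high != 0L
}

high_word_nonzero(as.raw(c(0,0,0,0,1,0,0,0)))  # fits in 32 bits
high_word_nonzero(as.raw(c(0,0,0,1,0,0,0,0)))  # uses the upper 4 bytes
```

This is the extra logic in R code; it does not replace the C-level check being asked for.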
Seems to me that the S-PLUS solution (output="double") would be a lot more useful. I'd commit that if you write it; I don't think I'd commit the warning.
I was going to write something similar (idea = good, patch welcome ;)). My only worry is that the "output" argument is a bit misleading in that one could expect to use any combination of "input"/"output" which may be a maintenance nightmare. If I understand it correctly it's only a special case for integer input. I don't have S+ so can't say how they deal with that.
In S+'s readBin the output argument can be only double() or single() when what is double() or single() (S+ still has a real single precision storage mode) and can be any numeric type or logical when what is integer(). The output=double() seemed like the only useful case. It does not warn when precision is lost in the 8-byte integer to double conversion. Perhaps it should.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
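To illustrate the precision loss mentioned above: a double has a 53-bit significand, so not every 8-byte integer at or above 2^53 is representable. The sketch below assumes the two 32-bit halves are already available as non-negative doubles; `combine_words` is a hypothetical name, and the warning is only one possible policy.

```r
# 2^53 + 1 is not representable as a double and rounds back to 2^53.
2^53 + 1 == 2^53

# Hypothetical helper: combine unsigned high and low 32-bit words into a
# double and warn when the result may no longer be exact.
combine_words <- function(high, low) {
  # high >= 2^21 means the combined value is at least 2^53.
  if (any(high >= 2^21)) {
    warning("value >= 2^53: low-order bits may be lost in the double")
  }
  high * 2^32 + low
}

combine_words(1, 0)  # 2^32, exactly representable
```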
Cheers, Simon
Pretending that it's really only four bytes because of the limits of R's integer type isn't all that helpful. Perhaps a warning should be put out if the cast will affect the value of the result? It looks like the relevant lines in src/main/connections.c are 3689-3697 in the current alpha:
#if SIZEOF_LONG == 8
case sizeof(long):
INTEGER(ans)[i] = (int)*((long *)buf);
break;
#elif SIZEOF_LONG_LONG == 8
case sizeof(_lli_t):
INTEGER(ans)[i] = (int)*((_lli_t *)buf);
break;
#endif
) The value can be represented as a double, though:
4294967296
[1] 4294967296
I wouldn't expect readBin() to return a double if an integer was requested, but is there any way to get the correct value out of it?
Trivially (for your unsigned big-endian case):
y <- readBin(x, "integer", n=length(x)/4L, endian="big")
y <- ifelse(y < 0, 2^32 + y, y)
i <- seq(1,length(y),2)
y <- y[i] * 2^32 + y[i + 1L]
Thanks for the code, but I'm not sure I would call that trivial, especially if one needs to cater for little endian and signed cases as well!
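One way to cover the little-endian and signed cases as well is to assemble the value from individual bytes rather than from 4-byte integers. The following is a minimal sketch under that assumption; `read_int64_as_double` is a hypothetical helper, not part of R, and results at or above 2^53 in magnitude may be rounded because they are returned as doubles.

```r
# Hypothetical helper: convert a raw vector of 8-byte integers to doubles.
read_int64_as_double <- function(x, endian = c("big", "little"), signed = FALSE) {
  endian <- match.arg(endian)
  stopifnot(is.raw(x), length(x) %% 8L == 0L)
  # One column per 8-byte value; bytes become integers in 0..255.
  b <- matrix(as.integer(x), nrow = 8L)
  # Put the most significant byte first regardless of the stored order.
  if (endian == "little") b <- b[8:1, , drop = FALSE]
  # Assemble each 32-bit half as an unsigned double.
  word <- function(rows) {
    as.vector(c(2^24, 2^16, 2^8, 1) %*% b[rows, , drop = FALSE])
  }
  high <- word(1:4)
  low <- word(5:8)
  # Two's complement: a set top bit makes the high word negative.
  if (signed) high <- ifelse(high >= 2^31, high - 2^32, high)
  high * 2^32 + low
}

x <- as.raw(c(0,0,0,1,0,0,0,0))
read_int64_as_double(x, endian = "big")
read_int64_as_double(as.raw(rep(255, 8)), signed = TRUE)
```

Working from bytes also sidesteps the 4-byte pattern 0x80000000, which readBin returns as NA when read as an integer.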
I was saying for your case and it's trivial as in read as integers, convert to double precision and add.
This is what I meant by reconstructing the number manually...
You didn't say so - you were talking about reconstructing it from a raw vector which seems a lot more painful since you can't compute with enough precision on raw vectors.
True - I should have been more specific. Sorry. Jon
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel