Reading 64-bit integers
Draft patch attached. I haven't modified internal code before, so there may be a mistake in how I handle the mechanics, but hopefully this is a useful starting point. At any rate, the base package tests still work and it seems to function as intended:
x <- as.raw(c(0,0,0,1,0,0,0,0))
(readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
[1] 0
(readBin(x,"integer",n=1,size=8,signed=F,endian="big",double.out=T))
[1] 4294967296
storage.mode(readBin(x,"integer",n=1,size=8,signed=F,endian="big",double.out=T))
[1] "double"

The "double.out" argument is ignored unless "what" is integer. As far as I can tell there is no definition of unsigned long long akin to the one for long long (at the top of connections.c), so I have not handled the unsigned case for that type. The diff is against the current beta, but I can provide a SVN diff against the trunk if that is preferable.

All the best,
Jon
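A note on returning a double, as the example output above does: on typical platforms R doubles carry a 53-bit significand, so whole numbers are exact only up to 2^53. The short check below uses base R only and does not depend on the patch; it is meant as an illustration of that limit, not as a description of what the patch does internally.

```r
# 2^32 does not fit in R's 32-bit integer type but is exact as a double.
two32 <- 2^32
stopifnot(storage.mode(two32) == "double")
stopifnot(two32 > .Machine$integer.max)   # integer.max is 2147483647

# Number of significand bits in a double on this platform (53 for IEEE 754).
digits <- .Machine$double.digits

# Beyond 2^digits, neighbouring whole numbers collapse to the same double.
exact_below <- (2^digits - 1) != (2^digits - 2)   # still distinguishable
lossy_above <- 2^digits == 2^digits + 1           # no longer distinguishable
```

So an 8-byte value that uses more than 53 significant bits could not be recovered exactly from a double result.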
On 30 March 2011 02:49, Simon Urbanek <simon.urbanek at r-project.org> wrote:
On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:
On 29/03/2011 7:01 PM, Jon Clayden wrote:
Dear Simon,
On 29 March 2011 22:40, Simon Urbanek <simon.urbanek at r-project.org> wrote:
Jon,
On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
Dear Simon, Thank you for the response.
On 29 March 2011 15:06, Simon Urbanek <simon.urbanek at r-project.org> wrote:
On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
Dear all, I see from some previous threads that support for 64-bit integers in R may be an aim for future versions, but in the meantime I'm wondering whether it is possible to read in integers of greater than 32 bits at all. Judging from ?readBin, it should be possible to read 8-byte integers to some degree, but it is clearly limited in practice by R's internally 32-bit integer type:
x<- as.raw(c(0,0,0,0,1,0,0,0))
(readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
[1] 16777216
x<- as.raw(c(0,0,0,1,0,0,0,0))
(readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
[1] 0

For values that fit into 32 bits it works fine, but for larger values it fails. (I'm a bit surprised by the zero - should the value not be NA if it is out of range?
No, it's not out of range - int is only 4 bytes, so only the first 4 bytes (in endianness order, hence the least-significant ones) are used.
The fact remains that I ask for the value of an 8-byte integer and don't get it.
I think you're misinterpreting the documentation:

     If 'size' is specified and not the natural size of the object,
     each element of the vector is coerced to an appropriate type
     before being written or as it is read.

The "integer" object type is defined as signed 32-bit in R, so if you ask for "8 bytes into object type integer", you get a coercion into that object type -- 32-bit signed integer -- as documented. I think the issue may come from the confusion of the object type "integer" with general "integer number" in mathematical sense that has no representation restrictions. (FWIW in C the "integer" type is "int" and it is 32-bit on all modern OSes regardless of platform - that's where the limitation comes from, it's not something R has made up).
OK, but it still seems like there is a case for raising a warning. As it is there is no way to tell when reading an 8-byte integer from a file whether its value is really 0, or if it merely has 0 in its least-significant 4 bytes. If 99% of such stored numbers are below 2^31, one is going to need some extra logic to catch the other 1% where you (silently) get the wrong value. In essence, unless you're certain that you will never come across a number that actually uses the upper 4 bytes, you will always have to read it as two 4-byte numbers and check that the high-order one (which is endianness dependent, of course) is zero. A C-level sanity check seems more efficient and more helpful to me.
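The workaround described above (read the value as two 4-byte numbers and check the high-order one) might look roughly like this in plain R. `fits_in_int32` is a hypothetical helper name, not an existing function, and the sketch assumes a single unsigned, big-endian 8-byte value held in a raw vector.

```r
# Hypothetical helper (not part of base R): TRUE if an unsigned, big-endian
# 8-byte value can be returned as an R integer without losing information.
fits_in_int32 <- function(bytes) {
  stopifnot(is.raw(bytes), length(bytes) == 8L)
  # Read the value as two 4-byte words; for big-endian input the first
  # word holds the most significant bytes.
  words <- readBin(bytes, "integer", n = 2L, size = 4L, endian = "big")
  high <- words[1L]
  low  <- words[2L]
  # The bit pattern 0x80000000 reads as NA_integer_, so NA also means
  # "out of range"; a negative low word means the value is >= 2^31.
  !is.na(high) && high == 0L && !is.na(low) && low >= 0L
}

ok  <- fits_in_int32(as.raw(c(0,0,0,0,1,0,0,0)))  # value 16777216
bad <- fits_in_int32(as.raw(c(0,0,0,1,0,0,0,0)))  # value 2^32
```

For little-endian data the roles of the two words would be swapped, which is the endianness dependence mentioned above.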
Seems to me that the S-PLUS solution (output="double") would be a lot more useful. I'd commit that if you write it; I don't think I'd commit the warning.
I was going to write something similar (idea = good, patch welcome ;)). My only worry is that the "output" argument is a bit misleading in that one could expect to use any combination of "input"/"output", which may be a maintenance nightmare. If I understand it correctly it's only a special case for integer input. I don't have S+ so can't say how they deal with that.

Cheers,
Simon
Pretending that it's really only four bytes because of the limits of R's integer type isn't all that helpful. Perhaps a warning should be put out if the cast will affect the value of the result? It looks like the relevant lines in src/main/connections.c are 3689-3697 in the current alpha:

#if SIZEOF_LONG == 8
            case sizeof(long):
                INTEGER(ans)[i] = (int)*((long *)buf);
                break;
#elif SIZEOF_LONG_LONG == 8
            case sizeof(_lli_t):
                INTEGER(ans)[i] = (int)*((_lli_t *)buf);
                break;
#endif
) The value can be represented as a double, though:
4294967296
[1] 4294967296

I wouldn't expect readBin() to return a double if an integer was requested, but is there any way to get the correct value out of it?
Trivially (for your unsigned big-endian case):

y<- readBin(x, "integer", n=length(x)/4L, endian="big")
y<- ifelse(y< 0, 2^32 + y, y)
i<- seq(1,length(y),2)
y<- y[i] * 2^32 + y[i + 1L]
Thanks for the code, but I'm not sure I would call that trivial, especially if one needs to cater for little endian and signed cases as well!
I was saying for your case and it's trivial as in read as integers, convert to double precision and add.
This is what I meant by reconstructing the number manually...
You didn't say so - you were talking about reconstructing it from a raw vector which seems a lot more painful since you can't compute with enough precision on raw vectors.
True - I should have been more specific. Sorry. Jon
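For readers wanting the little-endian and signed cases as well, the manual reconstruction discussed above could be generalised along the following lines. This is only a sketch: `read_int64_as_double` is a hypothetical name, not part of base R or of the patch, and it swaps the 4-byte reads for unsigned 2-byte reads (readBin honours signed=FALSE only for sizes 1 and 2), which sidesteps the sign correction and the NA that the bit pattern 0x80000000 would produce. Results are exact only while the value fits in a double's 53-bit significand.

```r
# Hypothetical helper: decode 8-byte integers from a raw vector into doubles.
read_int64_as_double <- function(bytes, signed = TRUE,
                                 endian = c("big", "little")) {
  endian <- match.arg(endian)
  stopifnot(is.raw(bytes), length(bytes) %% 8L == 0L)
  n <- length(bytes) %/% 8L
  # Four unsigned 16-bit words per value, converted to double straight away.
  w <- readBin(bytes, "integer", n = 4L * n, size = 2L,
               signed = FALSE, endian = endian)
  w <- matrix(as.double(w), nrow = 4L)   # one column per 8-byte value
  # Put the most significant word in row 1 regardless of byte order.
  if (endian == "little") w <- w[4:1, , drop = FALSE]
  top <- w[1L, ]
  # Two's complement: a top word >= 2^15 means the value is negative.
  if (signed) top <- ifelse(top >= 2^15, top - 2^16, top)
  ((top * 2^16 + w[2L, ]) * 2^16 + w[3L, ]) * 2^16 + w[4L, ]
}

big    <- read_int64_as_double(as.raw(c(0,0,0,1,0,0,0,0)),
                               signed = FALSE, endian = "big")
little <- read_int64_as_double(as.raw(c(0,0,0,0,1,0,0,0)),
                               signed = FALSE, endian = "little")
minus1 <- read_int64_as_double(as.raw(rep(0xff, 8)), signed = TRUE)
```

Here `big` and `little` both decode to 2^32, and `minus1` decodes to -1.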
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel