
Reading 64-bit integers

Jon Clayden

Wed, Mar 30, 2011 9:45 AM

Draft patch attached. I haven't modified internal code before, so
there may be a mistake in how I handle the mechanics, but hopefully
this is a useful starting point. At any rate, the base package tests
still pass and it seems to function as intended:

[1] 0

[1] 4294967296

[1] "double"

The "double.out" argument is ignored unless "what" is integer. As far
as I can tell there is no definition of unsigned long long akin to the
one for long long (at the top of connections.c), so I have not handled
the unsigned case for that type.
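As a rough illustration of the idea behind the "double.out" argument, the C fragment below reads an 8-byte buffer, assumed to be already in native byte order, as a 64-bit integer and returns it as a double. It is a sketch only, not the code in the attached patch. The function name read_int64_as_double is hypothetical, and it uses the C99 fixed-width types from <stdint.h> rather than R's internal _lli_t typedef. A double represents every integer exactly only up to 2^53 in magnitude, so larger 64-bit values are rounded.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: interpret 8 bytes (already in native byte order)
   as a signed or unsigned 64-bit integer and return it as a double.
   Values with magnitude above 2^53 may be rounded. */
double read_int64_as_double(const unsigned char *buf, int is_signed)
{
    if (is_signed) {
        int64_t v;
        memcpy(&v, buf, sizeof v);   /* memcpy avoids alignment/aliasing issues */
        return (double) v;
    } else {
        uint64_t v;
        memcpy(&v, buf, sizeof v);
        return (double) v;
    }
}
```

For example, the 8-byte value 2^32 from the thread comes back as 4294967296 rather than 0.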

The diff is against the current beta, but I can provide an SVN diff
against the trunk if that is preferable.

All the best,
Jon

On 30 March 2011 02:49, Simon Urbanek <simon.urbanek at r-project.org> wrote:

On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:

On 29/03/2011 7:01 PM, Jon Clayden wrote:

Dear Simon,

On 29 March 2011 22:40, Simon Urbanek <simon.urbanek at r-project.org> wrote:

Jon,

On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:

Dear Simon,

Thank you for the response.

On 29 March 2011 15:06, Simon Urbanek <simon.urbanek at r-project.org> wrote:

On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:

Dear all,

I see from some previous threads that support for 64-bit integers in R
may be an aim for future versions, but in the meantime I'm wondering
whether it is possible to read in integers of greater than 32 bits at
all. Judging from ?readBin, it should be possible to read 8-byte
integers to some degree, but it is clearly limited in practice by R's
internally 32-bit integer type:

x <- as.raw(c(0,0,0,0,1,0,0,0))
(readBin(x, "integer", n=1, size=8, signed=FALSE, endian="big"))

[1] 16777216

x <- as.raw(c(0,0,0,1,0,0,0,0))
(readBin(x, "integer", n=1, size=8, signed=FALSE, endian="big"))

[1] 0

For values that fit into 32 bits it works fine, but for larger values
it fails. (I'm a bit surprised by the zero - should the value not be
NA if it is out of range?

No, it's not out of range - int is only 4 bytes, so only the first 4 bytes (in endianness order, hence the least-significant ones) are used.
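The truncation described here can be reproduced in isolation. The sketch below (function name low_32_bits is hypothetical) keeps only the low-order 32 bits of a 64-bit value. It converts to uint32_t because C defines that conversion as reduction modulo 2^32; converting an out-of-range value to signed int, as the (int) cast in connections.c does, is implementation-defined in C, though common compilers behave the same way.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low-order 32 bits of a 64-bit value.
   Conversion to an unsigned type is reduction modulo 2^32, so any bits in
   the upper four bytes are simply discarded. */
uint32_t low_32_bits(uint64_t v)
{
    return (uint32_t) v;
}
```

Applied to the two values from the examples above, 2^24 (16777216) survives unchanged, while 2^32, whose only set bit lies in the upper four bytes, becomes 0.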

The fact remains that I ask for the value of an 8-byte integer and
don't get it.

I think you're misinterpreting the documentation:

    If ‘size’ is specified and not the natural size of the object,
    each element of the vector is coerced to an appropriate type
    before being written or as it is read.

The "integer" object type is defined as signed 32-bit in R, so if you ask for "8 bytes into object type integer", you get a coercion into that object type -- 32-bit signed integer -- as documented. I think the issue may come from confusing the object type "integer" with a general "integer number" in the mathematical sense, which has no representation restrictions. (FWIW in C the "integer" type is "int" and it is 32-bit on all modern OSes regardless of platform - that's where the limitation comes from, it's not something R has made up.)

OK, but it still seems like there is a case for raising a warning. As
it is there is no way to tell when reading an 8-byte integer from a
file whether its value is really 0, or if it merely has 0 in its
least-significant 4 bytes. If 99% of such stored numbers are below
2^31, one is going to need some extra logic to catch the other 1%
where you (silently) get the wrong value. In essence, unless you're
certain that you will never come across a number that actually uses
the upper 4 bytes, you will always have to read it as two 4-byte
numbers and check that the high-order one (which is endianness
dependent, of course) is zero. A C-level sanity check seems more
efficient and more helpful to me.
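The manual check described above (treat the 8-byte value as two 4-byte halves and verify that the high-order half is zero) might look like this at the C level. This is a hedged sketch: high_word_is_zero is a hypothetical name, and it covers only unsigned or non-negative values, since a negative signed value has a high word of all 0xFF bytes. Even when the high word is zero, a low word of 2^31 or more still does not fit in R's signed integer.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the high-order four bytes of an
   8-byte integer are all zero, 0 otherwise.  Which bytes are high-order
   depends on byte order: buf[0..3] for big-endian data, buf[4..7] for
   little-endian data. */
int high_word_is_zero(const unsigned char buf[8], int big_endian)
{
    const unsigned char *hi = big_endian ? buf : buf + 4;
    for (size_t k = 0; k < 4; k++)
        if (hi[k] != 0)
            return 0;
    return 1;
}
```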

Seems to me that the S-PLUS solution (output="double") would be a lot more useful. I'd commit that if you write it; I don't think I'd commit the warning.

I was going to write something similar (idea = good, patch welcome ;)). My only worry is that the "output" argument is a bit misleading, in that one could expect to use any combination of "input"/"output", which may be a maintenance nightmare. If I understand it correctly, it's only a special case for integer input. I don't have S+, so I can't say how they deal with that.

Cheers,
Simon

Pretending that it's really only four bytes because of
the limits of R's integer type isn't all that helpful. Perhaps a
warning should be put out if the cast will affect the value of the
result? It looks like the relevant lines in src/main/connections.c are
3689-3697 in the current alpha:

#if SIZEOF_LONG == 8
                case sizeof(long):
                    INTEGER(ans)[i] = (int)*((long *)buf);
                    break;
#elif SIZEOF_LONG_LONG == 8
                case sizeof(_lli_t):
                    INTEGER(ans)[i] = (int)*((_lli_t *)buf);
                    break;
#endif
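The warning suggested above amounts to checking, before the bare (int) cast, whether the 64-bit value is representable as an R integer. A minimal sketch, assuming a hypothetical helper int64_to_r_int that the caller would use to decide whether to store the value or store NA and warn (that caller-side handling is not shown and is an assumption). INT_MIN is treated as out of range because R uses it to encode NA_integer_.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical range-checked replacement for the bare (int) cast.
   Returns 1 and stores the value in *out if it fits in an R integer;
   returns 0 otherwise, so the caller can store NA and/or warn.
   INT_MIN is excluded because R reserves it for NA_integer_. */
int int64_to_r_int(int64_t v, int *out)
{
    if (v > INT_MAX || v <= INT_MIN)
        return 0;
    *out = (int) v;
    return 1;
}
```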

) The value can be represented as a double,
though:

4294967296

[1] 4294967296

I wouldn't expect readBin() to return a double if an integer was
requested, but is there any way to get the correct value out of it?

Trivially (for your unsigned big-endian case):

y <- readBin(x, "integer", n = length(x)/4L, endian = "big")
y <- ifelse(y < 0, 2^32 + y, y)
i <- seq(1, length(y), 2)
y <- y[i] * 2^32 + y[i + 1L]

Thanks for the code, but I'm not sure I would call that trivial,
especially if one needs to cater for little endian and signed cases as
well!
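For comparison, the same "read two 32-bit halves, convert to double, and add" approach can cover both byte orders and the signed case with little extra effort. The C sketch below is a hypothetical helper (combine_halves, with a local word32 decoder), not code from R: it computes high * 2^32 + low, and for signed data it reinterprets the high word as two's complement, which is the reverse of the ifelse(y < 0, 2^32 + y, y) step in the R snippet. The result is exact while the magnitude stays within 2^53.

```c
#include <stdint.h>

/* Decode 4 bytes as an unsigned 32-bit word in the given byte order. */
static uint32_t word32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
               ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
    return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16) |
           ((uint32_t) p[1] << 8)  |  (uint32_t) p[0];
}

/* Hypothetical helper: rebuild an 8-byte integer as a double from its
   two 32-bit halves, value = high * 2^32 + low.  The low word is always
   unsigned; for signed data the high word is taken as two's complement. */
double combine_halves(const unsigned char buf[8], int big_endian, int is_signed)
{
    uint32_t hi = word32(big_endian ? buf : buf + 4, big_endian);
    uint32_t lo = word32(big_endian ? buf + 4 : buf, big_endian);
    double high = (double) hi;
    if (is_signed && hi >= 0x80000000u)
        high -= 4294967296.0;            /* subtract 2^32 */
    return high * 4294967296.0 + (double) lo;
}
```

With big-endian bytes 00 00 00 01 00 00 00 00 this yields 4294967296, and with eight bytes FF FF FF FF FF FF FF FE read as signed big-endian it yields -2.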

I meant for your case, and it's trivial in the sense of: read as integers, convert to double precision, and add.

This is what I meant by reconstructing the number manually...

You didn't say so - you were talking about reconstructing it from a raw vector which seems a lot more painful since you can't compute with enough precision on raw vectors.

True - I should have been more specific. Sorry.

Jon

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
