Reading an "unsigned long long" using R readBin()

6 messages · Sean Davis, Simon Urbanek, Brian Ripley +2 more

Sean Davis

Thu, May 29, 2008 11:41 AM #

Sorry for the simple question, but I am trying to read an "unsigned
long long" using the R readBin() function.  Can someone point me in
the right direction, or am I better off using C for such things?  The
file that I am reading will have been produced on the same machine
that is doing the reading.

Thanks,
Sean

Simon Urbanek

Fri, May 30, 2008 10:48 AM #

On May 29, 2008, at 2:41 PM, Sean Davis wrote:

R has no data type that can hold 64-bit integers (long long), so there  
is no (lossless) way to read such a field in R.
If you know the endianness of the machine you can read two integers  
and combine the result as a float to get an approximate value.  
Otherwise C is your friend (and easy to call from R) for 64-bit  
calculations, bitwise operations and other tricks that are hard to do  
in R.

Cheers,
Simon
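
A minimal sketch of the two-integer approach described above, written in C so the arithmetic is explicit. The function names (le32, be32, u64_bytes_to_double) are hypothetical and not part of R; the caller is assumed to know the byte order of the machine that wrote the file. The result is exact only up to 2^53 and is rounded above that.

```c
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes stored least significant byte first. */
uint32_t le32(const unsigned char *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Assemble a 32-bit value from 4 bytes stored most significant byte first. */
uint32_t be32(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Convert the 8 raw bytes of an unsigned 64-bit field to a double.
 * little_endian describes the writer's byte order (an assumption the
 * caller must supply). */
double u64_bytes_to_double(const unsigned char bytes[8], int little_endian)
{
    uint32_t hi, lo;

    if (little_endian) {
        lo = le32(bytes);       /* low half comes first in the file */
        hi = le32(bytes + 4);
    } else {
        hi = be32(bytes);       /* high half comes first in the file */
        lo = be32(bytes + 4);
    }
    /* hi * 2^32 + lo, computed in floating point */
    return (double)hi * 4294967296.0 + (double)lo;
}
```

The same combination can be done in R itself by reading the two halves as 4-byte integers, adding 2^32 to any negative half, and computing hi * 2^32 + lo on doubles.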

Brian Ripley

Fri, May 30, 2008 10:55 AM #

Well, R has no unsigned quantities, so ultimately you can't actually do 
this.  But using what="int" and an appropriate 'size' (likely to be 8)
should read the numbers, wrapping around very large ones to be negative.
(The usual trick of storing integers in numeric will lose accuracy, but 
might be better than nothing.)

On Thu, 29 May 2008, Sean Davis wrote:

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Duncan Murdoch

Fri, May 30, 2008 11:07 AM #

On 5/30/2008 1:55 PM, Prof Brian Ripley wrote:

I think reading size 8 integers on 32 bit Windows returns signed 32 bit 
integers, with values outside that range losing the high order bits, not 
just accuracy.  At least that's what I see when I write the numbers 1:10 
out as 4 byte integers, and read them as 8 byte integers:  I get 1 3 5 7 9.

Duncan Murdoch
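
The 1 3 5 7 9 result can be reproduced outside R. The sketch below assumes little-endian storage (as on 32-bit Windows) and uses two hypothetical helpers: one lays out 32-bit integers as bytes, the other reads 8-byte integers from those bytes and keeps only the low 32 bits. Each 8-byte read spans two of the original values, and the low half is the odd-numbered one.

```c
#include <stddef.h>
#include <stdint.h>

/* Store n 32-bit integers as little-endian bytes, 4 bytes each. */
void write_le32(const int32_t *in, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = (uint32_t)in[i];
        out[4 * i + 0] = (unsigned char)(v & 0xFF);
        out[4 * i + 1] = (unsigned char)((v >> 8) & 0xFF);
        out[4 * i + 2] = (unsigned char)((v >> 16) & 0xFF);
        out[4 * i + 3] = (unsigned char)((v >> 24) & 0xFF);
    }
}

/* Read n_out little-endian 8-byte integers and keep the low 32 bits of each. */
void read_le64_low32(const unsigned char *bytes, size_t n_out, int32_t *out)
{
    for (size_t i = 0; i < n_out; i++) {
        const unsigned char *p = bytes + 8 * i;
        uint64_t v = 0;
        uint32_t low;

        for (int k = 7; k >= 0; k--)      /* most significant byte is last */
            v = (v << 8) | p[k];
        low = (uint32_t)(v & 0xFFFFFFFFu);
        /* Map to signed explicitly instead of relying on the
         * implementation-defined unsigned-to-signed conversion. */
        out[i] = (low <= 0x7FFFFFFFu)
                     ? (int32_t)low
                     : (int32_t)((int64_t)low - 4294967296LL);
    }
}
```

Feeding the values 1 to 10 through write_le32() and reading five 8-byte integers back with read_le64_low32() gives 1, 3, 5, 7, 9; the even values end up in the discarded high halves.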

Brian Ripley

Fri, May 30, 2008 11:09 AM #

On Fri, 30 May 2008, Duncan Murdoch wrote:

Yes, that's true for even larger ones.

So to clarify: up to 2^31-1 should work, thereafter you will get the lower 
32 bits and hence possibly a signed number.
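
As a rough illustration of that rule, the following C sketch keeps the lower 32 bits of a 64-bit value and interprets them as a signed integer. The function name low32_signed is hypothetical; the code only models the truncation described here and is not how readBin() is implemented.

```c
#include <stdint.h>

/* Keep the low 32 bits of v and reinterpret them as a signed 32-bit integer. */
int32_t low32_signed(uint64_t v)
{
    uint32_t low = (uint32_t)(v & 0xFFFFFFFFu);

    if (low <= 0x7FFFFFFFu)
        return (int32_t)low;              /* up to 2^31-1: value unchanged */
    /* top bit set: the same bit pattern read as two's complement */
    return (int32_t)((int64_t)low - 4294967296LL);
}
```

For example, 2^31+1 comes back as -2147483647, 2^32-1 as -1, and 2^32+1 as 1.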

Bill Dunlap

Fri, May 30, 2008 1:20 PM #

On Fri, 30 May 2008, Prof Brian Ripley wrote:

When we wrote a version of readBin() for Splus 8.0 we added an
extra argument, output=, that specifies the type of S object
to put the result into.  The what= argument says what sort
of data is in the input file and by default output=what.
output="double" can be useful in this case, as a double can
store a 53 bit signed or unsigned integer without loss of
precision.  If the integer is bigger than 2^53-1, the double
stores its most significant 53 bits, which may be better
than truncating the thing.

E.g., I wrote a C program to write some unsigned long longs to
a file:
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        /* Test values on either side of 2^31, 2^32 and 2^53. */
        unsigned long long data[7], one = 1ULL;

        data[0] = one;
        data[1] = (one << 31) - 1;
        data[2] = (one << 31) + 1;
        data[3] = (one << 32) - 1;
        data[4] = (one << 32) + 1;
        data[5] = (one << 52) + 1;
        data[6] = (one << 54) + 1;

        /* Write the raw 8-byte values to stdout in native byte order. */
        (void)fwrite((void *)data, sizeof(data[0]),
                     sizeof(data) / sizeof(data[0]), stdout);
        return 0;
    }

od shows what it writes, as unsigned, signed, and hex
8 byte integers:
    % ./a.out|od --format u8
    0000000                    1           2147483647
    0000020           2147483649           4294967295
    0000040           4294967297     4503599627370497
    0000060    18014398509481985
    0000070
    % ./a.out | od --format d8
    0000000                    1           2147483647
    0000020           2147483649           4294967295
    0000040           4294967297     4503599627370497
    0000060    18014398509481985
    0000070
    % ./a.out | od --format x8
    0000000 0000000000000001 000000007fffffff
    0000020 0000000080000001 00000000ffffffff
    0000040 0000000100000001 0010000000000001
    0000060 0040000000000001
    0000070

and in 32-bit Splus I can read it with:
    > z<-readBin(pipe("./a.out", open="br"), what="integer", n=7,
              size=8, signed=FALSE, output="double")
    > print(z, digits=16)
    [1]                 1        2147483647        2147483649        4294967295
    [5]        4294967297  4503599627370497 18014398509481984
Note that it loses precision where z[7]>2^53.
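
Whether a particular value survives the conversion can be checked with a round trip through double. This is a small sketch with a hypothetical function name, assuming IEEE 754 doubles with a 53-bit significand:

```c
#include <stdint.h>

/* Return 1 if v converts to double and back without change, else 0. */
int fits_in_double(uint64_t v)
{
    double d = (double)v;

    /* Values near 2^64 can round up to 2^64, which is out of range
     * for the conversion back to uint64_t. */
    if (d >= 18446744073709551616.0)
        return 0;
    return (uint64_t)d == v;
}
```

Here 2^52+1 passes, while 2^54+1 fails: it is stored as 18014398509481984, the value shown for z[7] above.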

Without the output="double" then the numbers > 2^32 would be
truncated and the signs would be wrong on ones between 2^31
and 2^32:
    > readBin(pipe("./a.out", open="br"), what="integer", n=7,
              size=8, signed=FALSE)
    [1]           1  2147483647 -2147483647          -1           1           1
    [7]           1
(That one gives the same result in R and Splus.)

What do folks think about having this option in R?
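
Independently of whether readBin() gains such an argument, the C route suggested earlier in the thread can give the same effect. The sketch below is one possible shape for it, not an existing R function: read_u64_as_double is a hypothetical name, and the all-pointer argument list follows the convention of R's .C() interface. It assumes the file was written in the native byte order of the reading machine, as in the original question.

```c
#include <stdio.h>

/* Read up to *n native-endian unsigned long long values from the file
 * path[0] and store them as doubles in out. *nread receives the number
 * of values actually read (0 if the file cannot be opened). */
void read_u64_as_double(char **path, int *n, double *out, int *nread)
{
    unsigned long long v;
    FILE *fp = fopen(path[0], "rb");

    *nread = 0;
    if (fp == NULL)
        return;
    while (*nread < *n && fread(&v, sizeof v, 1, fp) == 1) {
        out[*nread] = (double)v;   /* exact up to 2^53, rounded above */
        (*nread)++;
    }
    fclose(fp);
}
```

After building a shared library with R CMD SHLIB and loading it with dyn.load(), a call along the lines of .C("read_u64_as_double", path, as.integer(7), out = double(7), nread = integer(1)) should return the values in the out component, with the same loss of precision above 2^53 as in the Splus example.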

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."