read.table() and precision?

7 messages · Wojciech Gryc, Peter Dalgaard, Knut Krueger +2 more

#
Hi,

I'm currently working with data that has values as large as 99,000,000
but is accurate to 6 decimal places. Unfortunately, when I load the
data using read.table(), it rounds everything to the nearest integer.
Is there any way for me to preserve the information or work with
arbitrarily large floating point numbers?

Thank you,
Wojciech
#
Wojciech Gryc wrote:
Are you sure?

To my knowledge, read.table doesn't round anything, except when running
out of bits to store the values, and a value like 99,000,000.123456 has
about 14 significant digits, which fits comfortably in an ordinary
double-precision variable (roughly 15-16 significant digits).

Printing the result is another matter. Try playing with
print(mydata, digits = 15) and the like.
#
If x is the result of your read.table, it is a double
precision number (matrix, data.frame, etc.), but by
default only up to 7 significant digits of x are printed,
so you do not see the rest of x.
Try, for example,
options(digits = 15)
and see how your x looks then.
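A minimal sketch of the point above, using a value on the same scale as the original data (the variable name is illustrative):

```r
## The value is stored at full double precision; the default print
## shows only 7 significant digits, which hides the decimals.
x <- 99000000.123456
print(x)               # default: decimal part not shown
print(x, digits = 15)  # full value visible

## Or change the session-wide default:
options(digits = 15)
x
```

Nothing was lost in storage; only the printed representation was truncated.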
--- Wojciech Gryc <wojciech at gmail.com> wrote:

#
Dear List,

Following the question in this thread, I have a question of my
own:
Suppose that I have large matrices which are produced
sequentially and must be used sequentially in the
reverse order. I do not have enough memory to store
them and so I would like to write them to disk and
then read them. This raises two questions:
1) What is the fastest (and most economical, space-wise)
way to do this?
2) Functions like write, write.table, etc. write the
data the way it is printed, and this may result in a
loss of accuracy. Is there any way to prevent this,
other than setting the "digits" option to a higher
value or using format prior to writing the data? Is it
possible to write binary files (similar to Fortran)?

Any suggestion will be greatly appreciated.
--- Wojciech Gryc <wojciech at gmail.com> wrote:

#
On Mon, 17 Dec 2007, Moshe Olshansky wrote:

Using save/load is the simplest.  Don't worry about finding better 
solutions until you know those are not good enough.  (serialize / 
unserialize is another interface to the same underlying idea.)
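A minimal sketch of the save/load approach for the sequential-matrix problem (the matrix contents, loop bounds, and file names here are illustrative):

```r
## Write each matrix to disk as it is produced, then read them back
## in reverse order.  save() stores full binary precision.
tmpdir <- tempdir()
n <- 3
for (i in seq_len(n)) {
  m <- matrix(rnorm(4), 2, 2)   # stand-in for the real computation
  save(m, file = file.path(tmpdir, sprintf("mat%03d.RData", i)))
}
for (i in rev(seq_len(n))) {
  load(file.path(tmpdir, sprintf("mat%03d.RData", i)))  # restores 'm'
  ## ... use m here ...
}
```

Only one matrix is held in memory at a time, and no precision is lost to a text round trip.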
Do please read the help before making false claims. ?write.table says

      Real and complex numbers are written to the maximal possible
      precision.

OTOH, ?write says it is a wrapper for cat, whose help says

      'cat' converts numeric/complex elements in the same way as 'print'
      (and not in the same way as 'as.character' which is used by the S
      equivalent), so 'options' '"digits"' and '"scipen"' are relevant.
      However, it uses the minimum field width necessary for each
      element, rather than the same field width for all elements.

so this hints that as.character() might be a useful preprocessor.
See ?writeBin.  save/load write binary files by default, but using
writeBin can be faster (though less flexible).
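A minimal writeBin/readBin sketch; the round trip preserves every bit of each double, since no text conversion is involved:

```r
## Raw binary I/O: doubles are written as 8-byte values.
fname <- tempfile()
x <- c(99000000.123456, pi)
writeBin(x, fname)                         # full binary precision
y <- readBin(fname, what = "double", n = length(x))
identical(x, y)                            # TRUE: nothing lost
```

Note that readBin needs to be told the type and count of values to read, which is part of what makes it less flexible than save/load.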
Somehow you have missed a great deal of information about R I/O.
Try help.start() and reading the sections the search engine shows you 
that look relevant.
#
Thank you for your response!

'write.table' writes up to 15 significant digits, which
is not quite full machine (double) precision but is
close to it - sorry for the misleading comments!

After all I found a way to do what I needed without
using disk or much memory and doing only twice as much
work as I could with unlimited memory, so I will stick
to this approach.
--- Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote: