
Reading large files in R

5 messages · Jean-Pierre Gattuso, Adaikalavan Ramasamy, Bert Gunter, Andreas Hary

#
Dear R-listers:

I am trying to work with a big (262 Mb) file but apparently reach a
memory limit using R on Mac OS X as well as on a Unix machine.

This is the script:

 > type=list(a=0, b=0, c=0)
 > tmp <- scan(file="coastal_gebco_sandS_blend.txt", what=type,
 +             sep="\t", quote="\"", dec=".", skip=1,
 +             na.strings="-99", nmax=13669628)
Read 13669627 records
 > gebco <- data.frame(tmp)
Error: cannot allocate vector of size 106793 Kb


Even tmp does not seem right:

 > summary(tmp)
Error: recursive default argument reference


Do you have any suggestion?

Thanks,
Jean-Pierre Gattuso
#
From the help page for 'read.table':

     'read.table' is not the right tool for reading large matrices,
     especially those with many columns: it is designed to read _data
     frames_ which may have columns of very different classes. Use
     'scan' instead.

So I am not sure why you used 'scan' and then converted the result to a data frame.

1) Can you provide a sample of the data that you are trying to read in?
2) How much memory does your machine have?
3) Try reading in just the first few lines using the nmax argument in
scan, e.g. the sketch below.
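
For instance, something along these lines (an untested sketch; the file
name and the what= layout are taken from your original call):

  ## read just the first 5 records to check that the columns parse
  type <- list(a=0, b=0, c=0)
  head5 <- scan(file="coastal_gebco_sandS_blend.txt", what=type,
                sep="\t", quote="\"", dec=".", skip=1,
                na.strings="-99", nmax=5)
  str(head5)   # should show three numeric components of length 5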

Regards, Adai
On Mon, 2005-08-08 at 12:50 -0600, Jean-Pierre Gattuso wrote:
#
... and it is likely that even if you did have enough memory (several times
the size of the data is generally needed) it would take a very long time.

If you do have enough memory and the data are all of one type -- numeric
here -- you're better off treating it as a matrix rather than converting it
to a data frame.
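
For example (an untested sketch; it assumes the three tab-separated
numeric columns of the original call, and the column names are
placeholders):

  ## scan() the whole file as one numeric vector, then shape it into a
  ## 3-column matrix; byrow=TRUE because scan() reads the file row by row
  v <- scan(file="coastal_gebco_sandS_blend.txt", what=0, sep="\t",
            dec=".", skip=1, na.strings="-99")
  gebco <- matrix(v, ncol=3, byrow=TRUE,
                  dimnames=list(NULL, c("a", "b", "c")))

At 13669627 x 3 doubles that matrix still takes about 313 Mb, but it
avoids the extra copies made when a data frame is built.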

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
#
You can also use the RODBC package to hold the data in a database, say
MySQL, and only import it when you do the modelling, e.g. the sketch
below. In this case I have first saved the vandriver data in 'MySQL
Test', but one can obviously write the data directly to the database.
Since the data are not held in memory, I find that I can do much larger
computations than would otherwise be possible. The downside is of course
that the computations take a bit longer.
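
A rough sketch of that workflow (untested; it assumes 'MySQL Test' is an
ODBC data source name pointing at the MySQL database, and the query is
purely illustrative):

  library(RODBC)
  channel <- odbcConnect("MySQL Test")
  ## pull in only the slice of the vandriver table needed for the fit
  sub <- sqlQuery(channel, "SELECT * FROM vandriver LIMIT 100000")
  odbcClose(channel)

The model is then fitted to 'sub', an ordinary (and much smaller) data
frame, while the bulk of the data stays in MySQL.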
Best wishes,

Andreas

=====================
Andreas D Hary
Email:    u08adh at hotmail.com
Mobile:   07906860987
Phone:   02076554940




----- Original Message ----- 
From: "Berton Gunter" <gunter.berton at gene.com>
To: <ramasamy at cancer.org.uk>; "'Jean-Pierre Gattuso'" <gattuso at obs-vlfr.fr>
Cc: <r-help at stat.math.ethz.ch>
Sent: Monday, August 08, 2005 8:35 PM
Subject: Re: [R] Reading large files in R
#
Brief correction: it should read ... rather than ...; the latter
statement would load the data into memory as usual.
Best wishes,

Andreas




----- Original Message ----- 
From: "Andreas Hary" <u08adh at hotmail.com>
To: "Berton Gunter" <gunter.berton at gene.com>; <ramasamy at cancer.org.uk>; 
"'Jean-Pierre Gattuso'" <gattuso at obs-vlfr.fr>
Cc: <r-help at stat.math.ethz.ch>
Sent: Monday, August 08, 2005 10:49 PM
Subject: Re: [R] Reading large files in R