
storing large data.frames

4 messages · Meinhard Ploner, Joerg Maeder, Agustin Lobo +1 more

#
I am new to R, so this may be a naive question:

If I have many large data.frames but use only one or two per session,
what is the best way to manage them?
If they are all stored in the current .RData, the system gets slow.
On the other hand, I would rather not build a separate package just for
the data.

Should I save each one with save() and then remove it with rm()?
Could I reload it later?

Thanks for suggestions
Meinhard Ploner



ps my system: R 1.4.1 on mac/darwin.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
Hello Meinhard
that is possible with load(filename). Each object is restored under the
same name it had when you called save(); in fact, you can save a whole
list of variables with save() and load them all again later.
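For example, a minimal round trip (the object names `a` and `b` are made up here for illustration):

```r
# save() records each object under its name; load() restores those names.
a <- 1:5
b <- letters[1:3]
save(a, b, file = "mydata.rda")   # several objects in one file
rm(a, b)
load("mydata.rda")                # "a" and "b" come back under the same names
```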

#
Save each dataset in its own rda file:

save(mydataset1R,file="mydataset1R.rda")

Then check before deleting:

mydataset1R.backup <- mydataset1R
rm(mydataset1R)
attach("mydataset1R.rda")

This leaves mydataset1R at pos=2
(check with search() and then ls(2)).

Compare mydataset1R and mydataset1R.backup
(for example, compare means by cols etc)

Then rm(mydataset1R.backup)

Do the same for the rest of your large datasets.
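Put together, the steps above look like this; `mydataset1R` is built here as a small placeholder so the sketch is self-contained:

```r
mydataset1R <- data.frame(x = 1:10)          # stands in for your real data

save(mydataset1R, file = "mydataset1R.rda")  # one .rda per dataset

mydataset1R.backup <- mydataset1R            # keep a copy for checking
rm(mydataset1R)

attach("mydataset1R.rda")                    # the object reappears at pos = 2
search()                                     # shows "file:mydataset1R.rda"
ls(2)                                        # "mydataset1R"

identical(mydataset1R, mydataset1R.backup)   # TRUE if the round trip worked
rm(mydataset1R.backup)
```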

Then q(), saving your workspace. Now .RData DOES NOT
include your large datasets. You might want to keep
each mydataset*R.rda in a separate directory.

Then start R from the appropriate directory.
Your workspace will not have any of the
mydataset*R, as they are no longer in .RData.
Use attach("mydataset1R.rda") to bring
the required dataset to pos=2.

A good advantage of keeping your large datasets
in different rda files is that if you ever have
a problem in R and .RData gets corrupted,
your large files are safe. 

A further advantage,
but also a caution, of keeping the large R
objects at pos != 1 is
that save.image() WILL NOT save the
large datasets (you must use
save() and specify the environment;
see help(save)).
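A short sketch of this caveat, using a throwaway file under tempdir() (the object name `x` is invented):

```r
tmp <- file.path(tempdir(), "big.rda")
x <- data.frame(a = 1:3)
save(x, file = tmp)
rm(x)
attach(tmp)                # x now lives at pos = 2, not in the workspace
"x" %in% ls(globalenv())   # FALSE: save.image() would skip it
# To write it back to disk, point save() at that environment explicitly:
save(x, file = tmp, envir = as.environment(2))
detach(2)
```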

Agus





Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
alobo at ija.csic.es
On Fri, 22 Feb 2002, Meinhard Ploner wrote:
#
On Fri, Feb 22, 2002 at 11:23:29AM +0100, Meinhard Ploner wrote:
Take a look at the package "g.data" at your local CRAN:

g.data: Delayed-Data Packages 

     Create and maintain delayed-data packages (DDPs). Data
     stored in a DDP are available on demand, but do not take up
     memory until requested. You attach a DDP with
     g.data.attach(), then read from it and assign to it in a manner
     similar to S-Plus, except that you must run g.data.save() to
     actually commit to disk.

     Version: 1.2
     Date:    2001-11-30
     Author:  David Brahm <brahm at alum.mit.edu>