
Reading and writing to S-like databases

3 messages · David Brahm, Jason W. Martinez, Agustin Lobo

#
Hi,

   I asked this question 2 years ago, and would like to know if the answer has
changed.

   In S-Plus, I build databases of many large objects.  In any given analysis,
I only need a few of those objects, but attach'ing the whole database is fine
since objects are only read as needed.  How can I do the same thing in R,
without reading the entire database?

   One possibility is to treat the database as a package, devoid of code but
containing many .RData files under /data, then load() each object I'll need.
Perhaps autoload() can be used to avoid having to anticipate which objects I'll
need?
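A minimal sketch of the "one .RData file per object" layout in R. The directory (a temporary folder standing in for a package's data/ directory) and the object names big_a and big_b are hypothetical. Each object is written with save(list = ...) so that a later load() reads only the file for the object that is actually needed.

```r
# Hypothetical directory standing in for a package's data/ directory
db_dir <- file.path(tempdir(), "mydb")
dir.create(db_dir, showWarnings = FALSE)

# Hypothetical large objects
big_a <- matrix(rnorm(1e4), nrow = 100)
big_b <- data.frame(id = 1:5, x = letters[1:5])

# Write each object to its own file, named after the object
for (nm in c("big_a", "big_b")) {
  save(list = nm, file = file.path(db_dir, paste0(nm, ".RData")))
}
rm(big_a, big_b)

# Later, in an analysis: read only the object that is needed
load(file.path(db_dir, "big_b.RData"))
exists("big_b")   # TRUE
exists("big_a")   # FALSE: its file was never read
```

This still requires knowing in advance which objects you will use, which is the limitation raised above.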

   Another is to use dput and dget.  Again I need to know ahead of time which
objects I'll want.
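For comparison, a minimal dput()/dget() round trip in R; the object obj and the file name are hypothetical. dput() writes an R expression that recreates the object, and dget() parses and evaluates it again. Because the format is text, it is slow and bulky for large numeric objects, and with the default settings doubles are written with limited precision, so exact round-tripping is not guaranteed for them.

```r
# Hypothetical small object stored as text
obj <- list(a = 1:3, b = "text")
f <- file.path(tempdir(), "obj.R")

dput(obj, file = f)    # writes: list(a = 1:3, b = "text")
obj2 <- dget(f)        # parses and evaluates the file
identical(obj, obj2)   # TRUE for integers and strings
```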
On July 20, 1999, Ross Ihaka [mailto:ihaka at stat.auckland.ac.nz] wrote:
I'm not sure where that ended up -- could you clarify, Ross?  Thanks!

			-- David Brahm (a215020 at agate.fmr.com)
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
Hi,

Probably not what you wanted to hear, but it sounds like you need to
use a relational database management system, assuming that these
'objects' are data frames. If that is the case, then I would migrate to
an RDBMS and establish ODBC (or RpgSQL, RmySQL, etc.) connections with
S-Plus or R.

You can then select particular variables with SQL statements, thereby
avoiding the problem of reading in the entire database.
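A minimal sketch of the "select only what you need" idea in R. It assumes the DBI and RSQLite packages (a modern, in-memory stand-in for the ODBC, RpgSQL or RmySQL connections mentioned above); the table people and its columns are hypothetical. Only the columns and rows named in the SQL statement are transferred into R.

```r
library(DBI)   # generic database interface (assumed installed)
# RSQLite is an assumption; any DBI backend would play the same role
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table standing in for one large data frame in the database
people <- data.frame(id = 1:1000,
                     age = sample(18:90, 1000, replace = TRUE),
                     income = round(runif(1000, 1e4, 1e5)))
dbWriteTable(con, "people", people)
rm(people)

# Only the requested columns and rows are read into R
res <- dbGetQuery(con, "SELECT id, income FROM people WHERE age > 65")
str(res)

dbDisconnect(con)
```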

The downside to this approach is that 'someone' must be able to
manage the DBMS and be available to help others. If the databases are
large and used by many people, then it might be worth the time and
effort.

Jason





--
Jason Martinez
Sociology Graduate Student
University of California, Riverside
David Brahm wrote:
#
Jason's suggestion (using a relational database management system) is
probably the best, although it is more complicated. Some alternatives:

1. Save each object as a separate binary file. Create
another object (e.g. a two-column matrix) that maps each object
to its file. Attach the index object and, in your function,
attach only the required object. Unfortunately, this implies
that you must know which part (file) of the whole database you
need for a given operation (e.g., if you need individual 3456,
you must know which file holds the data for it).
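A minimal sketch of the index approach in R. The object names survey1999 and survey2000, the helper get_from_db() and the temporary directory are all hypothetical. The index is a two-column character matrix mapping each object name to its file; instead of attach(), the helper load()s the one file into a private environment and returns the object.

```r
db_dir <- file.path(tempdir(), "indexed_db")
dir.create(db_dir, showWarnings = FALSE)

# Hypothetical objects, each saved to its own binary file
objs <- list(survey1999 = data.frame(id = 1:5, y = rnorm(5)),
             survey2000 = data.frame(id = 6:10, y = rnorm(5)))
files <- file.path(db_dir, paste0(names(objs), ".RData"))
for (i in seq_along(objs)) {
  e <- list2env(objs[i])   # environment holding just this one object
  save(list = names(objs)[i], envir = e, file = files[i])
}
rm(objs)

# The index: object name -> file
db_index <- cbind(object = names(files), file = files)
db_index[, "object"] <- c("survey1999", "survey2000")

# Hypothetical helper: read only the file that holds 'name'
get_from_db <- function(name, index = db_index) {
  f <- index[index[, "object"] == name, "file"]
  if (length(f) != 1) stop("object not found in index: ", name)
  e <- new.env()
  load(f, envir = e)
  get(name, envir = e)
}

s <- get_from_db("survey2000")
```

As noted above, the caller still has to know which object holds the data it needs; the index only avoids hard-coding file paths.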

2. You can use delay(). I'm almost done with a short document
on "Using R with large objects", for which I've received useful
input from the list. In particular, I received the following message
from Ray Brownrigg (Ray.Brownrigg at mcs.vuw.ac.nz):

""
A.  To set up an object so that it is available at all times, but only
loaded into memory when first referenced, consider the following:

test.x <- delay({attach(system.file("data", "test.rda", pkg="test"));
test.x})

The object test.x has been created and saved as a .rda file using
save(test.x, file="test.rda"), and the resulting file test.rda has been
stored in the data directory of the (installed) package test.  Normally
the command above will be executed as part of loading the package test,
i.e. when library(test) is entered by the user at the R prompt.  Further,
because the object test.x is part of package test, it is not saved as
part of a new .RData when an R session is terminated (as long as
nothing new is assigned to test.x during the session).
""
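In current versions of R, delay() has been replaced by delayedAssign(). A minimal sketch of the same idea, assuming a stand-alone test.rda in a temporary directory rather than an installed package; it load()s the file into a private environment instead of attaching it, so no entry is added to the search path. The counter n_loads is instrumentation only, added to show when the file is actually read.

```r
# Hypothetical file holding one saved object, standing in for test.rda
rda <- file.path(tempdir(), "test.rda")
test.x <- rnorm(10)
save(test.x, file = rda)
rm(test.x)

n_loads <- 0   # instrumentation only: counts reads of the file

# Bind test.x to a promise; the file is read only when test.x is first used
delayedAssign("test.x", local({
  n_loads <<- n_loads + 1
  e <- new.env()
  load(rda, envir = e)
  e$test.x
}))

exists("test.x")        # TRUE, yet nothing has been read yet
loads_before <- n_loads
m  <- mean(test.x)      # first use forces the promise and runs load()
m2 <- mean(test.x)      # the value is now cached; the file is not re-read
```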

You could use this trick to set up delayed attachment of every
object in your database; a given object would then only be
attached if your processing actually uses it.
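A minimal sketch of doing this for a whole database in R, assuming one .rds file per object in a hypothetical directory. The helpers bind_lazy() and lazy_db() are illustrative names, not an existing API. Each file gets a delayedAssign() promise in a dedicated environment, so every name is listed immediately but a file is read only when its object is used. The force(path) call matters: without it, every promise would capture the loop variable and read the last file.

```r
# Hypothetical database directory: one .rds file per object
db_dir <- file.path(tempdir(), "lazy_db")
dir.create(db_dir, showWarnings = FALSE)
saveRDS(1:5, file.path(db_dir, "a.rds"))
saveRDS(letters, file.path(db_dir, "b.rds"))

# Hypothetical helper: bind 'name' in 'env' to a promise that reads 'path'
bind_lazy <- function(name, path, env) {
  force(path)   # fix the path now, not when the promise is forced
  delayedAssign(name, readRDS(path), assign.env = env)
}

# Hypothetical helper: one promise per .rds file in 'dir'
lazy_db <- function(dir) {
  db <- new.env()
  files <- list.files(dir, pattern = "\\.rds$", full.names = TRUE)
  for (f in files) bind_lazy(sub("\\.rds$", "", basename(f)), f, db)
  db
}

db <- lazy_db(db_dir)
ls(db)        # "a" "b": names are known without reading any file
db$b[1:3]     # reads only b.rds
```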

3. As I point out in the document "Getting your stuff organized in R",
I have not found any way to list the objects within a binary R file,
nor to select particular objects from the binary file and attach
only the selected ones (which would be the best solution
in so many cases). I wonder whether future R versions could add
this feature.
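A partial workaround in R, sketched under the assumption of a hypothetical file many.RData: load() into a scratch environment returns the names it restored, and only the selected objects are copied out. Note that this still reads the whole file into memory, so it helps with listing and selecting, not with memory use.

```r
# Hypothetical saved file with several objects
f <- file.path(tempdir(), "many.RData")
a <- 1:3
b <- "x"
big <- matrix(0, 100, 100)
save(a, b, big, file = f)
rm(a, b, big)

# List the contents by loading into a scratch environment
scratch <- new.env()
loaded <- load(f, envir = scratch)   # load() returns the restored names
loaded

# Keep only the selected objects; the rest go away with 'scratch'
keep <- c("a", "b")
for (nm in keep) assign(nm, get(nm, envir = scratch))
rm(scratch)
```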


Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
alobo at ija.csic.es
On Thu, 27 Sep 2001, David Brahm wrote:

            