
using "eval(parse(text))", gsub(pattern, replacement, x), to process "code" within a loop/custom function

3 messages · Thomas Pujol, Emmanuel Charpentier, Gabor Grothendieck

#
Thomas Pujol wrote:
Beware: what follows is the advice of someone used to working with data
in an RDBMS and SQL; as anyone should know, to a man with a hammer
everything looks like a nail. Caveat emptor...

Unless I misunderstand you, you are trying to process, piece by piece, a
large dataset made of many reasonably sized independent chunks.

What you're trying to do looks to me a bit like reinventing the SAS macro
language. What's the point?

IMNSHO, "large" datasets that are used only piecewise are much better
handled in a real database (RDBMS), queried at runtime via, for example,
Brian Ripley's RODBC.

In your example, I'd create a table Births holding all your data plus the
relevant year. Off the top of my head:

# Do this ONCE in the lifetime of your data: an RDBMS is probably better
# suited than R data frames to this kind of management

library(RODBC)
# "YourDSN" is a placeholder: use the ODBC data source name of your DBMS
channel <- odbcConnect("YourDSN")

sqlSave(channel, tablename = "Births",
        rbind(cbind(data.frame(Year = rep(1952, nrow(births.1952))),
                    births.1952),
              cbind(data.frame(Year = rep(1953, nrow(births.1953))),
                    births.1953)
              # , ... ^W^Y ad nauseam ...
        ))

rm(births.1951, births.1952, ...) # get back breathing space
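The copy-and-paste inside the sqlSave() call can be avoided by building the combined data frame programmatically. A minimal sketch, assuming the per-year data frames are named births.<year>, live in the current environment, and share the same columns; the two tiny data frames below are made-up stand-ins, not real data:

```r
# Hypothetical stand-ins for the per-year data frames
births.1952 <- data.frame(month = 1:2, count = c(10, 12))
births.1953 <- data.frame(month = 1:2, count = c(11, 9))

years <- 1952:1953
# Fetch births.1952, births.1953, ... by name in a single call
per.year <- mget(paste("births", years, sep = "."))
# Prepend a Year column to each chunk, then stack all chunks
births <- do.call(rbind, Map(function(df, y) cbind(Year = y, df),
                             per.year, years))
rownames(births) <- NULL  # drop the "births.1952.1"-style row names
births
# The result could then be saved in one go, e.g.
# sqlSave(channel, births, tablename = "Births")
```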

Beware: certain data types may be tricky to save! I got bitten by
Dates recently... See the RODBC documentation, your DBMS documentation and
the "R Data Import/Export" manual...

At analysis time, you can use the result of the relevant query exactly
as you would one of your data frames. Instead of:
foo(..., data = births.1952, ...)
type:
foo(..., data = sqlQuery(channel,
                         "select * from \"Births\" where \"Year\"=1952;"), ...)
# Syntax illustrating talking to a "picky" DBMS...

Furthermore, the variable "Year" bears your "d" information. Problem
(dis)solved.
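The same idea works in memory even without a DBMS: once all chunks sit in one data frame with a Year column, the old births.1952 object is just a subset. A minimal sketch; the births data frame and its columns are hypothetical:

```r
# Hypothetical combined data: one row per (Year, month)
births <- data.frame(Year  = rep(1952:1953, each = 2),
                     month = rep(1:2, times = 2),
                     count = c(10, 12, 11, 9))
# Equivalent of the former births.1952 object
b1952 <- subset(births, Year == 1952)
b1952
```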

You may loop (or even sapply()...) over d at will:
for (year in 1952:1978) {
  query <- sprintf("select * from \"Births\" where \"Year\"=%d;", year)
  foo(..., data = sqlQuery(channel, query), ...)
  ...
}
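For data that already fits in memory, the sapply() variant can be sketched with split(), which cuts a data frame into one piece per Year. This assumes a combined births data frame like the hypothetical one below; foo() here is a made-up stand-in for whatever analysis is run per year:

```r
births <- data.frame(Year  = rep(1952:1953, each = 2),
                     month = rep(1:2, times = 2),
                     count = c(10, 12, 11, 9))
# Hypothetical per-year analysis standing in for foo()
foo <- function(data) sum(data$count)
# One result per year, named by year ("1952", "1953")
results <- sapply(split(births, births$Year), foo)
results
```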

If you already use a DBMS with some connection to R (via RODBC or
otherwise), use that. If not, SQLite is a very lightweight library that
lets you use a (very considerable) subset of SQL92 to manipulate
your data.

I understand that some people on this list have undertaken the creation
of an SQLite-based package dedicated to this kind of large data management.

HTH,

					Emmanuel Charpentier
#
Use the same names (births, temp, ...) in each Rdata file and then load
each file into its own environment or proto object:

	library(proto); x1951 <- proto() # or x1951 <- new.env()
	load("1951.rda", envir = x1951)

Then pass the environment or proto object to each of your functions:

	f <- function(x) x$difference <- x$births - x$temp
	f(x1951)

The above completely avoids renaming variables and instead treats each
year as an object. If you use proto objects, the home page
is: http://r-proto.googlecode.com
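To process every year in a loop, the environment approach extends naturally to a list of environments. A minimal sketch using base R's new.env() rather than proto; the file names (1951.rda, 1952.rda) and their numeric contents are made up for the example:

```r
# Create two hypothetical .rda files that use the same object names
dir <- tempdir()
for (y in 1951:1952) {
  births <- c(100, 120) + y - 1951
  temp   <- c(5, 7)
  save(births, temp, file = file.path(dir, paste0(y, ".rda")))
}

# Load each file into its own environment, keyed by year
years <- as.character(1951:1952)
envs <- lapply(years, function(y) {
  e <- new.env()
  load(file.path(dir, paste0(y, ".rda")), envir = e)
  e
})
names(envs) <- years

# Environments are reference objects, so the assignment made
# inside f() is visible afterwards through envs
f <- function(x) x$difference <- x$births - x$temp
for (e in envs) f(e)

envs[["1952"]]$difference
```

No variable is ever renamed: each year's births and temp keep their names inside their own environment.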
On Dec 6, 2007 12:10 PM, Thomas Pujol <thomas.pujol at yahoo.com> wrote: