Using "eval(parse(text))" and gsub(pattern, replacement, x) to process "code" within a loop/custom function
3 messages · Thomas Pujol, Emmanuel Charpentier, Gabor Grothendieck
Thomas Pujol wrote:
R-help users,
Thanks in advance for any assistance ... I truly appreciate your expertise. I searched help and could not figure this out, and think you can probably offer some helpful tips. I apologize if I missed something, which I'm sure I probably did.
I have data for many "samples". (e.g. 1950, 1951, 1952, etc.)
For each "sample", I have many data-frames. (e.g. temp.1952, births.1952, gdp.1952, etc.)
(Because the data is rather "large" (and for other reasons), I have chosen to store the data as individual files, as opposed to a list of data frames.)
I wish to write a function that enables me to "run" any of many custom "functions/processes" on each sample of data.
I currently accomplish this by using a custom function that uses:
"eval(parse(t=text.i2)) ", and "gsub(pat, rep, x)" (this changes the "sample number" for each line of text I submit to "eval(parse(t=text.i2))" ).
Is there a better/preferred/more flexible way to do this?
Beware: what follows is the advice of someone used to working with data
through an RDBMS and SQL; as everyone should know, everything is a nail to
a man with a hammer. Caveat emptor...
Unless I misunderstand you, you are trying to process piecewise a large
dataset made of a large number of reasonably sized independent chunks.
What you're trying to do looks to me a bit like reinventing the SAS macro
language. What's the point?
IMNSHO, "large" datasets that are used only piecewise are much better
handled in a real database (RDBMS), queried at runtime via, for example,
Brian Ripley's RODBC.
In your example, I'd create a table Births holding all your data plus the
relevant year. Off the top of my head:
# Do this ONCE in the lifetime of your data: an RDBMS is probably better
# suited than R data frames to this kind of management
library(RODBC)
channel <- odbcConnect(WhateverYouHaveToUseForYourFavoriteDBMS)
sqlSave(channel, tablename="Births",
        rbind(cbind(data.frame(Year=rep(1952, nrow(births.1952))),
                    births.1952),
              cbind(data.frame(Year=rep(1953, nrow(births.1953))),
                    births.1953)
              # ... ^W^Y ad nauseam (one more cbind(...) per year) ...
        ))
rm(births.1951, births.1952, ...) # get back breathing space
Beware: certain data types may be tricky to save! I got bitten by
Dates recently... See the RODBC documentation, your DBMS documentation and
the "R Data Import/Export" manual...
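The "ad nauseam" part of the rbind() call above does not have to be typed by hand. The following is only a sketch in base R, assuming the per-year objects follow the births.<year> naming used in this thread; the column names and values in the sample data frames are made up for illustration.

```r
# Hypothetical sample data standing in for the real per-year data frames
births.1952 <- data.frame(region = c("A", "B"), births = c(223, 256))
births.1953 <- data.frame(region = c("A", "B"), births = c(323, 356))

years <- 1952:1953

# Fetch births.1952, births.1953, ... by name from the current environment
dfs <- mget(paste0("births.", years))

# Prepend a Year column to each piece, then stack the pieces
births_all <- do.call(rbind,
                      Map(function(year, df) cbind(Year = year, df), years, dfs))
births_all
```

The resulting births_all could then be handed to sqlSave() in place of the hand-written rbind(cbind(...), ...) expression.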
At analysis time, you may use the result of the relevant query exactly
as you would one of your data frames. Instead of:
foo(... data=births.1952, ...)
type:
foo(... data=sqlQuery(channel, "select * from \"Births\" where \"Year\"=1952;"), ...) # Syntax illustrating talking to a "picky" DBMS...
Furthermore, the variable "Year" bears your "d" information. Problem
(dis)solved.
You may loop (or even sapply()...) at will on d:
for (year in 1952:1978) {
  query <- sprintf("select * from \"Births\" where \"Year\"=%d;", year)
  foo(... data=sqlQuery(channel, query), ...)
  ...
}
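For the sapply() variant, here is a rough sketch that runs without any database. The names births_all, fetch_year and the body of foo are hypothetical stand-ins: in real use fetch_year would wrap the sqlQuery() call shown above, and foo would be whatever analysis is wanted.

```r
# Stand-in for the database table; in real use the rows would live in the DBMS
births_all <- data.frame(Year   = c(1952, 1952, 1953, 1953),
                         births = c(223, 256, 323, 356))

# Hypothetical fetcher. With RODBC its body would be something like:
#   sqlQuery(channel, sprintf("select * from \"Births\" where \"Year\"=%d;", year))
fetch_year <- function(year) births_all[births_all$Year == year, ]

# Hypothetical analysis step playing the role of foo()
foo <- function(data) mean(data$births)

years <- 1952:1953
results <- sapply(years, function(year) foo(data = fetch_year(year)))
names(results) <- years
results
```

Keeping the data access in one small function means the analysis code does not care whether the rows come from a DBMS or from memory.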
If you already use a DBMS with some connection to R (via RODBC or
otherwise), use that. If not, sqlite is a very lightweight library that
enables you to use a (very considerable) subset of SQL92 to manipulate
your data.
I understand that some people on this list have undertaken the creation
of an SQLite-based package dedicated to this kind of large-data management.
HTH,
Emmanuel Charpentier
Use the same names (births, temp, ...) in each Rdata file and then load
each file into its own environment or proto object:
library(proto); x1951 <- proto() # or x1951 <- new.env()
load("1951.rda", envir = x1951)
Then pass the environment or proto object to each of your functions:
f <- function(x) x$difference <- x$births - x$temp
f(x1951)
The above completely avoids renaming variables and instead treats each
year as an object. If you use proto objects the home page
is: http://r-proto.googlecode.com
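A slightly fuller sketch of the environment variant, using plain new.env() rather than proto and the sample values from the original post. The use of tempdir(), the file names and the helpers write_year and load_year are assumptions made only so that the example runs on its own.

```r
data_dir <- tempdir()  # assumed location; use your own data directory

# Save one file per year, with the SAME object names (temp, births) in each
write_year <- function(year, temp, births) {
  save(temp, births, file = file.path(data_dir, paste0(year, ".rda")))
}
write_year("1951", temp = c(11, 13, 15), births = c(123, 156, 178))
write_year("1952", temp = c(21, 23, 25), births = c(223, 256, 278))

# Load a year's file into its own environment
load_year <- function(year) {
  e <- new.env()
  load(file.path(data_dir, paste0(year, ".rda")), envir = e)
  e
}

# The fn.mn calculation, written against a year environment
fn.mn <- function(x) x$temp - x$births

# A function that needs the loop index d just takes it as an extra argument
fn.mn.d <- function(x, d) x$temp[d] - x$births[d]

years <- c("1951", "1952")
res   <- lapply(setNames(years, years), function(y) fn.mn(load_year(y)))
res_d <- sapply(seq_along(years), function(d) fn.mn.d(load_year(years[d]), d))
```

Only one year's data is in memory at a time, and no variable ever has to be renamed.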
On Dec 6, 2007 12:10 PM, Thomas Pujol <thomas.pujol at yahoo.com> wrote:
R-help users,
Thanks in advance for any assistance ... I truly appreciate your expertise. I searched help and could not figure this out, and think you can probably offer some helpful tips. I apologize if I missed something, which I'm sure I probably did.
I have data for many "samples". (e.g. 1950, 1951, 1952, etc.)
For each "sample", I have many data-frames. (e.g. temp.1952, births.1952, gdp.1952, etc.)
(Because the data is rather "large" (and for other reasons), I have chosen to store the data as individual files, as opposed to a list of data frames.)
I wish to write a function that enables me to "run" any of many custom "functions/processes" on each sample of data.
I currently accomplish this by using a custom function that uses:
"eval(parse(t=text.i2)) ", and "gsub(pat, rep, x)" (this changes the "sample number" for each line of text I submit to "eval(parse(t=text.i2))" ).
Is there a better/preferred/more flexible way to do this?
One issue/obstacle that I have encountered: Some of the custom functions I use need to take as input the value of "d" in the loop below.
(Please see the sample function "fn.mn.d" below.)
#creates sample data
temp.1951 <- c(11,13,15)
births.1951 <- c(123, 156, 178)
temp.1952 <- c(21,23,25)
births.1952 <- c(223, 256, 278)
#######################
#function that looks for a pattern "pat.i" within "x", and replaces it with "rep.i"
recurse <- function(x, pat.i, rep.i) {
  f <- function(x, pat, rep) if (mode(x) == "character") gsub(pat, rep, x) else x
  if (length(x) == 0) return(x)
  if (is.list(x)) for(i in seq_along(x)) x[[i]] <- recurse(x[[i]], pat.i, rep.i)
  else x <- f(x, pat.i, rep.i)
  x
} # end recurse
#######################
#######################
#function that processes code submitted as "text.i" for each date in "dates.i"
fn.dateloop <- function(text.i, dates.i ) {
  for(d in 1: length(dates.i) ) {
    tempdate <- dates.i[d]
    text.i2 <- recurse(text.i, pat.i='#', rep.i=tempdate)
    temp0=eval(parse(t=text.i2))
    tempname <- paste(names(temp0)[1], tempdate, sep='.')
    save(list='temp0', file = tempname)
  } # next d
} # end fn.dateloop
#######################
#####################
#a sample custom function that I want to run on each sample of data
fn.mn <- function(x, y) {
res = x - y
names(res) = 'mn'
res
}
#####################
#####################
#example of function that takes d as input...
#I have not been able to get this to work with the custom function "fn.dateloop" above
#I request assistance in learning how to accomplish this
fn.mn.d <- function(x, y, d) {x[d] - y[d]}
#####################
#####################
setwd('c:/') #specifies location where sample data will be saved
getwd() #checks location
fn.mn(x=temp.1951, y=births.1951)
fn.mn(x=temp.1952, y=births.1952)
#
fn.dateloop(text.i = "fn.mn(x=get('temp.#'), y=get('births.#') )" , dates.i=c('1951','1952') )
get(load('mn.1951'))
get(load('mn.1952'))
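If the objects do stay in the workspace under names such as temp.1951, one possible way to avoid eval(parse()) altogether is to pass the function itself rather than its text, and to look the data up with get(). This is only a sketch: apply_by_date and its pass.d argument are hypothetical names, not part of the original code, and the saving to files done by fn.dateloop is left out.

```r
# Sample data and functions from the original post
temp.1951 <- c(11, 13, 15);  births.1951 <- c(123, 156, 178)
temp.1952 <- c(21, 23, 25);  births.1952 <- c(223, 256, 278)

fn.mn   <- function(x, y) x - y
fn.mn.d <- function(x, y, d) x[d] - y[d]

# Hypothetical replacement for fn.dateloop: 'fun' is a function, not text.
# pass.d = TRUE additionally hands the loop index d to 'fun'.
apply_by_date <- function(fun, dates, pass.d = FALSE, envir = parent.frame()) {
  force(envir)  # fix the lookup environment before entering lapply()
  out <- lapply(seq_along(dates), function(d) {
    args <- list(x = get(paste0("temp.", dates[d]), envir = envir),
                 y = get(paste0("births.", dates[d]), envir = envir))
    if (pass.d) args$d <- d
    do.call(fun, args)
  })
  setNames(out, dates)
}

res   <- apply_by_date(fn.mn, c("1951", "1952"))
res.d <- apply_by_date(fn.mn.d, c("1951", "1952"), pass.d = TRUE)
```

Because d arrives as an ordinary argument, functions like fn.mn.d need no text substitution to see it.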