
Executing the same function on consecutive files

3 messages · Trying To learn again, John Kane, Dennis Murphy

#
This looks something like what you want.

http://r.789695.n4.nabble.com/Reading-in-a-series-of-files-using-a-for-loop-td906101.html
--- On Mon, 6/27/11, Trying To learn again <tryingtolearnagain at gmail.com> wrote:

            
#
Hi:

One approach:

(1) Put your files into a separate directory.
(2) Use list.files() to grab the individual file names.
(3) Write a function that takes a data frame as an argument and does
the necessary processing.
(4) Use lapply(), or ldply()/llply() from the plyr package, to run the
function on each file in the list in turn. lapply() and llply() return
lists; ldply() returns a data frame. If you intend to use ldply(), then
the function in (3) needs to return a data frame.
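
The steps above can be sketched roughly as follows. This is only an illustration under assumptions of mine: the temporary directory, the simulated files, and the name process_file() are hypothetical, and colMeans() is a placeholder for whatever processing you actually need. Note that the function here takes a file path and reads the data itself, rather than taking a data frame.

```r
# Step (1), hypothetical setup: write three small CSV files into a
# temporary directory so the example is self-contained.
data_dir <- file.path(tempdir(), "csv_demo")
dir.create(data_dir, showWarnings = FALSE)
set.seed(1)
for (i in 1:3) {
  d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
  d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(20)
  write.csv(d, file.path(data_dir, paste0("dat", i, ".csv")),
            row.names = FALSE)
}

# Step (2): collect the file names; full.names = TRUE keeps the
# directory path so read.csv() can find each file.
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Step (3): a function that reads one file and processes it.
process_file <- function(path) {
  dat <- read.csv(path, header = TRUE)
  colMeans(dat)  # placeholder for the real processing
}

# Step (4): apply the function to every file; the result is a list.
results <- lapply(files, process_file)
names(results) <- basename(files)  # label each result by its file
```

Naming the list elements after the files makes it easier to find a particular result later, e.g. results[["dat2.csv"]].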

Here's a small demo. I have five data sets in my starting directory
with variables x1, x2, y. The function reads in the data and returns
the output of a regression model; when lapply() is run on it, the
output of the five models is returned as a list. One can then cherry
pick output from the list of models.

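files <- paste('dat', 1:5, '.csv', sep = '')   # dat1.csv, ..., dat5.csv
myfun <- function(d) {
    # d is a file name: read the file, then regress y on all other columns
    df <- read.csv(d, header = TRUE)
    lm(y ~ ., data = df)
}
lout <- lapply(files, myfun)   # list of five fitted lm objects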

library(plyr)
ldply(lout, function(x) coef(x))    # coefficients
ldply(lout, function(x) summary(x)$r.squared)   # R^2

One could also use
do.call(rbind, lapply(lout, function(x) coef(x)))
do.call(rbind, lapply(lout, function(x) summary(x)$r.squared))

but ldply() has a somewhat simpler syntax.
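
If you would rather stay in base R, sapply() and vapply() are another option for pulling pieces out of a list of models. The sketch below is not part of the demo above: it uses simulated data frames in place of the five CSV files, and sim_fit() is a hypothetical helper that only exists to make the example runnable on its own.

```r
# Simulated stand-ins for the five fitted models (hypothetical data).
set.seed(42)
sim_fit <- function(i) {
  d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
  d$y <- 1 + 0.5 * d$x1 - 2 * d$x2 + rnorm(30)
  lm(y ~ ., data = d)
}
lout <- lapply(1:5, sim_fit)
names(lout) <- paste0("dat", 1:5)

# One row per model, one column per coefficient.
coef_mat <- t(sapply(lout, coef))

# vapply() checks that each result is a single numeric value.
r2 <- vapply(lout, function(m) summary(m)$r.squared, numeric(1))

# Cherry-pick the coefficient table of a single model by name.
summary(lout[["dat3"]])$coefficients
```

vapply() is a bit stricter than sapply(): it stops with an error if any model returns something other than one number, which can catch problems early.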

Hopefully, you can adapt these steps to your problem.

Dennis

On Mon, Jun 27, 2011 at 3:01 PM, Trying To learn again
<tryingtolearnagain at gmail.com> wrote: