
Writing a single output file

5 messages · Amy Milano, jim holtman, Hadley Wickham +2 more

#
This should get you close:

> fileNames <- list.files(pattern = "file.*\\.csv")
> fileNames
[1] "file1.csv" "file2.csv" "file3.csv" "file4.csv"
> result <- do.call(rbind, lapply(fileNames, function(.name){
+     .data <- read.table(.name, header = TRUE, as.is = TRUE)
+     # add file name to the data
+     .data$file <- .name
+     .data
+ }))
> result
        date yield_rate      file
1 12/23/2010       5.25 file1.csv
2 12/22/2010       5.19 file1.csv
3 12/23/2010       5.25 file2.csv
4 12/22/2010       5.19 file2.csv
5 12/23/2010       5.25 file3.csv
6 12/22/2010       5.19 file3.csv
7 12/23/2010       5.25 file4.csv
8 12/22/2010       5.19 file4.csv

Reshaping to wide format (that step's code was not preserved in the archive) then gives one column per file:

        date file1.csv file2.csv file3.csv file4.csv
1 12/22/2010      5.19      5.19      5.19      5.19
2 12/23/2010      5.25      5.25      5.25      5.25
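The wide table above can be reproduced from the long result with base R's reshape(); the thread does not show which function was actually used, so this is a sketch with inline sample data:

```r
# Sample data matching the long-format result from the thread
result <- data.frame(
  date       = rep(c("12/23/2010", "12/22/2010"), times = 4),
  yield_rate = rep(c(5.25, 5.19), times = 4),
  file       = rep(paste0("file", 1:4, ".csv"), each = 2),
  stringsAsFactors = FALSE
)

# Pivot to one column per file; reshape() prefixes the new columns
# with "yield_rate.", so strip that prefix to match the output above
wide <- reshape(result, direction = "wide", idvar = "date", timevar = "file")
names(wide) <- sub("^yield_rate\\.", "", names(wide))
print(wide)
```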

On Thu, Dec 23, 2010 at 8:07 AM, Amy Milano <milano_amy at yahoo.com> wrote:
#
You can simplify this a little with plyr:

library(plyr)

fileNames <- list.files(pattern = "file.*\\.csv")
names(fileNames) <- fileNames

input <- ldply(fileNames, read.table, header = TRUE, as.is = TRUE)

Hadley
#
On Thu, Dec 23, 2010 at 8:07 AM, Amy Milano <milano_amy at yahoo.com> wrote:
In the development version of zoo you can do all this in basically one
read.zoo command producing the required zoo series:

# chron's default date format is the same as in the output*.csv files
library(chron)

# pull in development version of read.zoo
library(zoo)
source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/read.zoo.R?revision=813&root=zoo")

# this does it
z <- read.zoo(Sys.glob("output*.csv"), header = TRUE, FUN = as.chron)

Then as.data.frame(z) converts z to a data frame with the times as row
names, and data.frame(Time = time(z), coredata(z)) gives a data frame
with the times in a column (although you may wish to just leave it as a
zoo object so you can take advantage of zoo's other facilities too).
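Those two conversions can be sketched as follows, with a small made-up series (assumes the zoo package is installed):

```r
library(zoo)

# A small two-column zoo series indexed by Date
z <- zoo(cbind(a = c(5.19, 5.25), b = c(4.10, 4.20)),
         order.by = as.Date(c("2010-12-22", "2010-12-23")))

df1 <- as.data.frame(z)                          # times become row names
df2 <- data.frame(Time = time(z), coredata(z))   # times in a Time column
```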
2 days later
#
There are many ways of doing this, and you have to think about the
efficiency and logistics of the different approaches.

If the data is not large, you can read all n files into a list and then 
combine them. If the data is very large, you may wish to read one file 
at a time, combining it and then deleting it before reading the next 
file. You can use cbind() to combine if all the Date columns are the 
same; otherwise merge() is useful.
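To illustrate the difference: cbind() just pastes columns side by side, so the dates must line up row for row, whereas merge() matches on the Date column and fills gaps with NA. A small sketch with made-up data:

```r
# Two "files" whose dates only partly overlap
a <- data.frame(Date = c("2010-12-22", "2010-12-23"), x = c(5.19, 5.25))
b <- data.frame(Date = c("2010-12-23", "2010-12-24"), y = c(4.20, 4.30))

# merge() aligns on Date; all = TRUE keeps dates present in either file
m <- merge(a, b, by = "Date", all = TRUE)
print(m)   # three rows; NA where a file has no value for that date
```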

The simple brute force approach would be:

  fns <- list.files(pattern="^output")
  do.call( "cbind", lapply(fns, read.csv, row.names=1) )


The slightly more optimized and flexible option, but slightly less 
elegant, could be something like this:

  fns <- list.files(pattern="^output")
  out <- read.csv(fns[1], row.names=NULL)

  for(fn in fns[-1]){
    tmp <- read.csv(fn, row.names=NULL)
    out <- merge(out, tmp, by=1, all=TRUE)
    rm(tmp); gc()
  }

You have to see which option is best for your file sizes. Good luck.

Regards, Adai
On 23/12/2010 13:07, Amy Milano wrote: