Writing a single output file
5 messages · Amy Milano, jim holtman, Hadley Wickham +2 more
This should get you close:
# get file names
setwd('/temp')
fileNames <- list.files(pattern = "file.*.csv")
fileNames
[1] "file1.csv" "file2.csv" "file3.csv" "file4.csv"
input <- do.call(rbind, lapply(fileNames, function(.name){
    .data <- read.table(.name, header = TRUE, as.is = TRUE)
    # add file name to the data
    .data$file <- .name
    .data
}))
input
        date yield_rate      file
1 12/23/2010       5.25 file1.csv
2 12/22/2010       5.19 file1.csv
3 12/23/2010       5.25 file2.csv
4 12/22/2010       5.19 file2.csv
5 12/23/2010       5.25 file3.csv
6 12/22/2010       5.19 file3.csv
7 12/23/2010       5.25 file4.csv
8 12/22/2010       5.19 file4.csv
require(reshape)
in.melt <- melt(input, measure = 'yield_rate')
cast(in.melt, date ~ file)
        date file1.csv file2.csv file3.csv file4.csv
1 12/22/2010      5.19      5.19      5.19      5.19
2 12/23/2010      5.25      5.25      5.25      5.25
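
For anyone without the reshape package, roughly the same long-to-wide step can be sketched in base R with reshape(). This is only an illustration, not the code from the post above: the temporary directory and the two small example files are assumptions, written here so the snippet runs on its own, with values taken from the question.

```r
# Hypothetical example files in a temporary directory (values from the thread)
tmp <- tempfile("yield_demo")
dir.create(tmp)
write.csv(data.frame(date = c("12/23/2010", "12/22/2010"),
                     yield_rate = c(5.25, 5.19)),
          file.path(tmp, "output1.csv"), row.names = FALSE)
write.csv(data.frame(date = c("12/23/2010", "12/22/2010"),
                     yield_rate = c(4.16, 4.59)),
          file.path(tmp, "output2.csv"), row.names = FALSE)

fileNames <- list.files(tmp, pattern = "^output[0-9]+\\.csv$",
                        full.names = TRUE)

# Long format: one row per (file, date), tagged with the file name
long <- do.call(rbind, lapply(fileNames, function(f) {
    d <- read.csv(f, as.is = TRUE)
    d$file <- basename(f)
    d
}))

# Wide format: one yield_rate column per file, keyed on date
wide <- reshape(long, idvar = "date", timevar = "file", direction = "wide")
wide
```

The wide columns are named after the value column and the file (yield_rate.output1.csv and so on), so they can be renamed afterwards if shorter names are wanted.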
On Thu, Dec 23, 2010 at 8:07 AM, Amy Milano <milano_amy at yahoo.com> wrote:
Dear R helpers!
Let me first wish all of you "Merry Christmas and Very Happy New year 2011"
"Christmas day is a day of Joy and Charity,
May God make you rich in both" - Phillips Brooks
## ----------------------------------------------------------------------------------------------------------------------------
I have a process which generates a number of outputs. The R code for the same is as given below.
for(i in 1:n)
{
write.csv(output[i], file = paste("output", i, ".csv", sep = ""), row.names = FALSE)
}
Depending on value of 'n', I get different output files.
Suppose n = 3, that means I am having three output csv files viz. 'output1.csv', 'output2.csv' and 'output3.csv'
output1.csv
date          yield_rate
12/23/2010    5.25
12/22/2010    5.19
.................................
.................................
output2.csv
date          yield_rate
12/23/2010    4.16
12/22/2010    4.59
.................................
.................................
output3.csv
date          yield_rate
12/23/2010    6.15
12/22/2010    6.41
.................................
.................................
Thus all the output files have same column names viz. Date and yield_rate. Also, I do need these files individually too.
My further requirement is to have a single dataframe as given below.
Date          yield_rate1    yield_rate2    yield_rate3
12/23/2010    5.25           4.16           6.15
12/22/2010    5.19           4.59           6.41
...............................................................................................
...............................................................................................
where yield_rate1 = output1$yield_rate and so on.
One way is to simply create a dataframe as
df = data.frame(Date = read.csv('output1.csv')$Date,
                yield_rate1 = read.csv('output1.csv')$yield_rate,
                yield_rate2 = read.csv('output2.csv')$yield_rate,
                yield_rate3 = read.csv('output3.csv')$yield_rate)
However, the problem arises when I am not aware how many output files are there as n can be 5 or even 100.
So is it possible to write some loop or some function which will enable me to read 'n' files individually and then, keeping "Date" common, pick up only the yield_rate data from each output file?
Thanking in advance for any guidance.
Regards
Amy
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
input <- do.call(rbind, lapply(fileNames, function(.name){
    .data <- read.table(.name, header = TRUE, as.is = TRUE)
    # add file name to the data
    .data$file <- .name
    .data
}))
You can simplify this a little with plyr:

library(plyr)
fileNames <- list.files(pattern = "file.*.csv")
names(fileNames) <- fileNames
input <- ldply(fileNames, read.table, header = TRUE, as.is = TRUE)

Hadley

Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
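
One detail about the list.files() calls above: the pattern argument is a regular expression, not a shell glob, so "file.*.csv" is unanchored and its dots match any character. It works for the files in this thread, but it may also pick up names that were not intended. The sketch below compares it with a stricter pattern built by glob2rx() from the utils package; the candidate file names are made up for illustration.

```r
# Hypothetical file names, some of which should not be picked up
candidates <- c("file1.csv", "file2.csv", "profile.csv",
                "file1.csv.bak", "file1_csv")

# Pattern as used in the thread: unanchored, "." matches any character
loose <- grepl("file.*.csv", candidates)

# Glob translated to an anchored regular expression
strict_pattern <- utils::glob2rx("file*.csv")
strict <- grepl(strict_pattern, candidates)

data.frame(candidates, loose, strict)
```

The strict pattern can be passed straight to list.files(pattern = strict_pattern). Sys.glob("file*.csv") is another option when glob semantics are wanted directly.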
In the development version of zoo you can do all this in basically one
read.zoo command producing the required zoo series:
# chron's default date format is the same as in the output*.csv files
library(chron)
# pull in development version of read.zoo
library(zoo)
source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/read.zoo.R?revision=813&root=zoo")
# this does it
z <- read.zoo(Sys.glob("output*.csv"), header = TRUE, FUN = as.chron)
as.data.frame(z) or data.frame(Time = time(z), coredata(z)) can be
used to convert z to a data frame with times as row names or a data
frame with times in a column, respectively (although you may wish to
just leave it as a zoo object so you can take advantage of zoo's other
facilities too).
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
2 days later
There are many ways of doing this, and you have to think about the
efficiency and logistics of the different approaches.
If the data is not large, you can read all n files into a list and then
combine. If data is very large, you may wish to read one file at a time,
combining and then deleting it before reading the next file. You can use
cbind() to combine if all the Date columns are the same, otherwise
merge() is useful.
The simple brute force approach would be:
fns <- list.files(pattern="^output")
do.call( "cbind", lapply(fns, read.csv, row.names=1) )
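
A slightly fuller sketch of the cbind() route, under a few assumptions of mine: the files live in a temporary directory created just for the example, the column suffix is taken from the number in each file name (list.files() sorts names lexicographically, so "output10.csv" comes before "output2.csv"), and the dates are checked before binding because cbind() does not align rows.

```r
# Hypothetical example files in a temporary directory (values from the thread)
tmp <- tempfile("cbind_demo")
dir.create(tmp)
dates <- c("12/23/2010", "12/22/2010")
rates <- list(c(5.25, 5.19), c(4.16, 4.59), c(6.15, 6.41))
for (i in seq_along(rates)) {
    write.csv(data.frame(date = dates, yield_rate = rates[[i]]),
              file.path(tmp, paste0("output", i, ".csv")),
              row.names = FALSE)
}

fns <- list.files(tmp, pattern = "^output[0-9]+\\.csv$", full.names = TRUE)
# Numeric suffix of each file name, used to label the columns
idx <- sub("^output([0-9]+)\\.csv$", "\\1", basename(fns))

# First column (the dates) becomes the row names
pieces <- lapply(fns, read.csv, row.names = 1)

# cbind() is only safe when every file lists the same dates in the same order
same_dates <- vapply(pieces, function(p) {
    identical(rownames(p), rownames(pieces[[1]]))
}, logical(1))
stopifnot(all(same_dates))

wide <- do.call(cbind, pieces)
names(wide) <- paste0("yield_rate", idx)
# Move the dates from the row names into an ordinary column
wide <- data.frame(Date = rownames(wide), wide, row.names = NULL)
wide
```

If the check fails, the files do not share the same dates and merge() is the safer choice.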
The slightly more optimized and flexible option, though slightly less
elegant, could be something like this:
fns <- list.files(pattern="^output")
out <- read.csv(fns[1], row.names=NULL)
for(fn in fns[-1]){
    tmp <- read.csv(fn, row.names=NULL)
    out <- merge(out, tmp, by=1, all=TRUE)
    rm(tmp); gc()
}
You have to see which option is best for your file sizes. Good luck.
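
One thing to watch with repeated merge() calls is that every file uses the same column name, so the result can end up with .x/.y suffixes and, with enough files, repeated column names. A minimal sketch of one way around that, assuming the value column is called yield_rate and the files are named output<number>.csv: rename the column after the file before merging. The helper read_one() and the example files are hypothetical, and the second file deliberately lacks one date to show what all = TRUE does.

```r
# Hypothetical example files; output2.csv is missing the 12/22/2010 row
tmp <- tempfile("merge_demo")
dir.create(tmp)
write.csv(data.frame(date = c("12/23/2010", "12/22/2010"),
                     yield_rate = c(5.25, 5.19)),
          file.path(tmp, "output1.csv"), row.names = FALSE)
write.csv(data.frame(date = "12/23/2010", yield_rate = 4.16),
          file.path(tmp, "output2.csv"), row.names = FALSE)
write.csv(data.frame(date = c("12/23/2010", "12/22/2010"),
                     yield_rate = c(6.15, 6.41)),
          file.path(tmp, "output3.csv"), row.names = FALSE)

fns <- list.files(tmp, pattern = "^output[0-9]+\\.csv$", full.names = TRUE)

# Read one file and rename its value column after the file's numeric
# suffix, e.g. yield_rate2 for output2.csv
read_one <- function(fn) {
    d <- read.csv(fn, as.is = TRUE)
    suffix <- sub("^output([0-9]+)\\.csv$", "\\1", basename(fn))
    names(d)[names(d) == "yield_rate"] <- paste0("yield_rate", suffix)
    d
}

# Merge all pieces on date; all = TRUE keeps dates missing from some files
out <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE),
              lapply(fns, read_one))
out
```

This reads every file up front, so it gives up the one-file-at-a-time memory saving of the loop above; the same renaming step can be done inside that loop instead if the files are large.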
Regards, Adai