How to insert filename as column in a file
7 messages · jim holtman, MacQueen, Don, Shivam +1 more
This might do it for you:
for (i in fileNames){
    input <- read.table(i, .....)
    # you might want to use regular expressions to extract just the date.
    input$fileName <- i
    # write the modified data frame (not the file name) back out
    write.table(input, ....)
}
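A self-contained sketch of the loop above, assuming the file names look like `20120423.csv`. The `.csv` extension, the scratch directory, and the sample columns are made up for illustration, and plain `read.csv`/`write.csv` stand in for whatever read and write calls are actually used. It pulls the eight-digit date out of each name with a regular expression and writes the file back with the extra column.

```r
# Hypothetical setup: two small CSV files named by date in a scratch directory.
dir <- file.path(tempdir(), "daily_files")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c("x", "y")),
          file.path(dir, "20120423.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("z", "w")),
          file.path(dir, "20120424.csv"), row.names = FALSE)

fileNames <- list.files(dir, pattern = "^[0-9]{8}\\.csv$", full.names = TRUE)

for (f in fileNames) {
  input <- read.csv(f, stringsAsFactors = FALSE)
  # Keep only the leading 8 digits of the base name, e.g. "20120423".
  input$fileDate <- sub("^([0-9]{8}).*$", "\\1", basename(f))
  # Overwrite the file so the date travels with the rows.
  write.csv(input, f, row.names = FALSE)
}

# Read one file back to check that the new column is there.
result <- read.csv(fileNames[1], stringsAsFactors = FALSE)
```

Note that when a file is read back, `read.csv` will treat the date column as a number unless told otherwise through `colClasses`.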
On Mon, Apr 23, 2012 at 12:29 PM, Shivam <shivamsingh at gmail.com> wrote:
Hi,

I am relatively new to R. I have scoured the help files and the web but haven't been able to find a solution.

I have around 250 csv files, one file for each date. They have columns of all types, numeric, string etc. The name of each file is the date in the form of 'yyyymmdd'. There is no column within the file which helps me identify the date on which the file was generated; only the filename has that info.

I am selecting some data (using read.csv.sql) from each file and creating a dataset for each day. Ultimately I will combine all the datasets. I can accomplish the select and combine part, but after combining I won't have a record of the date corresponding to the data.

Hence I want to insert the filename as a column in the respective file to help me identify which date each data row belongs to.

Sorry for the long mail, but I wanted to make myself clear. Any help would be greatly appreciated.

Thanks in advance,
Shivam
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
This little example might help.
foo <- data.frame(a=1:10, b=letters[1:0])
foo
    a b
1   1 a
2   2 a
3   3 a
4   4 a
5   5 a
6   6 a
7   7 a
8   8 a
9   9 a
10 10 a

foo$date <- '20120423'
foo
    a b     date
1   1 a 20120423
2   2 a 20120423
3   3 a 20120423
4   4 a 20120423
5   5 a 20120423
6   6 a 20120423
7   7 a 20120423
8   8 a 20120423
9   9 a 20120423
10 10 a 20120423

In other words, immediately after reading the data into a data frame, add a date column as in the example. You'll have to extract the date from the filename, of course.

-Don
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
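If the date is wanted as a proper Date rather than a character string, something along these lines should work. The file name `20120423.csv` and the small data frame are assumed examples; `as.Date` with `format = "%Y%m%d"` is base R.

```r
foo <- data.frame(a = 1:3, b = c("x", "y", "z"))
fname <- "20120423.csv"                        # assumed example file name

# The first 8 characters of the name hold the date in yyyymmdd form.
foo$date <- as.Date(substr(fname, 1, 8), format = "%Y%m%d")

str(foo$date)
```

A Date column sorts and subsets by calendar order, which may be convenient once all 250 days are combined.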
Programmatically dealing with large numbers of separately-named objects leads to syntactically complicated code that is hard to read and maintain.
Load the data frames into a list so you can access them by numeric or named index, and then getting at the loaded data will be much easier.
library(sqldf)  # provides read.csv.sql
fnames <- list.files(path = getwd())
# preallocating the list for efficiency (execution speed)
dtalist <- vector( "list", length(fnames) )
for (i in seq_along(fnames)){
    dtalist[[i]] <- read.csv.sql(fnames[i], sql = "select * from file where V3 == 'XXX' and V5=='YYY'", header = FALSE, sep = '|', eol = "\n")
    # the first 8 characters of the file name are the date (yyyymmdd)
    dtalist[[i]]$date <- substr(fnames[i], 1, 8)
}
names(dtalist) <- fnames
# now you can optionally refer to a data frame by file name, e.g. dtalist[["20120424.csv"]] (if a file of that name exists), or by position, e.g. dtalist[[1]].
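Once the data frames are in a list, the final combining step mentioned in the original post is short. A sketch, using plain `read.csv` on made-up temporary files instead of `read.csv.sql` so that it runs without the sqldf package; the directory, file names, and columns are invented for illustration.

```r
# Hypothetical setup: two date-named CSV files in a scratch directory.
dir <- file.path(tempdir(), "combine_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(V1 = 1:2), file.path(dir, "20120423.csv"), row.names = FALSE)
write.csv(data.frame(V1 = 3:5), file.path(dir, "20120424.csv"), row.names = FALSE)

fnames <- list.files(dir, pattern = "\\.csv$")

# Read each file and tag its rows with the date taken from the file name.
dtalist <- lapply(fnames, function(f) {
  d <- read.csv(file.path(dir, f))
  d$date <- substr(f, 1, 8)
  d
})
names(dtalist) <- fnames

# Stack everything into one data frame; each row keeps its date.
combined <- do.call(rbind, dtalist)
rownames(combined) <- NULL
table(combined$date)
```

`do.call(rbind, ...)` assumes every file yields the same columns, which seems to be the case here since the same query is run against each file.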
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
Shivam <shivamsingh at gmail.com> wrote:
Reposting in hope of a reply.

On Tue, Apr 24, 2012 at 1:12 AM, Shivam <shivamsingh at gmail.com> wrote:

Thanks for the quick response. It works for an individual data frame, but I have many data frames. This is the code so far:

fnames = list.files(path = getwd())
for (i in 1:length(fnames)){
    assign(paste("file",i,sep=""), read.csv.sql(fnames[i], sql = "select * from file where V3 == 'XXX' and V5=='YYY'", header = FALSE, sep= '|', eol = "\n"))
}

This generates data frames named file1, file2, ..., file250. Is there a way to do something like the below within the same loop?

file1$date = substr(fnames[1],1,8)
file2$date = substr(fnames[2],1,8)
.
.
file250$date = substr(fnames[250],1,8)

assign(paste("file",i,sep="")$date doesn't work.

Any help?
-- *Victoria Concordia Crescit*
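For the question as asked: a call to `assign()` cannot be used on the left-hand side of an assignment, so `assign(...)$date <- ...` will not work. One way to stay with separately named objects is to build each data frame in a temporary variable first, add the column, and only then `assign()` it. A sketch with made-up data standing in for the `read.csv.sql` call; the file names are assumed examples.

```r
fnames <- c("20120423.csv", "20120424.csv")    # assumed example file names

for (i in seq_along(fnames)) {
  # Stand-in for read.csv.sql(fnames[i], ...): a tiny made-up data frame.
  tmp <- data.frame(V1 = i, V2 = "abc")
  # Add the date column before the object gets its final name.
  tmp$date <- substr(fnames[i], 1, 8)
  assign(paste("file", i, sep = ""), tmp)
}

file2$date
```

This keeps the original naming scheme, though a single list of data frames is usually easier to loop over and combine later.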