parsing files for plot
Hi again,
Below are two versions, depending on whether you want to use scan or read.table:
## with scan
library(plyr)     # provides llply
library(reshape)  # provides melt
listOfFiles <- list.files()
d <- llply(listOfFiles, scan)
names(d) <- basename(listOfFiles)
melt(d)
## with read.table
listOfFiles <- list.files()
names(listOfFiles) <- basename(listOfFiles)
library(plyr)
ldply(listOfFiles, read.table)
Note: I tested this code with the following files:
dir.create("dummy")
setwd("dummy")
files <- replicate(5, rnorm(sample(3:20, 1)), simplify=FALSE)
fileNames <- paste("datafile", letters[1:5], ".txt", sep="")
l_ply(seq_along(files), function(ii, ...)
      write.table(x=files[[ii]], file=fileNames[ii], ...),
      row.names = FALSE, col.names = FALSE)
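For completeness, here is a minimal base-R sketch of the whole path from files to boxplot. It is only an illustration under stated assumptions: the temporary directory and the file names a.txt, b.txt and c.txt are made up, and stack() is used as a base-R stand-in for melt().

```r
# Write a few small one-column files into a temporary directory
# (directory and file names are hypothetical, for illustration only)
tmp_dir <- file.path(tempdir(), "boxplot_demo")
dir.create(tmp_dir, showWarnings = FALSE)
set.seed(1)
for (nm in c("a.txt", "b.txt", "c.txt")) {
  write(rnorm(sample(3:20, 1)), file = file.path(tmp_dir, nm), ncolumns = 1)
}

# Read every file into one named list instead of separate variables
paths <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)
d <- lapply(paths, scan, quiet = TRUE)
names(d) <- basename(paths)

# stack() turns the named list into a data frame with columns 'values' and 'ind'
comb <- stack(d)
boxplot(values ~ ind, data = comb)
```

Because the values stay in a named list, nothing has to be looked up by variable name afterwards.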
HTH,
baptiste
On 30 January 2010 14:23, Maxim <deeepersound at googlemail.com> wrote:
Hi,
my data is really not spectacular, each of the 6 files (later several
hundred) contains correlation coefficients in plain text format like:
0.923960073
0.923960073
0.612571344
0.064183275
0.007733399
-0.315444372
-0.064591277
-0.268336142
...........
with between 1000 and 13000 rows.
Scanning from the directory works, as in this script:
comb<-data.frame()
count<-0
files <- list.files()  # all files in the working directory
for(i in files) {
       count<-count+1
       tmp <- scan(i)
       assign(files[count], tmp)
       if (i ==1)
       comb<-data.frame(dats=c(tmp), index=c(rep(files[1], length(tmp))))
       else
       combadd<-data.frame(dats=c(tmp), index=c(rep(files[count], length(tmp))))
       comb<-rbind(comb,combadd)
}
boxplot(dats ~ index, data = comb)
works just great. There are no additional files in the folder. But look how
much code it takes for such a simple task. I'd definitely prefer the plyr solution.
Maxim
2010/1/30 baptiste auguie <baptiste.auguie at googlemail.com>
Why don't you post an example of what your input files look like? (to
the list, not just to me!) A reproducible example is always required
if you want a good answer.
Note that if you are scanning *all* files in the working directory,
you may also be scanning the R file containing your instructions,
which won't have the correct format, obviously.
Best,
baptiste
On 30 January 2010 13:52, Maxim <deeepersound at googlemail.com> wrote:
Hi,
thanks, that looks much more elegant than what I managed to accomplish
in the meantime:
count<-1
files <- list.files()  # all files in the working directory
for(i in files) {
       tmp <- scan(i)
       assign(files[count], tmp)
       if (i ==1)
       comb<-data.frame(dats=c(tmp), index=c(rep(files[1], length(tmp))))
       else
       combadd<-data.frame(dats=c(tmp), index=c(rep(files[count], length(tmp))))
       comb<-rbind(comb,combadd)
       count<-count+1
}
boxplot(dats ~ index, data = comb)
This code works; unfortunately the plots get plotted in a different order
than expected (it appears to be more or less random to me). Why is this?
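One likely explanation, offered as an assumption since the file names are not shown: boxplot() draws the groups in the order of the factor levels, and both list.files() and factor() sort names alphabetically as text, so a name like "file10" lands before "file2". A small sketch with made-up names and values, setting the level order explicitly:

```r
# Made-up data: three groups whose names sort differently as text than as numbers
comb <- data.frame(
  dats  = c(1, 2, 3, 4, 5, 6),
  index = rep(c("file10", "file2", "file1"), each = 2)
)

# Default factor levels would be alphabetical: file1, file10, file2.
# Setting levels explicitly fixes the order in which the boxes are drawn.
comb$index <- factor(comb$index, levels = c("file1", "file2", "file10"))
boxplot(dats ~ index, data = comb)
```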
Concerning your code: I get an error like:
Read 2652 items
Read 3310 items
Read 1096 items
Read 2177 items
Read 11387 items
Read 12503 items
Error in list_to_dataframe(res, attr(.data, "split_labels")) :
  Results are not equal lengths
hmmh?
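A guess at the cause, not checked against the plyr source: ldply() tries to build a data frame with one row per file, which needs every scan() result to have the same length, and the six files above clearly differ. A long format with one row per value avoids that. Below is a base-R sketch with made-up vectors standing in for the scanned files; the column names dats and index follow the loop above.

```r
# Made-up stand-ins for two scanned files of different lengths
d <- list(a.txt = c(0.92, 0.61, 0.06), b.txt = c(-0.32, -0.06))

# One small data frame per file (value plus file name), then bind them row-wise
pieces <- lapply(names(d), function(nm) {
  data.frame(dats = d[[nm]], index = nm)
})
long <- do.call(rbind, pieces)
```

The result has one row per value, so unequal file lengths are no longer a problem.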
Maxim
2010/1/30 baptiste auguie <baptiste.auguie at googlemail.com>
Hi,
Hadley recently proposed a strategy using plyr for a very similar problem,
listOfFiles <- list.files()
names(listOfFiles) <- basename(listOfFiles)
library(plyr)
d <- ldply(listOfFiles, scan)
Even if you don't want to use plyr, it's always better to group things
in a list rather than clutter your workspace with lots of assign()ed
variables.
HTH,
baptiste
On 30 January 2010 13:19, Maxim <deeepersound at googlemail.com> wrote:
Hi,
I have many files, each containing one column of data. I'd like to use the
scan function to parse the data. Next I'd like to bind it all into one large
vector. I try this like:
count<-1
files <- list.files()  # all files in the working directory
for(i in files) {
      tmp <- scan(i)
      assign(files[count], tmp)
      count<-count+1
}
}
This part works!
Now I like to plot the data in a boxplot.
Usually I do this from individual vectors like:
comb <- data.frame(dat = c(vector1, vector2 ......),
                   ind = c(rep('vector1', length(vector1)).......))
boxplot(dat ~ ind, data = comb)
But how do I do this in a loop?
I know the vector names (according to the filenames in the working
directory), but I do not know how to access them in my R code after having
assigned the names.
I guess the "lapply" or "dply" from the plyr library can do this, but I
seem not to be able to do it.
Is there a way to do this?
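For what it is worth, objects created with assign() can be fetched back by name with get(), or several at once with mget(). The sketch below is an illustration only: vector1 and vector2 hold made-up values, and in the loop above the names would be the file names instead.

```r
# Made-up vectors standing in for the objects created by assign()
vector1 <- c(0.92, 0.61)
vector2 <- c(-0.32, -0.06, 0.01)
nms <- c("vector1", "vector2")

# mget() looks the objects up by name and returns them as a named list
vals <- mget(nms, envir = environment())

# Build the long data frame from the list, repeating each name once per value
comb <- data.frame(dat = unlist(vals, use.names = FALSE),
                   ind = rep(names(vals), lengths(vals)))
boxplot(dat ~ ind, data = comb)
```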
gma
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.