parsing files for plot

Sat, Jan 30, 2010 5:43 AM

Hi again,

Below are two versions, depending on whether you want to use scan or read.table,

## with scan
library(plyr)    # provides llply
library(reshape) # provides melt
listOfFiles <- list.files()
d <- llply(listOfFiles, scan)
names(d) <- basename(listOfFiles)

melt(d)

## with read.table

listOfFiles <- list.files()
names(listOfFiles) <- basename(listOfFiles)

library(plyr)
ldply(listOfFiles, read.table)
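
For readers without plyr or reshape, here is a minimal base-R sketch of the same idea. It is an illustration under stated assumptions, not part of the tested code: the three toy vectors stand in for scanned files, and the element names are made up. Because the files hold different numbers of values, the vectors cannot be bound as equal-length rows, which is presumably what the earlier "Results are not equal lengths" error was about. A long format with one value column and one label column avoids that. Setting the factor levels explicitly is one way to control the order of the boxes.

```r
# Toy stand-ins for the scanned files: three vectors of unequal length.
# (The names are hypothetical; real code would use basename(listOfFiles).)
d <- list(datafilea.txt = c(0.92, 0.61, 0.06),
          datafileb.txt = c(-0.32, -0.06),
          datafilec.txt = c(0.01, -0.27, 0.45, 0.12))

# stack() turns a named list of vectors into a long data frame with
# columns 'values' (the data) and 'ind' (the list element names).
long <- stack(d)

# Boxes are drawn in the order of the factor levels, so set them explicitly.
long$ind <- factor(long$ind, levels = names(d))

boxplot(values ~ ind, data = long)
```

To reverse or otherwise rearrange the boxes, change the `levels` argument, for example `levels = rev(names(d))`.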


Note, I tested this code with the following files,

dir.create("dummy")
setwd("dummy")

files <- replicate(5, rnorm(sample(3:20, 1)), simplify=FALSE)
names <- paste("datafile", letters[1:5],".txt",  sep="")

l_ply(seq_along(files), function(ii, ...)
      write.table(x = files[[ii]], file = names[ii], ...),
      row.names = FALSE, col.names = FALSE)

HTH,

baptiste

On 30 January 2010 14:23, Maxim <deeepersound at googlemail.com> wrote:

Hi,

my data is really not spectacular, each of the 6 files (later several
hundred) contains correlation coefficients in plain text format like:

0.923960073
0.923960073
0.612571344
0.064183275
0.007733399
-0.315444372
-0.064591277
-0.268336142
...........

with between 1000-13000 rows.

Scanning from the directory works, as this script:

comb<-data.frame()
count<-0
files <- list.files()  # all files in the working directory
for(i in files) {
       count<-count+1

       tmp <- scan(i)
       assign(files[count], tmp)

       if (i ==1)
       comb<-data.frame(dats=c(tmp), index=c(rep(files[1], length(tmp))))
       else
       combadd<-data.frame(dats=c(tmp), index=c(rep(files[count], length(tmp))))
       comb<-rbind(comb,combadd)

}
boxplot(dats ~ index, data = comb)


works just great. There are no additional files in the folder. But look how
much code that is for such a simple task. I'd definitely prefer the plyr solution.

Maxim


2010/1/30 baptiste auguie <baptiste.auguie at googlemail.com>

Why don't you post an example of what your input files look like? (to
the list, not just to me!) A reproducible example is always required
if you want a good answer.

Note that if you are scanning *all* files in the working directory,
you may also be scanning the R file containing your instructions which
won't have the correct format, obviously.
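
One way to avoid picking up the script itself is to restrict list.files() with a pattern. The sketch below is only an illustration: the scratch directory and the file names are made up, and it assumes the data files end in ".txt".

```r
# Work in a scratch directory so the example does not touch real data.
tmp <- file.path(tempdir(), "scan_demo")
dir.create(tmp, showWarnings = FALSE)

# Two hypothetical data files plus an R script that should be ignored.
writeLines(c("0.92", "0.61"), file.path(tmp, "datafilea.txt"))
writeLines(c("-0.32", "-0.06", "0.01"), file.path(tmp, "datafileb.txt"))
writeLines("x <- 1", file.path(tmp, "analysis.R"))

# Keep only files whose names end in ".txt". full.names = TRUE returns
# paths that can be passed to scan() without changing the working directory.
dataFiles <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)
basename(dataFiles)
```

The pattern is a regular expression, so the dot is escaped and `$` anchors the match at the end of the file name.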

Best,

baptiste

On 30 January 2010 13:52, Maxim <deeepersound at googlemail.com> wrote:

Hi,

thanks, that looks much more elegant than what I managed to accomplish
in the meantime:

count<-1
files <- list.files()  # all files in the working directory
for(i in files) {

       tmp <- scan(i)
       assign(files[count], tmp)

       if (i ==1)
       comb<-data.frame(dats=c(tmp), index=c(rep(files[1], length(tmp))))
       else
       combadd<-data.frame(dats=c(tmp), index=c(rep(files[count], length(tmp))))
       comb<-rbind(comb,combadd)

       count<-count+1
}
boxplot(dats ~ index, data = comb)


This code works; unfortunately, the boxes are plotted in a different order
than expected (it appears more or less random to me). Why is this?


Concerning your code: I get an error like:

Read 2652 items
Read 3310 items
Read 1096 items
Read 2177 items
Read 11387 items
Read 12503 items
Error in list_to_dataframe(res, attr(.data, "split_labels")) :
  Results are not equal lengths

hmmh?

Maxim


2010/1/30 baptiste auguie <baptiste.auguie at googlemail.com>

Hi,

Hadley recently proposed a strategy using plyr for a very similar
problem,

listOfFiles <- list.files()
names(listOfFiles) <- basename(listOfFiles)

library(plyr)
d <- ldply(listOfFiles, scan)

Even if you don't want to use plyr, it's always better to group things
in a list rather than clutter your workspace with lots of assign()ed
variables.
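
As a rough base-R sketch of the list approach (the scratch directory and file names below are made up for illustration), each file becomes one named element of a list, which can then be accessed by file name instead of through assign()ed variables:

```r
# Scratch directory with two hypothetical one-column data files.
tmp <- file.path(tempdir(), "list_demo")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("0.92", "0.61"), file.path(tmp, "datafilea.txt"))
writeLines(c("-0.32", "-0.06", "0.01"), file.path(tmp, "datafileb.txt"))

files <- list.files(tmp, full.names = TRUE)

# Read every file into one list; quiet = TRUE suppresses "Read n items".
d <- lapply(files, scan, quiet = TRUE)
names(d) <- basename(files)

# Access one data set by its file name, or summarise all of them at once.
d[["datafilea.txt"]]
sapply(d, length)
```

Everything stays in the single object `d`, so later steps can loop over it or reshape it without looking up variables by name.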

HTH,

baptiste


On 30 January 2010 13:19, Maxim <deeepersound at googlemail.com> wrote:

Hi,

I have many files, each containing one column of data. I would like to use
the scan function to parse the data and then bind everything into one large
vector. I tried it like this:

count<-1
files <- list.files()  # all files in the working directory
for(i in files) {

       tmp <- scan(i)
       assign(files[count], tmp)
       count<-count+1
}

This part works!

Now I like to plot the data in a boxplot.

Usually I do this from individual vectors like:

comb <- data.frame(dat = c(vector1, vector2 ......),
                   ind = c(rep('vector1', length(vector1)).......))
boxplot(dat ~ ind, data = comb)

But how do I do this in a loop?

I know the vector names (according to the filenames in the working
directory), but I do not know how to access them in my R code after having
assigned the names.

I guess the "lapply" or "dply" from the plyr library can do this, but I
seem not to be able to do it.

Is there a way to do this?

gma

        [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
