Transforming simulation data which is spread across many files into a barplot
7 messages · Ian Bentley, Hadley Wickham, Bert Gunter +1 more
On Fri, Jun 11, 2010 at 1:32 PM, Ian Bentley <ian.bentley at gmail.com> wrote:
I'm an R newbie, and I'm just trying to use some of its graphing
capabilities, but I'm a bit stuck - basically in massaging the already
available data into a format R likes.
I have a simulation environment which produces logs, which represent a
number of different things. I then run a python script on this data to
put it in a nicer format. Essentially, the python script reduces the
number of files by two orders of magnitude.
What I'm left with is a number of files, each of which has two columns of
data in it.
The files look something like this:
--1000.log--
Sent Received
405.0 3832.0
176.0 1742.0
176.0 1766.0
176.0 1240.0
356.0 3396.0
...
This file - called 1000.log - represents a data point at 1000. What I'd like
to do is use a loop to read in 50 or so of these files, and then produce
a stacked barplot. Ideally, the stacked barplot would have 1 bar per file,
and two stacks per bar. The first stack would be the mean of the sent, and
the second would be the mean of the received.
I've used a loop to read files in R before, something like this ---
for (i in 1:50){
  tmpFile <- paste(base, i*100, ".log", sep="")
  tmp <- read.table(tmpFile)
}
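# Load data (assumes each file starts with the "Sent Received" header line)
library(plyr)
paths <- dir(base, pattern = "\\.log", full.names = TRUE)
names(paths) <- basename(paths)
# ldply, not ddply: paths is a named character vector, not a data frame
df <- ldply(paths, read.table, header = TRUE)

# Compute averages:
avg <- ddply(df, ".id", summarise, sent = mean(Sent), received = mean(Received))

You can read more about plyr at http://had.co.nz/plyr.

Hadley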
Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
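For readers without plyr installed, the same read-then-average idea can be sketched in base R. This is only an illustration, not Hadley's code: the temporary directory, the two small files (filled with rows from the sample above), and the names log_dir, pieces and file are assumptions made so the example runs on its own.

```r
# Sketch only: two tiny log files so the example is self-contained.
# The directory and file names are assumptions for illustration.
log_dir <- tempfile("logs")
dir.create(log_dir)
writeLines(c("Sent Received", "405.0 3832.0", "176.0 1742.0"),
           file.path(log_dir, "1000.log"))
writeLines(c("Sent Received", "356.0 3396.0", "176.0 1240.0"),
           file.path(log_dir, "1100.log"))

paths <- dir(log_dir, pattern = "\\.log$", full.names = TRUE)

# Read every file and tag its rows with the file name
# (the role played by plyr's .id column).
pieces <- lapply(paths, function(p) {
  d <- read.table(p, header = TRUE)
  d$file <- basename(p)
  d
})
df <- do.call(rbind, pieces)

# Per-file means of both columns.
avg <- aggregate(cbind(Sent, Received) ~ file, data = df, FUN = mean)
print(avg)
```

The result has one row per file, which is the shape needed for either a barplot or a line plot of the means.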
Ouch! Lousy plot. Instead, plot the 50 (mean sent, mean received) pairs as a y vs x scatterplot to see the relationship.

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Hadley Wickham
Sent: Friday, June 11, 2010 11:53 AM
To: Ian Bentley
Cc: r-help at r-project.org
Subject: Re: [R] Transforming simulation data which is spread across many files into a barplot
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Try this:
base <- "file" # replace as appropriate
N <- 50
filenames <- paste(base, seq_len(N)*100, ".log", sep = "")
mat <- sapply(filenames, function(fn)
  # add header = TRUE if each file really begins with the "Sent Received" line
  colMeans(read.table(fn, col.names = c("Sent", "Received")))
)
barplot(mat)
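A self-contained sketch of the approach above, in case it helps to see it run end to end. The temporary directory, the three file sizes and the random values are invented for illustration, and the files are assumed to carry a "Sent Received" header line, hence header = TRUE.

```r
# Sketch only: write a few small log files to a temporary directory so the
# example runs on its own. File names and values are invented.
log_dir <- tempfile("logs")
dir.create(log_dir)
set.seed(1)
sizes <- c(100, 200, 300)
for (s in sizes) {
  demo <- data.frame(Sent = runif(5, 100, 400) + s,
                     Received = runif(5, 1000, 4000) + 10 * s)
  write.table(demo, file.path(log_dir, paste0(s, ".log")),
              row.names = FALSE, quote = FALSE)
}

# One column of means per file; header = TRUE consumes the "Sent Received" line.
filenames <- file.path(log_dir, paste0(sizes, ".log"))
mat <- sapply(filenames, function(fn) colMeans(read.table(fn, header = TRUE)))
colnames(mat) <- sizes  # label bars by data point rather than by full path

# beside = FALSE (the default) stacks Sent and Received within each bar.
barplot(mat, beside = FALSE, legend.text = rownames(mat),
        xlab = "Data point", ylab = "Mean count")
```

sapply returns a matrix with one row per measure and one column per file, which is exactly the layout barplot expects for stacked bars.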
On Fri, Jun 11, 2010 at 2:32 PM, Ian Bentley <ian.bentley at gmail.com> wrote:
I'm an R newbie, and I'm just trying to use some of its graphing
capabilities, but I'm a bit stuck - basically in massaging the already
available data into a format R likes.
I have a simulation environment which produces logs, which represent a
number of different things. I then run a python script on this data to
put it in a nicer format. Essentially, the python script reduces the
number of files by two orders of magnitude.
What I'm left with is a number of files, each of which has two columns of
data in it.
The files look something like this:
--1000.log--
Sent Received
405.0 3832.0
176.0 1742.0
176.0 1766.0
176.0 1240.0
356.0 3396.0
...
This file - called 1000.log - represents a data point at 1000. What I'd like
to do is use a loop to read in 50 or so of these files, and then produce
a stacked barplot. Ideally, the stacked barplot would have 1 bar per file,
and two stacks per bar. The first stack would be the mean of the sent, and
the second would be the mean of the received.
I've used a loop to read files in R before, something like this ---
for (i in 1:50){
  tmpFile <- paste(base, i*100, ".log", sep="")
  tmp <- read.table(tmpFile)
}
--- But I really don't know how to handle massaging this data into the
matrix I need.
I hope this makes sense, I find it a little hard to describe.
Can anyone give me some help jumping into this one?
Thanks
--
Ian Bentley
M.Sc. Candidate
Queen's University
Kingston, Ontario
So two time series? Fair enough. But less is more. Plot them as separate series of points connected by lines, with different colors for the two series, or as two trellis plots. You may also wish to overlay a smooth to help the reader see the "trend" (e.g., via a loess or other nonparametric smooth, or perhaps just a fitted line). The only part of a bar that conveys information is the top. The rest of the fill is "chartjunk" (Tufte's term) and distracts.

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ian Bentley
Sent: Friday, June 11, 2010 12:15 PM
To: Bert Gunter
Cc: r-help at r-project.org; Hadley Wickham
Subject: Re: [R] Transforming simulation data which is spread across many files into a barplot

I'm not trying to see the relation between sent and received, but rather to show how these grow across the increasing complexity of the 50 data points.
--
Ian Bentley
M.Sc. Candidate
Queen's University
Kingston, Ontario
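Bert Gunter's suggestion in this thread (two series drawn as points joined by lines, with a smooth overlaid) might look roughly like the sketch below. The per-file means are simulated stand-ins, not real results, and stats::lowess is used here as one readily available nonparametric smoother; loess or a fitted line would serve equally well.

```r
# Sketch only: invented per-file means standing in for the 50 real ones.
set.seed(42)
x <- seq(100, 5000, by = 100)
avg <- data.frame(point = x,
                  sent = 0.05 * x + rnorm(length(x), sd = 10),
                  received = 0.5 * x + rnorm(length(x), sd = 100))

# Two series as points joined by lines, one colour each.
matplot(avg$point, as.matrix(avg[, c("sent", "received")]),
        type = "b", pch = 1, lty = 1, col = c("blue", "red"),
        xlab = "Data point", ylab = "Mean count")

# Overlay a nonparametric smooth per series to show the trend.
sm_sent <- lowess(avg$point, avg$sent)
sm_recv <- lowess(avg$point, avg$received)
lines(sm_sent, col = "blue", lwd = 2)
lines(sm_recv, col = "red", lwd = 2)
legend("topleft", legend = c("sent", "received"),
       col = c("blue", "red"), lty = 1, pch = 1)
```

If the two series differ greatly in scale, as they do here, separate panels (for example via par(mfrow = c(2, 1))) may show the smaller one more clearly.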