I'm new to R and struggling to reproduce graphs I've made with gnuplot.
Example here:
http://www.slamb.org/tmp/one-active.png
I have three different data sets plotted on the same axis. (I also have
a number of samples for each X value which I displayed with quartiles
rather than plotting every point; that will likely be the subject of my
next question.)
My attempts to do this in R: I've put the data into a frame with (x, y,
dataset) columns; dataset is a categorical variable. I can get a coplot
which sort of shows the information:
attach(myframe)
coplot(y ~ x | dataset)
but not on the same axis with a legend. I'd like to start by getting
that in a scatterplot form:
# XXX: datasets hardcoded in here...
# is split() supposed to do something similar to this?
# or how do I get a list of datasets to feed into subset?
myframe_a <- subset(myframe, dataset=='a')
myframe_b <- subset(myframe, dataset=='b')
...
and then I can apparently plot one and add points from others to it:
# XXX: more hardcoding...
attach(myframe_a)
plot(x, y, col='red')
detach(myframe_a)
attach(myframe_b)
points(x, y, col='blue')
detach(myframe_b)
...
legend("topleft",
legend=c("a", "b", ...),
fill=c("red", "blue", ...))
but there are several things I don't like about this solution:
* there probably is an existing function which does this? I can't find it.
* obviously I don't want to duplicate code for each dataset. I'd rather
loop based on whatever datasets are in the frame, but I'm missing how to
do that here.
* points() appears to not alter xlim and ylim. Is there a convenient way
to autodetermine them based on all the points?
* I've hardcoded the colors. This is the sort of thing I'd rather leave
to an expert. (I.e. someone who has looked at colorblindness studies and
knows which colors are easiest to distinguish.)
Any ideas?
Cheers,
Scott
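
One possible way to avoid the per-dataset duplication, sketched with made-up data: index a colour vector by the factor, so that a single plot() call draws every group and works out xlim and ylim from all the points at once. The frame below is a hypothetical stand-in for myframe (not the real data), and the colour choice is arbitrary.

```r
# Hypothetical stand-in for 'myframe': columns x, y and a factor 'dataset'
set.seed(1)
myframe <- data.frame(
  x = runif(60),
  y = rnorm(60),
  dataset = factor(sample(c("a", "b", "c"), 60, replace = TRUE))
)

groups <- levels(myframe$dataset)  # the datasets actually present in the frame
cols <- seq_along(groups)          # one colour index per dataset (default palette)

pdf(NULL)  # throwaway device so this also runs non-interactively; drop it to see the plot
# Indexing by a factor uses its integer codes, so every point gets its group's colour.
plot(myframe$x, myframe$y, col = cols[myframe$dataset], xlab = "x", ylab = "y")
legend("topleft", legend = groups, fill = cols)
invisible(dev.off())
```

Because the groups come from levels(), nothing in the plotting code names an individual dataset.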
plot multiple data sets on same axis
5 messages · Scott Lamb, jim holtman
Here is one way of doing it by splitting the data and plotting each
set in a different color:
# generate some test data
mydf <- data.frame(x=runif(200), y=rnorm(200),
dataset=sample(LETTERS[1:4], 200, TRUE))
# set up the plot area for the full range of the data
plot(0, xlim=range(mydf$x), ylim=range(mydf$y), type='n', ylab="Y", xlab="X")
# sort by 'x' first, then split by 'dataset' and plot each series
mydf <- mydf[order(mydf$x),]
split.df <- split(mydf, mydf$dataset)
# loop through each set and plot with a different color
for (i in seq_along(split.df)){
    lines(split.df[[i]]$x, split.df[[i]]$y, col=i)
}
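
The loop above uses the default colour numbers and stops short of a legend. A possible extension is sketched here, assuming R 4.0.0 or later, where grDevices provides palette.colors() with the "Okabe-Ito" palette (designed to stay distinguishable under common forms of colour blindness). The test data mirrors the example above; nothing here comes from the real data set.

```r
set.seed(42)
mydf <- data.frame(x = runif(200), y = rnorm(200),
                   dataset = sample(LETTERS[1:4], 200, TRUE))
mydf <- mydf[order(mydf$x), ]
split.df <- split(mydf, mydf$dataset)

# One colour per series from the Okabe-Ito palette (it offers at most 9 colours)
cols <- palette.colors(length(split.df), palette = "Okabe-Ito")

pdf(NULL)  # throwaway device so this also runs non-interactively; drop it to see the plot
plot(0, xlim = range(mydf$x), ylim = range(mydf$y), type = "n",
     xlab = "X", ylab = "Y")
for (i in seq_along(split.df)) {
  lines(split.df[[i]]$x, split.df[[i]]$y, col = cols[i])
}
# names(split.df) holds the dataset labels in the same order as the loop
legend("topleft", legend = names(split.df), col = cols, lty = 1, bty = "n")
invisible(dev.off())
```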
On Dec 30, 2007 5:11 PM, Scott Lamb <slamb at slamb.org> wrote:
[original message quoted in full; snipped]
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
Thank you - that was exactly what I needed. I just discovered read.csv
understands URLs, so here it is with my actual data and formatting:
df <- read.csv("http://www.slamb.org/tmp/one-active.csv")
split.df <- split(df, df$method)
plot(0, xlim=range(df$inactive), ylim=range(df$elapsed), type="n",
     ylab="time (µs)", xlab="inactive file descriptors", log="y",
     main="1 active descriptor, 1 write", bty="l")
grid()
for (i in seq_along(split.df)) {
    lines(split.df[[i]]$inactive, split.df[[i]]$elapsed, col=i)
}
legend("topleft", legend=labels(split.df),
       fill=seq_along(split.df), bty="n")
Next question. Rather than plot each point, I'd like to use boxplots to
show the interquartile range for each x.
I've tried replacing the for loop body with this:
this_method <- split.df[[i]]
boxplot(elapsed~inactive, data=this_method,
add=TRUE, border=i, boxfill=i, outline=FALSE)
but it has two problems:
* it doesn't plot at the correct x values. It looks like I need to
supply a list of the x values as the "at" parameter, and I don't know
how to get unique values from this_method$inactive. (I tried the clunky
labels(split(this_method, this_method$inactive)), but it returns them as
strings.)
* it redraws the graph's frame, and it ignores bty="l" when doing so.
Not sure how to correct either.
Cheers,
Scott
Scott Lamb <http://www.slamb.org/>
Scott Lamb wrote:
I've tried replacing the for loop body with this:
this_method <- split.df[[i]]
boxplot(elapsed~inactive, data=this_method,
add=TRUE, border=i, boxfill=i, outline=FALSE)
but it has two problems:
* it doesn't plot at the correct x values. It looks like I need to
supply a list of the x values as the "at" parameter, and I don't know
how to get unique values from this_method$inactive. (I tried the clunky
labels(split(this_method, this_method$inactive)), but it returns them as
strings.)
Ahh. I missed the obvious answer: there's a function called unique(), and at=unique(this_method$inactive) works.
* it redraws the graph's frame, and it ignores bty="l" when doing so.
Also, the x axis tics and labels... they're totally unreadable now.
Cheers,
Scott
Scott Lamb <http://www.slamb.org/>
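
A caveat on at=unique(this_method$inactive): unique() returns values in order of first appearance, whereas boxplot() with a formula draws the boxes in the order of the sorted levels of the grouping variable. The two agree only when the data are already sorted by inactive. A small sketch with invented numbers (the column names follow the thread, the values do not):

```r
# Invented data, deliberately not sorted by 'inactive'
this_method <- data.frame(
  inactive = c(100, 100, 0, 0, 10, 10),
  elapsed  = c(9, 11, 1, 2, 4, 5)
)

first_seen <- unique(this_method$inactive)   # 100, 0, 10: order of first appearance
pos <- sort(unique(this_method$inactive))    # 0, 10, 100: sorted

# plot = FALSE returns the box statistics without drawing anything;
# $names holds the group labels in the order the boxes would be drawn
b <- boxplot(elapsed ~ inactive, data = this_method, plot = FALSE)
print(b$names)
print(pos)
```

Passing at=pos rather than at=first_seen keeps each box at its own x value regardless of row order.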
Scott Lamb wrote:
* it redraws the graph's frame, and it ignores bty="l" when doing so.
also the x axis tics and labels...they're totally unreadable now.
Ahh. There is an "axes=FALSE" parameter to bxp, which boxplot passes
along. Thanks again, and sorry for all the list noise.
For the record, here's exactly what I did to duplicate the original graph:
df <- read.csv("http://www.slamb.org/tmp/one-active.csv")
png(filename="one-active.png", width=800, height=600)
split.df <- split(df, df$method)
plot(0, xlim=c(0, max(df$inactive)), ylim=range(df$elapsed),
     ylab="time (µs)", xlab="inactive file descriptors", log="y",
     main="1 active descriptor, 1 write", bty="n", type="n")
grid()
for (i in seq_along(split.df)) {
    this_method <- split.df[[i]]
    # sort so the positions match boxplot's group order even if rows are unsorted
    unique_inactive <- sort(unique(this_method$inactive))
    boxplot(elapsed~inactive, data=this_method,
            at=unique_inactive, axes=FALSE,
            add=TRUE, border=i, boxfill=i, outline=FALSE, bty="l",
            whisklty="solid", staplelty="blank", medlty="blank",
            boxwex=max(unique_inactive)/length(unique_inactive)/2)
}
legend("topleft", legend=labels(split.df),
       fill=seq_along(split.df), bty="n")
Cheers,
Scott
Scott Lamb <http://www.slamb.org/>