Is there a way in ggplot to make a histogram with the left-hand y-axis label as frequency, and a right-hand y-axis label as percentage? Thanks! Pete
ggplot2 / histogram / y-axis
4 messages · Pete Kazmier, Hadley Wickham
On 7/12/07, Pete Kazmier <pete-expires-20070910 at kazmier.com> wrote:
Is there a way in ggplot to make a histogram with the left-hand y-axis label as frequency, and a right-hand y-axis label as percentage?
Not currently. I did a quick exploration to see if it was feasible to
draw another axis on with grid, but it doesn't look like it's
possible:
p <- qplot(rating, data=movies, geom="histogram")
# Map aesthetics to data
data <- p$layers[[1]]$make_aesthetics(p)
# Calculate statistic "by hand" (we'll need this to get the scales right)
binned <- StatBin$calculate(data=data, p$scales)
n <- nrow(movies)
# Manually recreate the y scale
sp <- scale_y_continuous()
sp$train(binned$count)
# rescale the labels
labels <- formatC(sp$breaks() / n, digits=2)
# Have to do without labels because of bug in grid
print(p, pretty=FALSE)
downViewport("panel_1_1")
grid.draw(ggaxis(sp$breaks(), as.list(labels), "right", sp$frange()))
# Why don't the labels line up? - I'm not sure
# How could you make space for the extra axis? - Not sure either
# How would this work for a faceted graphic? - Not well
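The label-rescaling step above (break positions divided by n) can be tried out with base R alone, which also makes the alignment trade-off concrete. This is only a sketch under stated assumptions: `faithful$waiting` stands in for `movies$rating`, the bin count of 20 is arbitrary, and labels are shown as percentages rather than the proportions computed in the code above.

```r
# Assumption: `faithful$waiting` (bundled with R) stands in for movies$rating.
x <- faithful$waiting
n <- length(x)

# Bin the data without plotting, analogous to the "by hand" statistic above
counts <- hist(x, breaks = 20, plot = FALSE)$counts

# Option A: "nice" breaks on the count axis, relabelled as percentages.
# The tick positions match the primary axis, but the labels are rarely round.
count_breaks <- pretty(c(0, max(counts)))
pct_labels   <- formatC(100 * count_breaks / n, format = "f", digits = 1)

# Option B: "nice" breaks on the percentage axis, mapped back to count units.
# The labels are round, but the ticks no longer fall on the count gridlines.
pct_breaks      <- pretty(c(0, 100 * max(counts) / n))
count_positions <- pct_breaks * n / 100

print(data.frame(count = count_breaks, percent = pct_labels))
print(data.frame(percent = pct_breaks, count = count_positions))
```

Comparing the two printed tables shows why only one of the two axes can normally carry whole-number labels when both share the same tick positions.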
Also, how were you expecting the axes/gridlines to line up? Would both
axes be labelled "nicely" (with whole numbers), with no gridlines for
the secondary axis; or would the second axis match the gridlines of
the primary, even though its numbers wouldn't be so attractive?
Hadley
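Note for readers on a current ggplot2 release: since version 2.2.0 the package provides `sec_axis()`, which covers the frequency/percentage case asked about in this thread. The following is a minimal sketch, not the approach explored above. It assumes the built-in `faithful` data as a stand-in for `movies` and an arbitrary choice of 20 bins, and it expresses each count as a percentage of all rows, so it would not give per-panel percentages in a faceted plot.

```r
library(ggplot2)

# Assumption: `faithful` (bundled with R) stands in for the `movies` data.
n <- nrow(faithful)

p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(bins = 20) +
  scale_y_continuous(
    name = "Frequency",
    # The secondary axis is a one-to-one transformation of the primary axis:
    # count -> percentage of all observations
    sec.axis = sec_axis(~ . / n * 100, name = "Percentage")
  )

print(p)
```

By default the secondary axis picks its own "nice" breaks, and gridlines follow the primary (frequency) axis only.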
"hadley wickham" <h.wickham at gmail.com> writes:
On 7/12/07, Pete Kazmier <pete-expires-20070910 at kazmier.com> wrote:
Is there a way in ggplot to make a histogram with the left-hand y-axis label as frequency, and a right-hand y-axis label as percentage?
Not currently. I did a quick exploration to see if it was feasible to draw another axis on with grid, but it doesn't look like it's possible:
Thank you for trying.
Also, how were you expecting the axes/gridlines to line up? Would both axes be labelled "nicely" (with whole numbers), with no gridlines for the secondary axis; or would the second axis match the gridlines of the primary, even though its numbers wouldn't be so attractive?
I hadn't thought that far ahead. Depending on the audience, I render histograms differently, and was curious if I could just put both on a single graph. However, you bring up some interesting questions in terms of the presentation.
On another note, and feel free to refer me to the documentation, which I'm still in the process of reading: will I be able to take advantage of some of Tufte's recommendations for the typical histogram and/or scatterplot (pp. 126-134 in Visual Display of Quantitative Information)? For example, with histograms, he eliminates coordinate lines in favor of a white grid to improve the data-ink ratio. Likewise, in scatterplots, he uses range-frames and dot-dash-plots. Will I be able to use ggplot for these types of enhancements?
Thanks,
Pete
On 7/12/07, Pete Kazmier <pete-expires-20070910 at kazmier.com> wrote:
I hadn't thought that far ahead. Depending on the audience, I render histograms differently, and was curious if I could just put both on a single graph. However, you bring up some interesting questions in terms of the presentation. On another note, and feel free to refer me to the documentation, which I'm still in the process of reading: will I be able to take advantage of some of Tufte's recommendations for the typical histogram and/or scatterplot (pp. 126-134 in Visual Display of Quantitative Information)? For example, with histograms, he eliminates coordinate lines in favor of a white grid to improve the data-ink ratio. Likewise, in scatterplots, he uses range-frames and dot-dash-plots. Will I be able to use ggplot for these types of enhancements?
I am familiar with Tufte's suggestions, and while they do increase the
data-ink ratio, I'm not confident they actually make the plot any
better perceptually. Displaying grid lines on _top_ of data seems
like a bad idea, and throwing away the plot frame is a bad idea
because you lose important visual reference points. Range frames
also fail to scale to faceted plots.
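To judge the white-grid design for yourself, the following base R sketch draws a histogram with white horizontal lines over the bars and no plot frame. It is an illustration only, assuming `faithful$waiting` as stand-in data and arbitrary colours; it is not taken from Tufte's book or from ggplot.

```r
# Assumption: `faithful$waiting` (bundled with R) is stand-in data.
h <- hist(faithful$waiting, plot = FALSE)

# Horizontal positions for the white grid, at "nice" count values
grid_at <- pretty(c(0, max(h$counts)))

# Solid bars with no outlines and no default axes
plot(h, col = "grey30", border = NA, axes = FALSE,
     main = "", xlab = "Waiting time (min)", ylab = "Frequency")

# White lines drawn after the bars, so they sit on top of the data
abline(h = grid_at, col = "white")

# Axis ticks and labels only; box() is never called, so there is no plot frame
axis(1)
axis(2, at = grid_at, las = 1)
```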
If you're not already familiar with them, I strongly recommend the
following two papers, which tackle similar ideas to Tufte but in a
rigorous scientific framework:
@article{cleveland:1987,
Author = {Cleveland, William and McGill, Robert},
Journal = {Journal of the Royal Statistical Society. Series A (General)},
Number = {3},
Pages = {192-229},
Title = {Graphical Perception: The Visual Decoding of Quantitative
Information on Graphical Displays of Data},
Volume = {150},
Year = {1987}}
@article{cleveland:1993a,
Author = {Cleveland, William},
Journal = {Journal of Computational and Graphical Statistics},
Pages = {323-364},
Title = {A model for studying display methods of statistical graphics},
Volume = {2},
Year = {1993}}
Hadley