
Thinking about using two y-scales on your plot?

9 messages · Hadley Wickham, Johannes Hüsing, Jim Lemon +4 more

#
Please read this first:
http://www.perceptualedge.com/articles/visual_business_intelligence/dual-scaled_axes.pdf

It's a reasoned discussion of why it's a bad idea and proposes some
alternative methods.

Another good article is:
K. W. Haemer. Double scales are dangerous. The American Statistician,
2(3):24–24, 1948.

People have been advising against dual-axis plots for (at least) 60 years!

Hadley
#
hadley wickham wrote:
As I am an obvious offender in the dual-ordinate plot field (I actually 
used one once about 25 years ago), I suppose I should at least 
contribute to the debate. Few's paper makes some very good points in my 
opinion. The dual ordinate barplot is too often misinterpreted for 
exactly the reason Few states. Bars starting from zero are just too easy 
to interpret as relative magnitudes. The inquiring reader will find that 
twoord.plot doesn't have a barplot option (although the enterprising 
user can easily hack barplot).

As the paper goes on, Few relies more on assertions than demonstrations. 
Consider the last injunction:

It is inappropriate to use more than one quantitative scale on a single 
axis, because, to some degree, this encourages people to compare 
magnitudes of values between them, but this is meaningless.

The crucial phrase, buried in the middle of this, is "to some degree". 
If the degree to which the viewer realizes that it is meaningless is 
greater than the degree to which that viewer is encouraged to compare 
magnitudes, there does not seem to be much of a problem. No evidence to 
support Few's implied outcome is adduced.

My own use of a dual-ordinate plot arose from a circumstance much like 
the final illustration in the paper. I wanted to show that the 
performance of rats on one aspect of a task was near perfect, while 
performance on another aspect was at chance level. However, instead of 
trying to convert the units into probabilities, I simply used the raw 
units scaled to equate the probabilities and added a horizontal line at 
the level of chance performance. No one complained. Did I successfully 
illustrate the dissociation of performance or merely get away with it? 
Unfortunately, I cannot answer that question, but I would love to have 
someone do a good study to either cheer me or knock me on the head. 
That's the way we improve our illustrative techniques.

Jim
#
On Tue, Mar 25, 2008 at 05:29:38PM -0500, hadley wickham wrote:
Thanks for this pointer, interesting read. As an additional alternative
to dual scales, let's not forget about scatterplots, which I didn't
see mentioned in that paper -- frequently, when you're stuck because
you can't dispense with either y-axis, it's easy to forget the option
to do without your current x-axis... ;-)
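
A minimal sketch of the scatterplot idea in R. The monthly values and variable names below are made up for illustration and are not data from this thread. Instead of drawing both series against time on two y-scales, one series is plotted against the other, and the time ordering is kept through labels and connecting lines.

```r
# Made-up monthly values, for illustration only
temperature   <- c(1, 2, 6, 10, 15, 18, 20, 19, 15, 10, 5, 2)        # degrees C
precipitation <- c(40, 35, 45, 50, 65, 75, 80, 78, 60, 55, 50, 45)   # mm

# One variable per axis, so no second y-scale is needed;
# type = "b" joins the points in month order
plot(temperature, precipitation, type = "b",
     xlab = "Mean temperature (degrees C)",
     ylab = "Precipitation (mm)")

# Label each point with its month so the time ordering is not lost
text(temperature, precipitation, labels = month.abb, pos = 3, cex = 0.8)
```

The trade-off is that time is no longer an axis, so this suits questions about how the two variables move together more than questions about when something happened.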

Best regards, Jan

#
Hi,

   Can someone tell me how to control the sample size (n) in the bootstrap
   function boot in R? Can we give some option like the one we give for the
   number of repeated samples (R = say 100)?

   Will appreciate any help.

   thanks
#
I don't believe so.  Isn't one of the differences between the bootstrap and other kinds of resampling that the bootstrap samples with replacement a sample of the same size as the original data?  You could use the function sample() to select your subsets and compute your statistics of interest.
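
A minimal sketch of the sample() approach, assuming the statistic of interest is the mean. The data, the resample size `n_sub`, and the number of replicates `n_rep` are made-up values for illustration.

```r
set.seed(42)                 # for reproducibility
x     <- rnorm(100)          # made-up data standing in for the real sample
n_sub <- 20                  # hypothetical resample size, smaller than length(x)
n_rep <- 100                 # number of resamples

# Each replicate draws n_sub values with replacement and computes the statistic
sub_means <- replicate(n_rep, mean(sample(x, size = n_sub, replace = TRUE)))

length(sub_means)            # one value per replicate
sd(sub_means)                # spread of the statistic at sample size n_sub
```

Note that the spread estimated this way reflects a sample of size `n_sub`, not the size of the original data.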

Hope this is helpful,

Dan  

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA  98504-5204
#
Hi Dan,

   Thanks for the response. Yes, I do know that the bootstrap samples generated
   by the function boot are the same size as the original dataset, but somewhere
   in the R-help threads I saw a suggestion that one can control the sample size
   (n) with the following command (please see below). My problem is that it
   doesn't work; it gives an error (error in : n * nboot : non-numeric argument
   to binary operator)

   bootstrap(data,statistic,sampler=samp.bootstrap(size=20))

   This is what somebody on R-help suggested... can we fix that error somehow?
On Wed, 26 Mar 2008 08:26:22 -0700 "Nordlund, Dan (DSHS/RDA)" wrote:
   > > -----Original Message-----
   > > From: r-help-bounces at r-project.org
   > > [mailto:r-help-bounces at r-project.org] On Behalf Of Zaihra T
   > > Sent: Wednesday, March 26, 2008 7:57 AM
   > > To: Jan T. Kim; R-help at r-project.org
   > > Subject: [R] sample size in bootstrap(boot)
   > >
   > > Hi,
   > >
   > > Can someone tell me how to control sample size (n) in bootstrap
   > > function boot in R. Can we give some option like we give for # of
   > > repeated samples(R=say 100).
   > >
   > > Will appreciate any help.
   > >
   > > thanks
   >
   > I don't believe so. Isn't one of the differences between the bootstrap
   > and other kinds of resampling that the bootstrap samples with
   > replacement a sample of the same size as the original data? You could
   > use the function sample() to select your subsets and compute your
   > statistics of interest.
   >
   > Hope this is helpful,
   >
   > Dan
   >
   > Daniel J. Nordlund
   > Research and Data Analysis
   > Washington State Department of Social and Health Services
   > Olympia, WA 98504-5204
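
Leaving the error above aside, one possible workaround with boot() itself is sketched below. It is an assumption-laden illustration, not a documented option of the boot package. boot() passes the statistic a vector of resampled indices as long as the data, and because those indices are drawn independently with replacement, the first `m` of them form a resample of size `m`. The data, `m`, and the helper name `sub_mean` are hypothetical.

```r
library(boot)                # recommended package distributed with R

set.seed(1)
x <- rnorm(50)               # made-up data, for illustration only
m <- 20                      # hypothetical resample size

# Hypothetical helper: use only the first m resampled indices
sub_mean <- function(d, i) mean(d[i[seq_len(m)]])

b <- boot(x, statistic = sub_mean, R = 100)

dim(b$t)                     # one row per replicate, one column per statistic
sd(b$t[, 1])                 # spread of the mean at resample size m
```

One caveat: `b$t0` is computed from the indices 1 to n, so with this helper it is the mean of the first `m` observations and not the full-sample estimate. Summaries that rely on `t0`, such as the bias printed by boot, should not be trusted here.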
#
Hello all,
I know I'm not making friends with this, but: I absolutely see the point 
in dual-(or more!)-y-axis plots! I find them quite informative, and I 
see them often. In the Earth sciences (and I very generously include 
atmospheric sciences here, as Johannes has given an example of a 
meteorological plot...) time-series plots of several variables are very 
often given to show their temporal correlation rather than the 
actual numerical values! The same applies to plots of some sample 
values over distance (e.g. element concentration over a sample or 
investigation area). In this case one is more interested in whether some 
values change simultaneously than in what the actual values at every point 
are.

In the mentioned plot (see link below), the temporal evolution of the 
mean temperature and of the precipitation over a year is the important 
information. No one would get confused or draw wrong conclusions if 
the curves intersected somewhere else merely because one y-axis was 
shifted relative to the other (which was proposed to be one of the 
great dangers of dual-scaled axes in the article Hadley posted).
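
For readers who want to see the kind of plot under discussion, here is a minimal base-R sketch of a dual-axis time series. The monthly values are made up for illustration and are not taken from the linked plot.

```r
# Made-up monthly values, for illustration only
temperature   <- c(1, 2, 6, 10, 15, 18, 20, 19, 15, 10, 5, 2)        # degrees C
precipitation <- c(40, 35, 45, 50, 65, 75, 80, 78, 60, 55, 50, 45)   # mm
months <- 1:12

op <- par(mar = c(5, 4, 2, 5))   # leave room on the right for the second axis

# First series, with its y-scale on the left
plot(months, temperature, type = "l", col = "red", xaxt = "n",
     xlab = "Month", ylab = "Mean temperature (degrees C)")
axis(1, at = months, labels = month.abb)

# Second series drawn over the same region with an independent y-scale
par(new = TRUE)
plot(months, precipitation, type = "l", col = "blue",
     axes = FALSE, xlab = "", ylab = "")
axis(4)
mtext("Precipitation (mm)", side = 4, line = 3)

par(op)                          # restore the previous margin setting
```

Passing a different `ylim` to the second plot() call moves the points where the two curves cross without changing the data, which is the effect being debated here.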

On the other hand, you would never express temperature in terms of a 
percentage of some arbitrary start value, if you could give it just in 
plain °C!? (as was proposed as a workaround in the article mentioned) An 
awkward scale like this makes the actual graph much harder to read, not 
easier, as proposed. Furthermore, since the observed values in Earth 
Sciences often show a cyclic behavior, the graphs would still cross each 
other over and over again, no matter what the scale was.
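
A small worked example of the percentage-of-start-value scale, using made-up monthly mean temperatures. It also shows one reason, offered here as an illustration and not taken from the article, why such an index is awkward for temperature: the result depends on where the unit places its zero.

```r
# Made-up monthly mean temperatures in degrees C, for illustration only
temperature <- c(2, 3, 6, 10, 15, 18, 20, 19, 15, 10, 5, 1)

# Index each value as a percentage of the first (January) value
index_c <- 100 * temperature / temperature[1]
index_c[7]                   # July: 20 degrees C is 1000 percent of 2 degrees C

# The same series in kelvin gives a very different index
kelvin  <- temperature + 273.15
index_k <- 100 * kelvin / kelvin[1]
round(range(index_k), 1)     # 99.6 106.5
```

In degrees Celsius the index swings between 50 and 1000 percent, while in kelvin the same data stay within a few percent of 100.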

So my conclusion for now: I'd answer the question "are dual-scaled axes 
in graphs ever the best solution?" with a definitive YES. Maybe only in 
some specialized applications, but - yes. I strongly expect this 
discussion to go on (as I've read frequently here that these kinds of 
graphs are considered very "inappropriate") and I am happy to learn to 
do better graphs, if you can show me to be wrong...

Greetings,
Martin
Johannes Hüsing wrote: