R-beta: CI for median in function boxplot

3 messages · Martin Maechler, Bill Venables, (Ted Harding)

#
PD> Rick White <rick at stat.ubc.ca> writes:
    >>  I noticed that boxplot computes a 95% CI for the median by using
    >> median +/- 1.58*IQR/sqrt(n)
    >> 
    >> Where does the 1.58 constant come from?
    >> 

    PD> Search me... However, wouldn't it be better in any case to do an
    PD> exact 95% CI based on the binomial distribution? Of course, you
    PD> need at least 6 observations to do that.
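
A minimal R sketch of the binomial-based interval suggested above, for
readers who want to see where the "at least 6 observations" comes from.
The helper name median_ci_exact is hypothetical (it is not part of R);
it uses the standard order-statistic construction, in which the number
of observations below the true median is Binomial(n, 1/2).

```r
# Sketch: distribution-free CI for the median from order statistics.
# median_ci_exact() is a hypothetical helper, not an R built-in.
median_ci_exact <- function(x, conf.level = 0.95) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  alpha <- 1 - conf.level
  # The interval [x(k), x(n-k+1)] has coverage 1 - 2 * P(B <= k - 1),
  # with B ~ Binomial(n, 1/2).  Take the largest admissible k.
  ks <- seq_len(floor((n + 1) / 2))
  ok <- pbinom(ks - 1, n, 0.5) <= alpha / 2
  if (!any(ok)) stop("too few observations for this confidence level")
  k <- max(ks[ok])
  list(lower = x[k], upper = x[n - k + 1],
       coverage = 1 - 2 * pbinom(k - 1, n, 0.5))
}

# With n = 6 the interval is [min, max]; with n = 5 no interval
# reaches 95% coverage and the function stops with an error.
median_ci_exact(c(3, 1, 4, 1.5, 9, 2.6))
```

The actual coverage is usually somewhat above the nominal level, since
only a discrete set of coverages is attainable for a given n.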

No, please not yet another definition of the boxplot!
People looking at boxplots should be able to rely on their knowledge of
what a boxplot is.

I don't know the exact history; in any case,
John Tukey devised the boxplot, including the notches, 
and ``1.58 is THE number''.

A very accessible reference on how 1.58 was derived is
Section 3.12, p.79--81 of
@Book{VelPH81,
  author = 	{Paul F. Velleman and David C. Hoaglin},
  title = 	{Applications, Basics, and Computing of Exploratory
		  Data Analysis},
  publisher = 	{Duxbury Press, Boston, Massachusetts},
  year = 	1981
}

Here is a ``compact'' summary  (if you really want to know ...)

Comparing two normal populations, there are two extreme cases: 
In the first one, the variances are about equal, 
in the other, one variance is much higher than the other.
The corresponding z-tests reject at the 5% level when
	abs(mean(x1) - mean(x2)) > 1.96 sqrt(2) sigma_xbar
and
	abs(mean(x1) - mean(x2)) > 1.96         sigma_xbar (the big one).

We want intervals around each mean that fail to overlap exactly when
the test rejects.  The first case then corresponds to a CI of
	mean(x) +/- 1.96 sqrt(2) / 2 sigma_xbar =
    =	mean(x) +/- 1.39 sigma_xbar 
while the second one must have
	mean(x1) +/- 1.96 sigma_xbar(x1) and the same for x2.

An omnibus compromise factor is  (1.39 + 1.96) / 2 ~= 1.7
[``exact'' would be   qnorm(.975)*(1 + sqrt(2)/2)/2 = 1.672934].
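
The arithmetic for the two extreme cases and their average can be
checked directly in R.  The variable names below are illustrative only.

```r
# Half-width multipliers (in units of sigma_xbar) for the two extremes.
z <- qnorm(0.975)                 # two-sided 5% normal quantile, about 1.96
equal_var   <- z * sqrt(2) / 2    # equal variances, about 1.39
unequal_var <- z                  # one variance dominates, about 1.96

# "Omnibus" compromise: the plain average of the two multipliers.
compromise <- (equal_var + unequal_var) / 2
print(round(c(equal = equal_var, unequal = unequal_var,
              compromise = compromise), 6))
```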

Now, we also have  
	IQR = 1.349 * sigma,  [[exact:  IQR = 2*qnorm(3/4) * sigma ]]
and
	var(median) = pi/2 * var(arith.mean)
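
Both ingredients can be checked numerically.  The sketch below computes
the constant 1.349 from the normal quartiles and runs a small simulation
of the variance ratio; the sample size, number of replications and seed
are arbitrary choices, and the pi/2 relation is a large-sample one for
normal data, so the simulated ratio is only approximately pi/2.

```r
# IQR of a standard normal: distance between the two quartiles.
iqr_per_sigma <- qnorm(0.75) - qnorm(0.25)   # equals 2 * qnorm(3/4)
print(iqr_per_sigma)

# Simulated var(median) / var(mean) for normal samples.
set.seed(1)
n    <- 101      # observations per sample (arbitrary)
reps <- 20000    # number of simulated samples (arbitrary)
sims <- matrix(rnorm(n * reps), nrow = reps)
ratio <- var(apply(sims, 1, median)) / var(rowMeans(sims))
print(c(simulated = ratio, asymptotic = pi / 2))
```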

The three things put together:

	"notch length" =  (IQR/1.349) * sqrt(pi/2) * (1.7 / sqrt(n)) =
		       =  1.58  * IQR / sqrt(n),

i.e. 1.58 = sqrt(pi/2)*1.7/1.349  (= 1.579417)

Instead, the ``exact'' value for 1.58 would be

1/(2*qnorm(.75))* sqrt(pi/2) * (qnorm(.975)*(1 + sqrt(2)/2)/2) =  1.554295
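
As a check, both versions of the constant can be computed in R and the
resulting notch compared with what R itself reports.  The sketch assumes
that boxplot.stats() (package grDevices) uses the hinges from fivenum()
as its "IQR" together with the factor 1.58, which is how current R
versions appear to behave; the sample data are arbitrary.

```r
# The constant with rounded inputs (1.7 and 1.349) and with exact ones.
rounded <- sqrt(pi / 2) * 1.7 / 1.349
exact   <- sqrt(pi / 2) * (qnorm(0.975) * (1 + sqrt(2) / 2) / 2) /
           (2 * qnorm(0.75))
print(round(c(rounded = rounded, exact = exact), 6))

# Notch by hand: median +/- 1.58 * (hinge spread) / sqrt(n).
set.seed(42)
x <- rnorm(50)
h <- fivenum(x)   # min, lower hinge, median, upper hinge, max
notch <- h[3] + c(-1, 1) * 1.58 * (h[4] - h[2]) / sqrt(length(x))

# Compare with the interval returned by boxplot.stats().
print(all.equal(notch, grDevices::boxplot.stats(x)$conf))
```

Note that the hinges can differ slightly from quantile()-based
quartiles, so IQR(x) need not reproduce the notch exactly.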

---
So, 1.58 ``should be'' 1.554 instead, 
but of course, the big deal is the compromise of the two extreme
situations, anyway.  Rounding up leads to the slightly increased factor
which may be somewhat more realistic for long-tailed nonnormal situations.

----------
PS: Should the above go into the online documentation?

Martin Maechler <maechler at stat.math.ethz.ch>			<><
Seminar fuer Statistik, ETH-Zentrum SOL G1;	Sonneggstr.33
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1086
http://www.stat.math.ethz.ch/~maechler/
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
Martin Maechler writes:

 > No, please not yet another definition of the boxplot!  People
 > looking at boxplots should be able to rely on their knowledge
 > of what a boxplot is.

Well you won't get one from me, Martin, but I fear you are too
late to stop that one. 

But why do people still look at boxplots so much, and why do they
spend so much time and technology on them?  It really puzzles me.
With all the good work done in recent years on bandwidth
selection, why don't we communicate this kind of information,
between ourselves at least, with kernel density estimates more?
They could be embellished with a few reference percentiles, of
course, but to me they convey a whole lot more than boxplots.  It
can't be very long before, like pie charts, boxplots are reserved
for politicians, advertising agents and others of their kind.
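
A rough sketch of the kind of display described above: a kernel density
estimate with a few reference percentiles marked.  The data, the choice
of the Sheather-Jones bandwidth (bw = "SJ") and the quartiles used as
reference lines are all illustrative assumptions, not a recommendation
from the thread.

```r
pdf(NULL)   # discard graphics output; drop this line for interactive use

set.seed(1)
x <- c(rnorm(150), rnorm(50, mean = 4))   # made-up bimodal sample

d <- density(x, bw = "SJ")                # kernel density estimate
q <- quantile(x, c(0.25, 0.5, 0.75))      # reference percentiles

plot(d, main = "Kernel density estimate with quartiles")
abline(v = q, lty = 2)                    # dashed lines at the quartiles
rug(x)                                    # individual observations

invisible(dev.off())
```

A boxplot of the same sample would show the quartiles but not the second
mode, which is the kind of information the density display adds.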

 > I don't know the exact history; in any case, John Tukey
 > devised the boxplot, including the notches, and ``1.58 is THE
 > number''.

...

 > ----------
 > PS: Should the above go into the online documentation?

Purely in the interests of keeping it short, I suggest not.
#
On 06-Apr-98 Bill Venables wrote:
Like many early Tukey schemes, the boxplot was devised in days when most people
only had line-printer output readily available. If you think about it, the
boxplot can be drawn (to within character-width resolution) using basic ASCII
symbols like - + | on a lineprinter/dumb terminal. "Exploratory Data Analysis"
was written in 1976! The same is true for stem-and-leaf diagrams and the like.
And very effective, too, for that medium.

Tukey's graphical tricks were, for many people, the first experience they had
of the comprehensible display of statistical summary data and I remember the
furious eagerness with which they were accepted in groups whom more academic
statisticians had previously regarded as, for practical purposes, out of reach;
and tribute must be paid to pioneering teachers like the late Cathie Marsh who
saw these methods as the bridge by which true understanding of data could be
brought to them.

As to why people still look at them, well, <cynical>old habits die hard, and
something which works to your satisfaction is a better ploy than adapting to
something new which may require unknown effort which you are not sure might be
well spent</cynical>, but Bill's point, essentially that modern display
technology can greatly improve on this without stressing the learner, seems
good. (I still wonder a bit, though, whether someone used to a boxplot would
readily derive the same information from a kernel density estimate without
considerable practice).

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Date: 06-Apr-98                                       Time: 14:14:45
--------------------------------------------------------------------