Impaired boxplot functionality - mean instead of median

12 messages · Evgeniy Kachalin, Martin Maechler, Jean-Christophe BOUETTE +6 more

#
Hello to all users and wizards.

I am regularly using the 'boxplot' function or its analogue, 'bwplot' from 
the 'lattice' library. But they are, as far as I understand, totally 
flawed in functionality: they lack the ability to select what is 
drawn 'in the middle' (median or mean), what the box means (standard 
error, 90% or something else), and what the whiskers mean (100%, 99% or 
something else).
Is there any way to do this? Or is there any other good data 
visualization function for comparing means of various data groups? 
Ideally I would like to have a slightly more customisable function for doing 
that. For example, boxplot(a~b, data=d, mid='mean').
#
Boxplots were invented by John W. Tukey and I think should be
counted among the top "small but smart" achievements from the
20th century.  Very wisely he did *not* use mean and standard deviations.

Even though it's possible to draw boxplots that are not boxplots
(and people only recently explained how to do this with R on this
 mailing list), I'm arguing very strongly against this.

If I see a boxplot - I'd want it to be a boxplot and not have
the silly (please excuse)  10%--------90% whiskers  which
declare 20% of the points as outliers {in the boxplot sense}.

If you want the mean +/- sd plot, do *not* misuse boxplots
for them, please! 

Martin Maechler, ETH Zurich
Evgeniy> Hello to all users and wizards.
    Evgeniy> I am regularly using 'boxplot' function or its analogue - 'bwplot' from 
    Evgeniy> the 'lattice' library. 

 [there's the lattice *package*  !]

#
I'm no wizard but looking at ?boxplot I think you should try ?bxp.

HTH,
Jean-Christophe.
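
A minimal sketch of the bxp() route suggested above, assuming made-up data with the names a, b and d from the original question (the values are hypothetical). boxplot(..., plot = FALSE) returns the statistics that bxp() draws, so the middle line can be overwritten with the group mean before plotting. The result is no longer a Tukey boxplot, so it should be labelled as such.

```r
# Hypothetical example data: a quantitative trait for three gene variants.
set.seed(1)
d <- data.frame(
  b = factor(rep(1:3, each = 30)),
  a = c(rnorm(30, 10, 2), rnorm(30, 12, 2), rnorm(30, 11, 3))
)

# boxplot(..., plot = FALSE) returns the list of statistics that bxp() draws.
bp <- boxplot(a ~ b, data = d, plot = FALSE)

# Row 3 of bp$stats holds each group's median; overwrite it with the mean.
# The hinges and whisker ends (rows 1, 2, 4, 5) are left untouched.
group_means <- tapply(d$a, d$b, mean)
bp$stats[3, ] <- group_means

bxp(bp, xlab = "b", ylab = "a",
    main = "Centre line = mean (not a Tukey boxplot)")
```

A less intrusive variant leaves bp untouched and overlays the means on an ordinary boxplot with points(seq_along(group_means), group_means, pch = 4).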

#
Martin Maechler wrote:
I analyze genetics data. I have a factor (gene variant, c(1,2,3))
and a quantitative variable corresponding to that factor. How do I
visualize this situation, i.e. compare the means of the samples
corresponding to the factor values?

If boxplot supported 'mean-in-the-middle', it would fit my needs
ideally. How do I make a mean +/- SD plot?

There is also a way to rewrite boxplot.stats and replace "fivenum" there
with a self-made function. Then I would need to write a self-made
boxplot.formula (or boxplot.default?) function, and none of this would
be configurable. I'm still a novice in R, so I need a simple way to
pre-visualize my data and estimate an approximate result.
#
I'd like to add two comments to Martin's sensible response.

1. I've seen several intro-stats textbooks that define a
boxplot to have whiskers to the extreme data values
and then define Tukey's boxplot as a "modified" boxplot.
I wish authors wouldn't do that.

2. I've also seen boxplots used for sample sizes as small
as -- are you ready for it? -- n = 2!! (Admittedly, only in
plots comparing several groups.) The help page for
stripchart() points out that stripcharts "are a good
alternative to boxplots when sample sizes are small".
My own rule-of-thumb: n > 20 for single boxplots, n > 12
for multiple boxplots.

Peter Ehlers
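
For illustration, a small sketch of the stripchart() alternative for small samples. The data frame and its column names (small, group, value) are invented for the example.

```r
# Hypothetical small samples: three groups with very few observations each.
small <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(2, 5, 8))),
  value = c(4.1, 5.3,
            3.9, 4.4, 4.8, 5.0, 6.2,
            5.1, 5.5, 5.6, 5.9, 6.0, 6.3, 6.8, 7.4)
)

# Group sizes, to compare with the n > 12 rule of thumb for multiple boxplots.
n_per_group <- table(small$group)

# Every observation is drawn; jitter separates points that would overlap.
stripchart(value ~ group, data = small, method = "jitter",
           vertical = TRUE, pch = 16, ylab = "value")
```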
#
On 12/1/05, Evgeniy Kachalin <ka4alin at yandex.ru> wrote:
Not sure exactly what you want but perhaps thermometer plots
would help?

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=129
#
On Thu, 2005-12-01 at 19:40 +0300, Evgeniy Kachalin wrote:
If you want means and SDs, you might want to look at:

1. plotCI() and plotmeans() in the gplots package

2. errbar() in the Hmisc package

3. Use plot() in conjunction with the arrows() or segments() functions,
which is what the above end up doing in a convenient and unified
approach.

HTH,

Marc Schwartz
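
A rough sketch of option 3 in base graphics, assuming made-up data with the names a, b and d from the original question. It computes the group means and SDs with tapply() and draws the error bars with arrows(); it is not the code used inside plotCI(), plotmeans() or errbar().

```r
# Hypothetical example data: a quantitative variable a and a 3-level factor b.
set.seed(2)
d <- data.frame(
  b = factor(rep(1:3, each = 25)),
  a = c(rnorm(25, 10, 2), rnorm(25, 12, 2), rnorm(25, 11, 3))
)

m <- tapply(d$a, d$b, mean)   # group means
s <- tapply(d$a, d$b, sd)     # group standard deviations
x <- seq_along(m)             # horizontal position of each group

# Plot the means, leaving room on the y axis for the error bars.
plot(x, m, pch = 16, xaxt = "n",
     xlim = c(0.5, length(m) + 0.5), ylim = range(m - s, m + s),
     xlab = "b", ylab = "mean of a +/- SD")
axis(1, at = x, labels = names(m))

# angle = 90 with code = 3 turns the arrow heads into flat caps at both ends.
arrows(x, m - s, x, m + s, angle = 90, code = 3, length = 0.05)
```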
#
All--

Would someone kindly post the reference to Tukey's formula for a boxplot 
without whiskers?

I am looking at his book "Exploratory Data Analysis" from 1977.  The 
index includes "box-and-whisker" plot but not "boxplot."  On page 39-40 
construction of the plot is described, including the statements: "We 
draw a long, thinnish box that stretches from hinge to hinge, crossing 
it with a bar at the median.  Then we draw a 'whisker' from each end of 
the box to the corresponding extreme."

MHP
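
To make the two conventions concrete, here is a short sketch with an invented vector x. In R, boxplot.stats() with coef = 0 (or boxplot() with range = 0) extends the whiskers to the extremes, as in the passage quoted above, while the default coef = 1.5 stops them at the most extreme points within 1.5 times the hinge spread and reports the rest separately.

```r
# Hypothetical data with one value far from the rest.
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 30)

# Default: whiskers end at the most extreme data points within
# 1.5 * (upper hinge - lower hinge) of the box; 30 is flagged in $out.
s_tukey <- boxplot.stats(x)
s_tukey$stats   # 2.0 4.0 6.5 9.0 10.0
s_tukey$out     # 30

# coef = 0: whiskers run to the data extremes, nothing is flagged.
s_extreme <- boxplot.stats(x, coef = 0)
s_extreme$stats # 2.0 4.0 6.5 9.0 30.0

# The same two conventions side by side as plots.
op <- par(mfrow = c(1, 2))
boxplot(x, range = 0, main = "whiskers to extremes")
boxplot(x, main = "default (range = 1.5)")
par(op)
```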


#
It's already there in EDA! See pp. 39-47.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
#
Marc Schwartz (via MN) wrote:
So plotmeans cannot do the equivalent of boxplot(numerical~fact1+fact2)? 
Is there any way around that?
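
One possible workaround, sketched in base R on invented data that reuses the names numerical, fact1 and fact2 from the question: combine the two factors into a single factor with interaction(), then compute and plot the mean and SD of each cell. This does not use plotmeans() at all.

```r
# Hypothetical data with two crossed factors, 20 observations per cell.
set.seed(3)
d <- data.frame(
  fact1 = factor(rep(c("x", "y"), each = 40)),
  fact2 = factor(rep(rep(c("p", "q"), each = 20), times = 2)),
  numerical = rnorm(80, mean = 10, sd = 2)
)

# One combined factor with a level for every fact1/fact2 combination.
cell <- interaction(d$fact1, d$fact2)

m <- tapply(d$numerical, cell, mean)   # cell means
s <- tapply(d$numerical, cell, sd)     # cell standard deviations
x <- seq_along(m)

plot(x, m, pch = 16, xaxt = "n",
     xlim = c(0.5, length(m) + 0.5), ylim = range(m - s, m + s),
     xlab = "fact1.fact2", ylab = "mean +/- SD")
axis(1, at = x, labels = names(m))
arrows(x, m - s, x, m + s, angle = 90, code = 3, length = 0.05)
```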
#
P Ehlers wrote:
Would it make sense to have an option to replace boxes with dotplots
for only those groups with fewer than nmin=20 (say) observations?

Kjetil
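
No such option is assumed to exist in boxplot(); the following is only a sketch of how the idea could be done by hand. The data and the names g, y and small are invented, and nmin = 20 is taken from the suggestion above. Boxes are drawn for the large groups and raw points for the small ones.

```r
# Hypothetical data: group B has far fewer observations than A and C.
set.seed(4)
d <- data.frame(
  g = factor(rep(c("A", "B", "C"), times = c(40, 6, 25))),
  y = rnorm(71, mean = 10, sd = 2)
)

nmin <- 20
n <- table(d$g)
small <- names(n)[n < nmin]     # groups too small for a box
keep <- !(d$g %in% small)       # rows belonging to the large groups

# Subsetting keeps all factor levels, so the small groups get an empty slot.
# ylim covers all the data so that the overlaid points stay inside the plot.
boxplot(y ~ g, data = d[keep, ], ylim = range(d$y), xlab = "g", ylab = "y")

# Fill the empty slots with the raw observations, slightly jittered.
points(jitter(as.integer(d$g[!keep]), amount = 0.1), d$y[!keep], pch = 16)
```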
#
Kjetil Brinchmann Halvorsen wrote:
[snip]

Probably best just to leave it up to the user.

Peter Ehlers