box and whisker (PR#13821)

3 messages · m.crawley at imperial.ac.uk, Peter Dalgaard, Ben Bolker

#
In a Box and Whisker plot, I thought that when there are outliers both
above and below the whiskers, then the whiskers should both be the same
length (plus or minus 1.5 times the inter-quartile range).

If you look at the plot for SilwoodWeather on p.155 of The R Book you will
see that for November (month = 11) the upper whisker is shorter than the
lower, while for other months with outliers both above and below, the lines
are the same lengths.

When there are no outliers, then of course the maximum and minimum values
result in whiskers of different lengths above and below.

Here is the code that creates the problem:

data<-read.table("c:\\temp\\SilwoodWeather.txt",header=T)
attach(data)
names(data)

[1] "upper" "lower" "rain"  "month" "yr"

month<-factor(month)
plot(month,upper)

and exactly the same with

boxplot(upper~month)



Best wishes,

Mick

Prof  M.J. Crawley  FRS

Imperial College London
Silwood Park
Ascot
Berks
SL5 7PY
UK

Phone (0) 207 5942 216
Fax     (0) 207 5942 339
#
m.crawley at imperial.ac.uk wrote:
Not according to the docs:

    range: this determines how far the plot whiskers extend out from the
           box.  If 'range' is positive, the whiskers extend to the most
           extreme data point which is no more than 'range' times the
           interquartile range from the box. A value of zero causes the
           whiskers to extend to the data extremes.

And the code itself has

             stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)

So the whisker won't be equal to 1.5 IQR unless there happens to be an 
observation there.
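
To make the rule concrete, here is a minimal sketch that recomputes the whisker ends by hand and compares them with boxplot.stats(). It follows the documented rule rather than copying the source; the names coef, hinges, lower_fence, upper_fence, inside and whiskers are illustrative only, and the data are simulated rather than the SilwoodWeather values.

```r
# Sketch: where the whiskers of a default boxplot end.
# Variable names are illustrative, not taken from the R sources.
set.seed(9)
x <- rnorm(50)

coef <- 1.5                      # plays the role of boxplot()'s 'range' argument
hinges <- fivenum(x)[c(2, 4)]    # lower and upper hinges (the box edges)
iqr <- diff(hinges)

# Fences: the furthest a whisker is allowed to reach from the box
lower_fence <- hinges[1] - coef * iqr
upper_fence <- hinges[2] + coef * iqr

# Each whisker stops at the most extreme observation inside the fences,
# so its length is at most coef * iqr and usually somewhat less
inside <- x >= lower_fence & x <= upper_fence
whiskers <- range(x[inside])

# Compare with the whisker ends reported by boxplot.stats()
bs <- boxplot.stats(x, coef = coef)
print(whiskers)
print(bs$stats[c(1, 5)])
```

The two printed vectors should agree. The whiskers only come out equal in length when the nearest observations inside each fence happen to lie at the same distance from the box.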

Now, this might be wrong, but people have tried very hard to make the 
implementation follow the original definition due to Tukey. I.e., if you 
can point out that Tukey specified it otherwise, then we'd change it, 
otherwise it is just not a bug.

For easier reproduction (reproducible examples should not refer to files 
on your C: drive...):

 > diff(boxplot({set.seed(9);x<-rnorm(50)})$stats)
           [,1]
[1,] 1.2525857
[2,] 0.5412128
[3,] 0.6083348
[4,] 1.4625057
#
Peter Dalgaard wrote:
For what it's worth, the data are available at
http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/SilwoodWeather.txt

Not that that's really necessary since as you've shown the point is pretty
general.

  Ben Bolker