In a Box and Whisker plot, I thought that when there are outliers both above
and below the whiskers, then the whiskers should both be the same length
(plus or minus 1.5 times the inter-quartile range).
If you look at the plot for SilwoodWeather on p.155 of The R Book you will
see that for November (month = 11) the upper whisker is shorter than the
lower, while for other months with outliers both above and below, the lines
are the same lengths.
When there are no outliers, then of course the maximum and minimum values
result in whiskers of different lengths above and below.
Here is the code that creates the problem:
data<-read.table("c:\\temp\\SilwoodWeather.txt",header=T)
attach(data)
names(data)
[1] "upper" "lower" "rain" "month" "yr"
month<-factor(month)
plot(month,upper)
and exactly the same with
boxplot(upper~month)
Best wishes,
Mick
Prof M.J. Crawley FRS
Imperial College London
Silwood Park
Ascot
Berks
SL5 7PY
UK
Phone (0) 207 5942 216
Fax (0) 207 5942 339
box and whisker (PR#13821)
3 messages · m.crawley at imperial.ac.uk, Peter Dalgaard, Ben Bolker
m.crawley at imperial.ac.uk wrote:
In a Box and Whisker plot, I thought that when there are outliers both above and below the whiskers, then the whiskers should both be the same length (plus or minus 1.5 times the inter-quartile range).
Not according to the docs:
range: this determines how far the plot whiskers extend out from the
box. If 'range' is positive, the whiskers extend to the most
extreme data point which is no more than 'range' times the
interquartile range from the box. A value of zero causes the
whiskers to extend to the data extremes.
And the code itself has
stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
So the whisker won't be equal to 1.5 IQR unless there happens to be an
observation there.
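As a rough sketch of what the documentation and the line of code quoted above describe, the whisker ends can be recomputed by hand and compared with boxplot.stats(). The sample x below is made up for illustration and is not taken from SilwoodWeather; note also that the box edges are the hinges from fivenum(), which can differ slightly from the quartiles returned by quantile().

```r
# Hypothetical sample with one low and one high outlier.
x <- c(-10, 0, 2, 3, 4, 5, 6, 7, 8, 30)

coef <- 1.5                      # the 'range' argument of boxplot()
hinges <- fivenum(x)[c(2, 4)]    # lower and upper hinge (the box edges)
iqr <- diff(hinges)              # hinge spread used as the IQR

# Fences: the furthest a whisker is allowed to reach.
fences <- c(hinges[1] - coef * iqr, hinges[2] + coef * iqr)

# Whisker ends: the most extreme observations lying inside the fences.
inside <- x >= fences[1] & x <= fences[2]
whiskers <- range(x[inside])

print(fences)                                        # limits, not whisker ends
print(whiskers)                                      # actual whisker ends
print(boxplot.stats(x, coef = coef)$stats[c(1, 5)])  # same two values
```

Here the fences sit 7.5 units from the box on each side, but the whiskers stop at the nearest observations inside them, so they end up with lengths 2 and 1 rather than 7.5 each.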
Now, this might be wrong, but people have tried very hard to make the
implementation follow the original definition due to Tukey. I.e., if you
can point out that Tukey specified it otherwise, then we'd change it,
otherwise it is just not a bug.
If you look at the plot for SilwoodWeather on p.155 of The R Book you will see that for November (month = 11) the upper whisker is shorter than the lower, while for other months with outliers both above and below, the lines are the same lengths.
For easier reproduction (reproducible examples should not refer to files
on your C: drive...):
> diff(boxplot({set.seed(9);x<-rnorm(50)})$stats)
          [,1]
[1,] 1.2525857
[2,] 0.5412128
[3,] 0.6083348
[4,] 1.4625057
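To read that output: rows 1 and 4 are the lower and upper whisker lengths, and rows 2 and 3 add up to the height of the box. With the numbers printed above, 1.5 times the box height is about 1.72, and both whiskers are shorter than that and differ from each other. A minimal sketch of the same comparison without drawing a plot, assuming boxplot.stats() is an acceptable stand-in for boxplot(...)$stats (the variable names are only illustrative):

```r
set.seed(9)
x <- rnorm(50)

# boxplot.stats() computes the five numbers that boxplot() stores in $stats:
# lower whisker end, lower hinge, median, upper hinge, upper whisker end.
s <- boxplot.stats(x)$stats

whisker_len <- c(lower = s[2] - s[1], upper = s[5] - s[4])
max_len <- 1.5 * (s[4] - s[2])   # 1.5 times the hinge spread

print(whisker_len)
print(max_len)
print(whisker_len <= max_len)    # a whisker never exceeds the limit
```

The limit of 1.5 times the box height is only an upper bound on each whisker; a whisker reaches it exactly only when an observation happens to lie at that distance.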
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
Peter Dalgaard wrote:
For easier reproduction (reproducible examples should not refer to files on your C: drive...):
For what it's worth, the data are available at
http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/SilwoodWeather.txt
Not that that's really necessary since, as you've shown, the point is
pretty general.
Ben Bolker