help understanding box plots
Jay Pfaffman <pfaffman at relaxpc.com> writes:
Another naive stats question. I'm trying to better understand what boxplots are telling me. I think what I see is the median and the boundaries of the 1st and 3rd quartiles. The whiskers represent the range of the data unless there are points which are outside "range" (default: 1.5) times the distance from the median to that quartile. Is that right?
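Not quite. The limit is 1.5 times the length of the entire box (upper hinge minus lower hinge), measured outward from the edge of the box, not from the median.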
I've read the documentation for boxplot numerous times, but don't quite understand it well enough to communicate it to my professor who's helping me with this project. (You'll be relieved to know that neither of us fancies ourself a statistician!)
boxplot.stats.Rd had a typo and got updated recently in the
development and patch versions to read

  \item{coef}{this determines how far the plot ``whiskers'' extend out
    from the box. If \code{coef} is positive, the whiskers extend to the
    most extreme data point which is no more than \code{coef} times
    the length of the box away from the box. A value of zero causes
    the whiskers to extend to the data extremes (and no outliers be
    returned).}

(for some reason this hasn't yet found its way to the online snapshot
manuals in http://stat.ethz.ch/R-alpha/R-devel/doc/html/ and friends.
Martin?)
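To make the whisker rule concrete, here is a small sketch that recomputes the whisker ends by hand and compares them with what boxplot.stats() returns. The data vector is made up for illustration and is not from the original question; the variable names are mine.

```r
# Hypothetical sample with one obvious outlier (not from the original post)
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 30)

st <- boxplot.stats(x, coef = 1.5)
hinges  <- st$stats[c(2, 4)]   # lower and upper hinge, i.e. the box edges
box_len <- diff(hinges)        # length of the entire box

# Limits lie coef times the box length away from each edge of the box
lower_limit <- hinges[1] - 1.5 * box_len
upper_limit <- hinges[2] + 1.5 * box_len

# Whiskers end at the most extreme data points inside those limits
whiskers <- range(x[x >= lower_limit & x <= upper_limit])

print(st$stats)   # lower whisker, lower hinge, median, upper hinge, upper whisker
print(whiskers)   # should match the first and last element of st$stats
print(st$out)     # points beyond the limits, drawn individually by boxplot()
```

Note that the whiskers stop at actual data points, so they are usually shorter than the limits themselves.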
V&R (p. 122) claims that the hinges are "roughly quartiles," so perhaps my naive understanding is close enough.
Yes. The exact definition is slightly peculiar, but in compliance with the original definition by Tukey. So I'm told, anyway.
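A quick way to see how close the hinges are to the quartiles is to compare fivenum() with quantile() on a sample of about your size. This is only a sketch on made-up data (the integers 1 to 12), assuming the default quantile definition in current R versions.

```r
# Hypothetical data with n = 12, roughly the sample size mentioned above
x <- 1:12

h <- fivenum(x)[c(2, 4)]          # Tukey's lower and upper hinges
q <- quantile(x, c(0.25, 0.75))   # quartiles by quantile()'s default definition

print(h)
print(q)
print(h - unname(q))              # small differences between hinge and quartile
```

For some sample sizes the two agree exactly; for others, as here, they differ slightly, which is what "roughly quartiles" refers to.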
I've got a relatively small data set (n~=12). I think it would help
to see the data points plotted on top of the boxplots. Here's what
I'm doing now:
par(las=2,ps=14,mar=c(15, 4, 4, 2))
boxplot(split(ranks,c(1:25)), names=items, notch=T, horizontal=F, add=F)
If I could get the points of each of the 25 variables plotted on top
of the box, that'd be great.
Not sure what you're doing there, but maybe some code like this could help:

    x1 <- rnorm(20)
    x2 <- rnorm(20)
    boxplot(list(x1 = x1, x2 = x2))
    points(cbind(1, x1))
    points(cbind(2, x2))

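The same idea extends to many groups with a loop, since boxplot() draws the boxes at x = 1, 2, 3, and so on. The sketch below uses simulated values as a stand-in for your ranks (I don't know what your data look like, so the 25 groups of 12 are an assumption), and adds a little horizontal jitter so that tied values stay visible.

```r
set.seed(1)  # only to make the illustration reproducible

# Hypothetical stand-in for the poster's data: 25 items, 12 values each
ranks  <- sample(1:25, 25 * 12, replace = TRUE)
groups <- split(ranks, rep(1:25, length.out = length(ranks)))

boxplot(groups, las = 2)

# Box i sits at x = i, so plot each group's values around that position
for (i in seq_along(groups)) {
  y <- groups[[i]]
  points(jitter(rep(i, length(y)), amount = 0.1), y, pch = 16, cex = 0.6)
}
```

In current R versions, stripchart(groups, vertical = TRUE, add = TRUE, method = "jitter") should give a similar overlay without the explicit loop.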
-- 
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Blegdamsvej 3, 2200 Cph. N, Denmark
Ph: (+45) 35327918   FAX: (+45) 35327907
(p.dalgaard at biostat.ku.dk)