Whiskers on the default boxplot {graphics}
Jason,
All these are clearly defined in the help file for 'boxplot' under 'range'. Don't understand how you missed that.
...Tao
----- Original Message ----
From: Jason Rupert <jasonkrupert at yahoo.com>
To: Dennis Murphy <djmuser at gmail.com>
Cc: R Project Help <R-help at r-project.org>
Sent: Wed, May 12, 2010 3:40:12 AM
Subject: Re: [R] Whiskers on the default boxplot {graphics}
Fantastic!
It would be great if the description could be modified to include the mysterious bit about the upper and lower bound whisker positions:
upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)
Maybe that is clearly written in the description of boxplot.stats {grDevices}, but evidently I missed it numerous times and also did not pick up on this intent from the original description of boxplot {graphics}.
Your type of descriptive answer and helpfulness is much appreciated and one of the reasons I continue to endorse the R tool over numerous others.
More like you and the tool may be headed for domination in the market.
Thanks again!
________________________________
From: Dennis Murphy <djmuser at gmail.com>
Cc: R Project Help <R-help at r-project.org>
Sent: Wed, May 12, 2010 2:50:19 AM
Subject: Re: [R] Whiskers on the default boxplot {graphics}
Hi: Let's do some math :)
e: Okay...Let me see if I've got it...
I'm just trying to use the default boxplot {graphics} capability in R...
So I call something like the following:
boxplot(mpg ~ cyl, data = mtcars, main = "Car Milage Data",
        xlab = "Number of Cylinders", ylab = "Miles Per Gallon")
That produces something as shown in the following: http://www.statmethods.net/graphs/images/boxplot1.jpg
When that default boxplot is called, i.e. boxplot {graphics}, as shown in the line of code above, it is actually calling into boxplot.stats {grDevices}. When boxplot.stats {grDevices} is called it has a default value for "coef" of 1.5, i.e. coef = 1.5.
If I understand the purpose of "coef" correctly, it means that the 'whiskers' should extend out 1.5 times the length of the box away from the box. Is that correct?
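One way to check the relationship described above is to compare the statistics that boxplot() computes with a direct call to boxplot.stats() on each group. This is only a sketch: the variable names are illustrative, and it assumes that the `range` argument of boxplot() (mentioned at the top of this thread) plays the role of `coef`.

```r
# Statistics computed by boxplot() itself, without drawing anything.
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# The same five numbers per group, computed directly with boxplot.stats().
by_group <- split(mtcars$mpg, mtcars$cyl)
s <- sapply(by_group, function(x) boxplot.stats(x, coef = 1.5)$stats)

# Default multiplier declared in the signature of boxplot.stats().
default_coef <- formals(boxplot.stats)$coef

# TRUE if both routes give the same whisker ends, hinges and medians.
same_stats <- isTRUE(all.equal(unname(b$stats), unname(s)))

print(default_coef)
print(same_stats)
```

Rows 1 and 5 of `b$stats` are the lower and upper whisker ends for each cylinder group.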
If by 'length of the box' you mean the interquartile range (IQR = Q_3 - Q_1 where Q refers to quartile), then assuming that x is the numeric vector of interest for a boxplot,
upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)
So the upper whisker is located at the *smaller* of the maximum x value and Q_3 + 1.5 IQR, whereas the lower whisker is located at the *larger* of the smallest x value and Q_1 - 1.5 IQR.
In your terms, the whiskers should extend out a *maximum* of "1.5 times the length of the box away from the box". Visually, this means that individual points more extreme in value than Q_3 + 1.5 IQR are plotted separately at the high end, and those below Q_1 - 1.5 IQR are plotted separately on the low end. Depending on the source, the separately plotted points are called 'outside values'. On the other hand, if the maximum or minimum value of x is closer than 1.5 IQR in distance from its nearest quartile, then that is where the whisker is positioned.
Does that make sense?
HTH, Dennis
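A small numeric check may help here. The sketch below recomputes the whisker ends by hand and compares them with boxplot.stats(). It assumes the hinges from fivenum() stand in for Q_1 and Q_3 (as the ?boxplot.stats excerpt quoted later in the thread says), and the data vector and variable names are made up for illustration. Note that, following the `coef` description quoted later in the thread, the whisker stops at the most extreme data point inside the 1.5 IQR limit, so the min/max expressions above act as bounds on the whisker position rather than its exact location when a value lies outside.

```r
# Illustrative data with one value far above the rest.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)
coef <- 1.5

h <- fivenum(x)                  # min, lower hinge, median, upper hinge, max
box_len <- h[4] - h[2]           # length of the box
lower_limit <- h[2] - coef * box_len
upper_limit <- h[4] + coef * box_len

# Whiskers end at the most extreme data points inside the limits.
inside <- x >= lower_limit & x <= upper_limit
whiskers <- range(x[inside])
outside <- x[!inside]            # points plotted separately

bs <- boxplot.stats(x, coef = coef)

print(c(lower_limit = lower_limit, upper_limit = upper_limit))
print(whiskers)                  # hand-computed whisker ends
print(bs$stats[c(1, 5)])         # whisker ends reported by boxplot.stats()
print(bs$out)                    # outside values reported by boxplot.stats()
```

With these data the upper limit falls between two observations, so the upper whisker sits at the largest observation below the limit, not at the limit itself.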
Now I look back at the plot, and I'm not sure how 1.5 times the length of the box corresponds with the whisker lengths shown in the image: http://www.statmethods.net/graphs/images/boxplot1.jpg
Is it that the whisker length is a total of 1.5 times the length of the box and centered about the median (2nd Quartile)?
Just trying to get a handle on this, so thanks again for all the help in deciphering this.
________________________________
From: RJ Cunningham <robut at iinet.net.au>
Cc: R Project Help <R-help at r-project.org>
Sent: Tue, May 11, 2010 9:57:48 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}
I think not. Isn't the "secret" here?
Arguments:
       x: a numeric vector for which the boxplot will be constructed
          ('NA's and 'NaN's are allowed and omitted).
    coef: this determines how far the plot 'whiskers' extend out
          from the box. If 'coef' is positive, the whiskers extend
          to the most extreme data point which is no more than
          'coef' times the length of the box away from the box.
          A value of zero causes the whiskers to extend to the
          data extremes (and no outliers be returned).
 do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out'
          component respectively will be empty in the result.
Details:
     The two 'hinges' are versions of the first and third quartile,...
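The effect of `coef` described in this excerpt can be seen by calling boxplot.stats() with the default and with coef = 0. The data vector below is invented for illustration only.

```r
# Illustrative data: one value well above the others.
x <- c(2, 4, 4, 5, 6, 7, 8, 25)

# Default coef = 1.5: the extreme value is reported separately in $out.
d <- boxplot.stats(x)

# coef = 0: whiskers run to the data extremes and nothing is returned in $out.
z <- boxplot.stats(x, coef = 0)

print(d$stats[c(1, 5)])   # whisker ends with the default coef
print(d$out)
print(z$stats[c(1, 5)])   # whisker ends with coef = 0
print(z$out)
```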
On Wed May 12 10:35 , Jason Rupert sent:
Humm....Maybe I need to look some place else than boxplot.stats {grDevices} for a definition of how the upper/lower whiskers are produced.
By any chance are they "the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile"?
None of the links from boxplot.stats {grDevices} seemed to reveal the secret definition of the R whiskers.
Thanks again.
----- Original Message ----
To: David Winsemius <dwinsemius at comcast.net>
Cc: R Project Help <R-help at r-project.org>
Sent: Tue, May 11, 2010 9:26:25 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}
Wowzers...
From ?boxplot.stats:
Details
The two 'hinges' are versions of the first and third quartile, i.e., close to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- length(x)) and differ for even n. Whereas the quartiles only equal observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations otherwise.
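The difference between hinges and quartiles described in this paragraph can be checked on two tiny vectors. This is a sketch only: `hinges` and `quartiles` are hypothetical helper names, and quantile() is used with its default settings.

```r
# Hypothetical helpers: hinges via fivenum(), quartiles via quantile().
hinges <- function(x) fivenum(x)[c(2, 4)]
quartiles <- function(x) unname(quantile(x, c(1, 3) / 4))

odd <- 1:9     # odd n: hinges and quartiles agree
even <- 1:10   # even n: they differ

print(rbind(hinges = hinges(odd), quartiles = quartiles(odd)))
print(rbind(hinges = hinges(even), quartiles = quartiles(even)))
```

For `even`, n %% 4 == 2, so the hinges fall on actual observations while the quartiles are interpolated between observations.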
The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be based on the same calculations as the formula with 1.57 in Chambers et al. (1983, p. 62), given in McGill et al. (1978, p. 16). They are based on asymptotic normality of the median and roughly equal sample sizes for the two medians being compared, and are said to be rather insensitive to the underlying distributions of the samples. The idea appears to be to give roughly a 95% confidence interval for the difference in two medians.
Is a notch equal to the upper/lower whisker? Is this just a difference of terminology or something?
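To see that notches and whiskers are different quantities, the notch limits can be recomputed from the formula quoted above and compared with the whisker ends. The sketch uses mtcars$mpg purely as convenient example data and takes the hinge spread from fivenum() as the IQR in the quoted formula.

```r
x <- mtcars$mpg
bs <- boxplot.stats(x)

# Notch limits: median +/- 1.58 * IQR / sqrt(n), with IQR as the hinge spread.
h <- fivenum(x)
notch <- h[3] + c(-1.58, 1.58) * (h[4] - h[2]) / sqrt(length(x))

print(notch)               # hand-computed notch limits
print(bs$conf)             # notch limits from boxplot.stats()
print(bs$stats[c(1, 5)])   # whisker ends, a different pair of numbers
```

The notch is an interval around the median, while the whiskers describe the spread of the data beyond the box.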
Thanks again for all the insights.
----- Original Message ----
From: David Winsemius <dwinsemius at comcast.net>
Cc: R Project Help <R-help at r-project.org>
Sent: Tue, May 11, 2010 9:00:15 PM
Subject: Re: [R] Whiskers on the default boxplot {graphics}
On May 11, 2010, at 9:45 PM, Jason Rupert wrote:
How are the lower/upper whiskers defined in the default version of boxplot {graphics}?
I tried help(boxplot) and searching www.rseek.org, but I was unable to determine an absolute answer.
You need to follow the links from the help pages and in this case it appears that you did not follow the one to ?boxplot.stats
I checked out the definition of boxplot according to Wikipedia (http://en.wikipedia.org/wiki/Box_plot), but it also had several approaches listed for how the whiskers could be determined, so I'm just curious how the default boxplot {graphics} does it.
Thanks for any feedback and insights.
Follow links with the R help system.
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.