boxplot by factor (Package base version 2.1.1) ( PR#7976)

3 messages · Liaw, Andy, Peter Dalgaard, Gabor Grothendieck

#
The issue is not with boxplot, but with split.  boxplot.formula() 
calls boxplot(split(mf[[response]], mf[-response]), ...), 
but look at what split() returns when there are empty levels in
the factor:
$"1"
[1] 0.4832124 1.1924811 0.3657797 1.7400198 0.5577356 0.9889520

$"2"
[1] -1.1296642 -0.4808355 -0.2789933  0.1220718  0.1287742 -0.7573801

$"3"
[1]  1.2320902  0.5090700 -1.5508074  2.1373780  1.1681297 -0.7151561

The "culprit" is the following in split.default():

    f <- factor(f)

which drops empty levels in f, if there are any.  BTW, ?split doesn't
mention what it does in such a situation.  Perhaps it should?
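
The level-dropping effect of factor() can be checked in isolation. This is
only a small sketch: the factor f below is made up for illustration and is
not the original poster's data.

```r
# Hypothetical factor: four declared levels, but level "4" never occurs.
f <- factor(c(1, 2, 3, 1, 2, 3), levels = 1:4)
levels(f)        # all four declared levels are present

# Re-factoring, as split.default() does, keeps only the levels in use.
g <- factor(f)
levels(g)        # the unused level "4" is gone
```

Any group that exists only as an unused level therefore disappears before
the data are split.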

If this is to be "fixed", I suppose an additional argument, e.g.,
drop=TRUE, can be added, and the corresponding line mentioned
above changed to something like:

    if (drop || !is.factor(f)) f <- factor(f)

Then this additional argument can be passed on from boxplot.formula() to 
split().
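
To see what the proposed line would do, here is a rough sketch. The
function split_sketch() is a hypothetical stand-in written for this
illustration; it is not the source of split.default(), and the data are
invented.

```r
# Hypothetical stand-in for split.default(); NOT the actual base R code.
split_sketch <- function(x, f, drop = TRUE) {
  # The proposed line: re-factor only when dropping is requested
  # or when f is not a factor yet.
  if (drop || !is.factor(f)) f <- factor(f)
  # One list element per level, including levels that have no data.
  out <- lapply(levels(f), function(lev) x[!is.na(f) & f == lev])
  names(out) <- levels(f)
  out
}

x <- c(10, 20, 30, 40)
f <- factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))

names(split_sketch(x, f))                # empty level "c" is dropped
names(split_sketch(x, f, drop = FALSE))  # "c" is kept as an empty group
```

With drop = FALSE the empty level survives as a zero-length element, which
is what boxplot() would need in order to leave a gap for that group.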

Just my $0.02...

Andy
#
"Liaw, Andy" <andy_liaw at merck.com> writes:
Alternatively, I suspect that the intention was as.factor() rather
than factor(). It does require a bit of care to fix it that way,
though. There could be problems with empty levels popping up in
unexpected places.
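
A small sketch of the difference, using an invented factor (not data from
this thread): as.factor() hands back an existing factor unchanged, and
subsets of a factor carry their unused levels along.

```r
# Hypothetical factor with an unused level "c".
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

# f is already a factor, so as.factor() returns it with all levels intact.
levels(as.factor(f))

# Subsetting keeps the declared levels too, so "b" and "c" become
# empty groups here even though only "a" remains in the data.
sub <- f[f == "a"]
levels(sub)
```

The second case is the kind of situation where empty groups could appear
without the user having asked for them.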
#
Based on Andy's comment a workaround can consist of
not using boxplot.formula, e.g. using the data frame d
defined by the original poster (see below):

    boxplot(by(d, d$b, function(x) x$a))

On 6/28/05, Liaw, Andy <andy_liaw at merck.com> wrote: