
boxplot - code for labeling outliers - any suggestions for improvements?

7 messages · Kevin Wright, Greg Snow, Jim Lemon +1 more

#
For the last point (cluttered text), look at spread.labels in the plotrix package and spread.labs in the TeachingDemos package (I favor the latter, but could be slightly biased as well).  Doing more than what those two functions do becomes really complicated really fast.
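
To make the idea of one-dimensional label spreading concrete, here is a minimal base-R sketch. It is a simplified greedy illustration and not the algorithm used by spread.labels or spread.labs; the function name spread_sketch and the example values are hypothetical, chosen only for illustration.

```r
# Hypothetical helper, NOT part of plotrix or TeachingDemos.
# Greedy one-dimensional spreading: walk the positions in increasing order
# and push each one up just far enough to sit at least `mindiff` above the
# previous (already adjusted) position.
spread_sketch <- function(y, mindiff) {
  ord <- order(y)            # indices that sort y
  out <- y[ord]              # sorted copy to adjust
  for (i in seq_along(out)[-1]) {
    if (out[i] - out[i - 1] < mindiff) {
      out[i] <- out[i - 1] + mindiff
    }
  }
  res <- numeric(length(y))
  res[ord] <- out            # put adjusted values back in the original order
  res
}

# Made-up label positions with two overlapping values
y_example <- c(5.0, 5.1, 5.1, 6.0)
y_spread  <- spread_sketch(y_example, mindiff = 0.2)
```

Because this sketch only ever moves labels upward, a dense cluster drifts away from its points; a real spreading routine would presumably balance the movement in both directions, which is one reason to prefer the packaged functions.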
#
My colleagues that use one of the .Net languages/libraries can make
scatter plots that look better than R's because they have better
spreading of the labels.

If someone could spread the labels on the following graph, I would be
impressed.

plot(Sepal.Length~Sepal.Width, data=iris)
with(iris,text(Sepal.Width, Sepal.Length, 1:nrow(iris), cex=.5))

Kevin
On Thu, Jan 27, 2011 at 9:52 AM, Tal Galili <tal.galili at gmail.com> wrote:

  
    
#
Try:

library(TeachingDemos)

plot(Sepal.Length~Sepal.Width, data=iris)

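# Start from the true y positions, then spread the labels within each
# group of points sharing a Sepal.Width so that they sit at least
# 0.6 of a default character height apart.
tmp.y <- iris$Sepal.Length
for( i in unique(iris$Sepal.Width) ) {
	tmp <- iris$Sepal.Width == i
	tmp.y[ tmp ] <- spread.labs( tmp.y[tmp], .6*strheight('A'),
		maxiter=1000 )
}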

# optional: draw a short segment from each point to its spread label
with(iris, segments(Sepal.Width, Sepal.Length, Sepal.Width+0.025, tmp.y) )

with(iris, text(Sepal.Width+0.05, tmp.y, seq_along(tmp.y), cex=.5 ) )


There is also thigmophobe.labels in the plotrix package, which is simpler and works well for some plots.

Also look at dynIdentify (Windows only) and TkIdentify (all platforms) in the TeachingDemos package for a way to interactively place the labels (a little more work, but labels end up where you think they look best).

I have experimented with spreading simultaneously in 2 directions, but what works well for one case does poorly in another, and what ends up working for the other doesn't work in the first case.

But I would argue against labeling all the points in a plot with that many points; the labels make it too busy and distract more than they help.  HWidentify (Windows) and HTKidentify (all platforms) in TeachingDemos give another option.  Sometimes just using different colors/symbols/etc. for groups of points gives more useful information than labels.
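
As a sketch of the colors/symbols suggestion, the base-R code below distinguishes the points by group instead of labeling each one. Grouping by the Species column of iris is an assumption made for illustration, as are the particular colors and plotting symbols.

```r
# Use colour and plotting symbol to mark groups instead of per-point labels.
# Grouping variable (Species), colours and symbols are illustrative choices.
sp       <- iris$Species
grp.cols <- c("black", "red", "blue")
grp.pchs <- c(1, 2, 3)

cols <- grp.cols[as.integer(sp)]   # one colour per observation
pchs <- grp.pchs[as.integer(sp)]   # one symbol per observation

plot(Sepal.Length ~ Sepal.Width, data = iris, col = cols, pch = pchs)
legend("topright", legend = levels(sp), col = grp.cols, pch = grp.pchs)
```

The legend carries the group information once, so the plot stays readable even where many points overlap.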

Hope this helps,
#
On 01/28/2011 07:57 AM, Greg Snow wrote:
Alas, I tried thigmophobe.labels and there are just too many points.
The best I could do was this:

library(plotrix)

irisxy<-cluster.overplot(iris$Sepal.Width,iris$Sepal.Length)
plot(irisxy)
text(irisxy$x,irisxy$y-0.04,labels=1:150,cex=0.5)

which, sad to say, ain't too good.

Jim