
Labeling a range of bars in barplot?

4 messages · Marc Schwartz (via MN), Dan Bolser, Gabor Grothendieck

#
Hi, I am plotting a distribution of (ordered) values as a barplot. I 
would like to label groups of bars together to highlight aspects of the 
distribution. The label for the group should be the range of values in 
those bars.

As this is hard to describe, here is an example;


x <- rlnorm(50)*2
barplot(sort(x, decreasing = TRUE))
y <- quantile(x, seq(0, 1, 0.2))
y
plot(diff(y))

That last plot is to highlight that I want to label lots of the small
columns together, and have a few more labels for the bigger columns
(more densely labeled). I guess I will have to roll my own labels
using low-level plotting functions, but I am stumped as to how to
calculate where the labels should go.

I imagine drawing several line segments, one for each group of bars to 
be labeled together, and putting the range under each line segment as 
the label. Each line segment will sit under the group of bars that it 
covers.

Thanks for any help with the above!

Cheers,
Dan.
#
On Tue, 2005-12-13 at 10:53 +0000, Dan Bolser wrote:
Dan,

Here is a hint.

barplot() returns the bar midpoints:

mp <- barplot(sort(x, decreasing = TRUE))
[,1]
[1,]  0.7
[2,]  1.9
[3,]  3.1
[4,]  4.3
[5,]  5.5
[6,]  6.7

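There will be one value in 'mp' for each bar in your series (only the
first six are shown above; barplot() returns 'mp' invisibly, so print
it to see the values).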
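As a quick check of what those numbers are: for a vector of heights with barplot()'s default width (1) and spacing (0.2 of the bar width), the i-th midpoint works out to 1.2 * i - 0.5, which matches the output above. A minimal sketch, assuming those defaults; the heights are made up, and plot = FALSE makes barplot() return the midpoints without drawing anything:

```r
## Six made-up bar heights, for illustration only
heights <- c(6, 5, 4, 3, 2, 1)

## plot = FALSE: compute the bar midpoints but draw nothing
mp <- barplot(heights, plot = FALSE)

## With width = 1 and space = 0.2 (the defaults), bar i is centred at 1.2 * i - 0.5
expected <- 1.2 * seq_along(heights) - 0.5
print(cbind(mp, expected))
```

If you change 'width' or 'space', the spacing changes too, which is why it is safer to use the returned midpoints than to compute positions yourself.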

You can then use those values along the x axis to draw your line
segments under the bars as you require, based upon the cut points you
want to highlight.

To get the center of a given group of bars, you can use:

  mean(mp[start:end])

where 'start' and 'end' are the extreme bars in each of your groups.
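For example, the hint can be applied to several groups at once. This is only a sketch: it assumes the sorted lognormal data from the question, and the three groups (bars 1-10, 11-30 and 31-50) are hypothetical, chosen just to show the mechanics of segments() and text() under the bars:

```r
set.seed(1)                          # reproducible illustration
x <- sort(rlnorm(50) * 2, decreasing = TRUE)
mp <- barplot(x)

## Hypothetical groups: bars 1-10, 11-30 and 31-50
start <- c(1, 11, 31)
end   <- c(10, 30, 50)

## Centre of each group: mean(mp[start:end]) for every group
centres <- mapply(function(s, e) mean(mp[s:e]), start, end)

## The bars are sorted decreasingly, so each group runs from x[end] up to x[start]
labs <- sprintf("%.1f - %.1f", x[end], x[start])

op <- par(xpd = TRUE)                # allow drawing below the plot region
y0 <- par("usr")[3]                  # bottom edge of the plot region
segments(mp[start], y0, mp[end], y0, lwd = 2, col = 2)
text(centres, y0, labs, pos = 1, cex = 0.7)
par(op)                              # restore the previous xpd setting
```

Because the midpoints are evenly spaced, each group centre is also simply (mp[start] + mp[end]) / 2.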

Two other things that might be helpful: see ?cut and ?hist, noting the
output of the latter when 'plot = FALSE'.
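A short sketch of those two functions, assuming the data from the question: with plot = FALSE, hist() returns the break points and the number of values in each bin without drawing, and cut() assigns each value (that is, each bar) to one of those bins. Setting include.lowest = TRUE makes cut() close the first interval the way hist() does by default:

```r
set.seed(1)
x <- sort(rlnorm(50) * 2, decreasing = TRUE)

## Break points and per-bin counts, without drawing anything
h <- hist(x, breaks = 4, plot = FALSE)
h$breaks
h$counts

## Assign each bar (value) to a bin, using the same break points
bins <- cut(x, h$breaks, include.lowest = TRUE)
table(bins)
```

The counts tell you how many consecutive bars fall in each group, and the breaks give the value range to print under each group.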

HTH,

Marc Schwartz
#
Marc Schwartz (via MN) wrote:
Thanks, all, for the help on this question, including those who emailed
me off list.

I went with Marc's suggestion above, because I could follow how to
implement it (the other, more complete solutions were hard for me to
'reverse engineer').

Here is my solution in full, which I feel gives rather nice output :)

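## Approximate my data for you to try
x <- sort((runif(70)*100)^3, decreasing = TRUE)

## Plot the barplot, keeping the bar midpoints
mp <-
   barplot(x,
           ## Remove default label names
           names.arg = rep('', length(x))
           )

## Break data range, and count bars per break
my.hist <-
   hist(x, plot = FALSE,
        ## Pick the (approximate) number of labels
        ## NB: using quantiles is incorrect here
        breaks = 4
        )

## Check for sanity
## points(mp[length(mp)], x[length(mp)], col = 2)

## Counts become new 'breaks' (the number of bars in each group)
my.new.breaks <-
   my.hist$counts

## Some formatting stuff ("%.0f" also copes with non-integer breaks)
my.names <-
   sprintf("%.0f", my.hist$breaks)

## Prepare to add labels below the plot region
op <- par(xpd = TRUE)

## NB: the y positions -50000 and -100000 are in data units,
## chosen for data of this size
i <- length(mp)             # Note we label from right to left
q <- 1
for(j in my.new.breaks){
   if(j > 0){               # skip empty bins: there are no bars to label
      st <- i               # rightmost bar of this group
      en <- i-j+1           # leftmost bar of this group
      segments(mp[st], -50000,
               mp[en], -50000, lwd = 2, col = 2)
      text(mean(mp[st:en]), -100000, pos = 1,
           paste(paste(my.names[q], "-", sep = " "),
                 my.names[q+1], sep = "\n"), cex = 0.6)
      i <- i-j              # step left past this group
   }
   q <- q+1                 # move on to the next pair of break labels
}

## Restore the previous graphics setting
par(op)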
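The offsets -50000 and -100000 in the code above are in data units, so they only suit data of roughly this size (values up to about 10^6). One possible alternative, offered as a sketch rather than as Dan's method, is to place the segments and labels a fixed fraction of the y-axis range below the plot region, reading the axis limits from par("usr") after barplot() has drawn. The fractions 0.03 and 0.06 are arbitrary choices:

```r
x <- sort((runif(70) * 100)^3, decreasing = TRUE)
mp <- barplot(x, names.arg = rep("", length(x)))

## par("usr") gives c(x1, x2, y1, y2): the plot limits in user coordinates
usr <- par("usr")
yr <- usr[4] - usr[3]
seg.y <- usr[3] - 0.03 * yr          # height for the group segments
txt.y <- usr[3] - 0.06 * yr          # height for the range labels

## These can replace -50000 and -100000 in the loop; here, one group covering every bar
op <- par(xpd = TRUE)
segments(mp[length(mp)], seg.y, mp[1], seg.y, lwd = 2, col = 2)
text(mean(mp), txt.y, sprintf("%.0f - %.0f", min(x), max(x)), pos = 1, cex = 0.6)
par(op)
```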


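You should see that the density of labels follows the range of the data
(hopefully not too dense): each label covers an equal-width range of
values, so regions of the plot where the values change quickly get more
labels than regions where many bars share a narrow range.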
Cheers,
Dan.
#
Note that, if I follow this correctly, you could remove the loop. In
particular, note that 1. st is just the cumulative sum of my.new.breaks,
but summed from the end:

   st <- rev(cumsum(rev(my.new.breaks)))

2. segments() and text() both take vector arguments, and 3. the averaging
over the groups can be done by using cut() to define a factor g whose
levels are the groups, and then averaging with tapply():

   g <- cut(seq(mp), c(0, st), include.lowest = TRUE)
   tapply(mp, g, mean)
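Putting the three points together, a minimal loop-free sketch might look like the following. It is an assumption about how the pieces fit rather than code from the thread. It reuses Dan's variable names, drops empty bins first, and builds the group factor with rep() instead of cut(): cut() would fail on the duplicate break points that empty bins produce.

```r
x <- sort((runif(70) * 100)^3, decreasing = TRUE)
mp <- barplot(x, names.arg = rep("", length(x)))
my.hist <- hist(x, plot = FALSE, breaks = 4)
my.new.breaks <- my.hist$counts
my.names <- sprintf("%.0f", my.hist$breaks)

## Keep only non-empty bins; bin k runs from break k to break k + 1
keep <- my.new.breaks > 0
cnt  <- my.new.breaks[keep]
labs <- paste0(my.names[-length(my.names)][keep], " -\n", my.names[-1][keep])

## Point 1: st (rightmost bar of each group) is a cumulative sum taken from the end
st <- rev(cumsum(rev(cnt)))
en <- st - cnt + 1                   # leftmost bar of each group

## Point 3: group index for every bar, left to right, then average per group
g <- rep(rev(seq_along(cnt)), rev(cnt))
centres <- tapply(mp, g, mean)       # ordered by group, like st, en and labs

## Point 2: segments() and text() are vectorised, so one call each suffices
op <- par(xpd = TRUE)
segments(mp[st], -50000, mp[en], -50000, lwd = 2, col = 2)
text(centres, -100000, labs, pos = 1, cex = 0.6)
par(op)
```

As in Dan's version, the y positions -50000 and -100000 are in data units and suit data of this size only.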
On 12/14/05, Dan Bolser <dmb at mrc-dunn.cam.ac.uk> wrote: