Violin plot of categorical/binned data

On 11/3/2012 5:47 PM, Jim Lemon wrote:
Expanding on the idea of the battleship.plot, you can draw rectangles of 
the right width with ggplot2 if you want.

Original data:

data2 <- read.csv(text=
"count, bin
7,0
11,1-10
6,11-100
13,101-1000
7,1001-10000
3,10001-100000
2,100001-1000000")
data2$bin <- ordered(data2$bin, 
levels=c("0","1-10","11-100","101-1000","1001-10000","10001-100000","100001-1000000"))

Define the lower and upper reaches of each bin:

data2$low <- c(0,1,11,101,1001,10001,100001)
data2$high <- c(0,10,100,1000,10000,100000,1000000)
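If you would rather not type the limits by hand, they can probably be derived from the bin labels themselves. This is only a sketch: it assumes every label is either a single number or "low-high" with a plain hyphen, and the names bins, parts, low and high are made up for the example.

```r
# Bin labels as used above (repeated here so the snippet runs on its own).
bins <- c("0", "1-10", "11-100", "101-1000", "1001-10000",
          "10001-100000", "100001-1000000")

# Split each label on the hyphen; a single-number label such as "0"
# yields one piece, which then serves as both lower and upper limit.
parts <- strsplit(bins, "-", fixed = TRUE)
low  <- as.numeric(sapply(parts, function(p) p[1]))
high <- as.numeric(sapply(parts, function(p) p[length(p)]))
```

The results could then be assigned to data2$low and data2$high in place of the hand-typed vectors.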

And make several copies for different vessels (or whatever grouping you have):

data3 <- rbind(data2, data2, data2, data2)
data3$vessel <- rep(c("Barnacle","Maelstrom","Poopdeck","Seasick"),
                     each=7)
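# Perturb the counts so the vessels differ; abs() keeps them non-negative.
# No seed is set, so the counts change from run to run.
data3$count <- abs(data3$count + sample(-5:5, nrow(data3), replace=TRUE))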

With each bin taking the same size, regardless of its extent:

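library(ggplot2)

ggplot(data3) +
   # geom_blank sets up the discrete y scale with the bin labels
   geom_blank(aes(x=count/2, y=bin)) +
   # each bin is one unit tall, centred on its position; width is the count
   geom_rect(aes(ymin=as.numeric(bin)-0.5, ymax=as.numeric(bin)+0.5,
                 xmin = -count/2, xmax = count/2)) +
   facet_grid(~vessel)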

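Alternatively, base the width (height, really) of each rectangle on the 
range of its bin. Because the bins grow exponentially and the scale is 
logarithmic, the rectangles come out nearly the same height (with small 
gaps, since the bins are discrete: 10 to 11, 100 to 101, and so on). The 
log scale still has a problem with 0: log10(0) is -Inf, so the zero bin 
does not get a proper rectangle.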

ggplot(data3) +
   geom_rect(aes(ymin=low, ymax=high, xmin=-count/2, xmax=count/2)) +
   facet_grid(~vessel) +
   scale_y_log10()
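One possible way around the 0 problem, offered only as a sketch: give the zero bin a placeholder band just below the "1-10" bin, and stretch each bin's upper edge to the next bin's lower edge so the small gaps close. The 0.1 to 1 band is an arbitrary choice of mine, and the names ymin, ymax and d are made up for the example. The plot is only drawn if ggplot2 is installed.

```r
# Bin limits and the original counts from above, repeated so this runs alone.
low   <- c(0, 1, 11, 101, 1001, 10001, 100001)
high  <- c(0, 10, 100, 1000, 10000, 100000, 1000000)
count <- c(7, 11, 6, 13, 7, 3, 2)

# Assumption: the zero bin is drawn from 0.1 to 1, one decade below "1-10".
ymin <- ifelse(low == 0, 0.1, low)
# Each upper edge runs to the next bin's lower edge; the last keeps its limit.
ymax <- c(ymin[-1], high[length(high)])

d <- data.frame(count = count, ymin = ymin, ymax = ymax)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d) +
    geom_rect(aes(ymin = ymin, ymax = ymax,
                  xmin = -count/2, xmax = count/2)) +
    scale_y_log10()
  print(p)
}
```

The axis then shows 0.1 to 1 for what is really the zero bin, so it would need relabelling (or a note in the caption) before anyone reads values off it.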