
Bumps chart in R

12 messages · Gabor Grothendieck, Michael Lawrence, Andreas Christoffersen +2 more

#
Hi there,

I would like to make a 'bumps chart' like the ones described e.g.
here: http://junkcharts.typepad.com/junk_charts/bumps_chart/

Purpose: I'd like to plot the proportion of people in selected countries
living on less than one USD per day in 1990 and 2004 respectively. I
have already constructed a barplot, but I think a bumps chart would
be better.

# The barplot and data
countries <- c("U-lande", "Afrika syd for sahara", "Europa og Centralasien",
  "Latinamerika og Caribien", "Mellemøsten og Nordafrika", "Sydasien",
  "Østasien og Stillehavet", "Kina", "Brasilien")
poor_1990 <- c(28.7,46.7,0.5,10.2,2.3,43,29.8,33,14)
poor_2004 <- c(18.1,41.1,0.9,8.6,1.5,30.8,9.1,9.9,7.5)
poor <- cbind(poor_1990,poor_2004)
rownames(poor) <- countries
# open the PNG device first so the margin setting applies to it
png("poor.png")
par(mar=c(15,5,5,1))
# one pair of bars per region, sorted by the 2004 value
barplot(t(poor[order(poor[,2]),]),beside=TRUE,col=c(1,2),las=3,
  ylab="% poor",main="Percent living for < 1 USD per day (1993 prices)",
  ylim=c(0,50))
legend("topleft",c("1990","2004"),fill=c(1,2),bty="n")
dev.off()

I guess I need to start with a normal plot? Something like the below,
but there is a long way to go...

# A meager start - how to finish my bumps chart
plot(c(rep(1,9),rep(2,9)),c(poor_1990,poor_2004),type="b",ann=F)

Thankful for any help.

Cheers.

Andreas
#
In statistics, a bumps chart is more commonly called a parallel
coordinates plot.

Hadley

On Sun, Apr 26, 2009 at 5:45 PM, Andreas Christoffersen
<achristoffersen at gmail.com> wrote:

  
    
#
Have a look at plotweb in the bipartite package.

On Sun, Apr 26, 2009 at 6:45 PM, Andreas Christoffersen
<achristoffersen at gmail.com> wrote:
#
Here's a ggplot2 based solution:

#load the ggplot2 library
library(ggplot2)

#here's the data provided by Andreas
countries <- c("U-lande", "Afrika syd for sahara", "Europa og Centralasien",
  "Latinamerika og Caribien", "Mellemøsten og Nordafrika", "Sydasien",
  "Østasien og Stillehavet", "Kina", "Brasilien")
poor_1990 <- c(28.7,46.7,0.5,10.2,2.3,43,29.8,33,14)
poor_2004 <- c(18.1,41.1,0.9,8.6,1.5,30.8,9.1,9.9,7.5)

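#reformat the data (melt() comes from the reshape package, which
#ggplot2 loaded automatically at the time of this thread)
library(reshape)
data = data.frame(countries,poor_1990,poor_2004)
data = melt(data,id=c('countries'),variable_name='year')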
levels(data$year) = c('1990','2004')

#make a new column to make the text justification easier
data$hjust = 1-(as.numeric(data$year)-1)

#start the percentage plot
p = ggplot(
	data
	,aes(
		x=year
		,y=value
		,group=countries
	)
)

#add the axis labels
p = p + labs(
	x = '\nYear'
	, y = '%\n'
)

#add lines
p = p + geom_line()

#add the text
p = p + geom_text(
	aes(
		label=countries
		, hjust = hjust
	)
)

#expand the axis to fit the text
p = p + scale_x_discrete(
	expand=c(2,2)
)

#show the plot
print(p)


#rank the countries
data$rank = NA
data$rank[data$year=='1990'] = rank(data$value[data$year=='1990'])
data$rank[data$year=='2004'] = rank(data$value[data$year=='2004'])

#start the rank plot
r = ggplot(
	data
	,aes(
		x=year
		,y=rank
		,group=countries
	)
)

#add the axis labels
r = r + labs(
	x = '\nYear'
	, y = 'Rank\n'
)

#add the lines
r = r + geom_line()

#add the text
r = r + geom_text(
	aes(
		label=countries
		, hjust = hjust
	)
)

#expand the axis to fit the text
r = r + scale_x_discrete(
	expand=c(2,2)
)

#show the plot
print(r)
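
The reshaping step above relies on melt() from the reshape package. As a minimal sketch for readers who prefer not to depend on it, the same long-format table (including the hjust and rank columns) can be built with base R alone. The names wide and long, and the four-region subset of the data, are chosen here for illustration and are not from the original post.

```r
# Illustrative subset of the thread's data (four of the nine regions)
wide <- data.frame(
  countries = c("U-lande", "Afrika syd for sahara", "Kina", "Brasilien"),
  poor_1990 = c(28.7, 46.7, 33, 14),
  poor_2004 = c(18.1, 41.1, 9.9, 7.5)
)

# Stack the two year columns into one long table: one row per region and year
long <- data.frame(
  countries = rep(wide$countries, times = 2),
  year      = factor(rep(c("1990", "2004"), each = nrow(wide))),
  value     = c(wide$poor_1990, wide$poor_2004)
)

# Right-justify the 1990 labels (hjust = 1), left-justify the 2004 labels (hjust = 0)
long$hjust <- ifelse(long$year == "1990", 1, 0)

# Rank the values separately within each year
long$rank <- ave(long$value, long$year, FUN = rank)

print(long)
```

The resulting long data frame has the same columns the ggplot calls above expect (countries, year, value, hjust, rank), so it could be passed to them in place of data.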
#
Thank you. However, my understanding of the parallel coordinates plot
is that you have factors, not time, on the x axis. Also, the 'bump
chart' I envision is best suited for only two different x categories.
But technically I guess you are right.

Cheers.
#
On Mon, Apr 27, 2009 at 2:23 AM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
Wow - thank you. I'm sure I need to understand ggplot better. With
qplot I can make something similar quite easily.

With your reformatted data:

#here's the data provided by Andreas
countries <- c("U-lande", "Afrika syd for sahara", "Europa og Centralasien",
  "Latinamerika og Caribien", "Mellemøsten og Nordafrika", "Sydasien",
  "Østasien og Stillehavet", "Kina", "Brasilien")
poor_1990 <- c(28.7,46.7,0.5,10.2,2.3,43,29.8,33,14)
poor_2004 <- c(18.1,41.1,0.9,8.6,1.5,30.8,9.1,9.9,7.5)

#reformat the data
data = data.frame(countries,poor_1990,poor_2004)
data = melt(data,id=c('countries'),variable_name='year')
levels(data$year) = c('1990','2004')

#make a new column to make the text justification easier
data$hjust = 1-(as.numeric(data$year)-1)

library(ggplot2)
qplot(year,value, data=data,label=countries, geom=c("line","text"),
group=countries, col=countries)

But I would like to have the text labels show only once, e.g. at 1990,
and also control the size of the text. In my crude qplot, setting
size=2, for example, changes not only the text but also the lines etc.
I guess I have to move from qplot to ggplot.
#
thank you kindly - will do :-)

Cheers

On Mon, Apr 27, 2009 at 1:21 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
#
Andreas Christoffersen wrote:
Hi Andreas,
Not too hard. Try this:

bump.plot<-function (y1,y2=NULL,top.labels=NULL,left.labels=NULL,
 right.labels=NULL,rank=TRUE,descending=TRUE,linecol=par("fg"),
 mar=c(2,2,4,2),...) {

 if(missing(y1))
  stop("Usage: bump.plot(y1,y2,top.labels,left.labels,right.labels,...)")
 ydim<-dim(y1)
 if(is.null(ydim) && is.null(y2))
  stop("y1 must be a matrix or data frame if y2 is NULL")
 oldmar<-par("mar")
 par(mar=mar)
 if(is.null(y2) && ydim[2] > 1) {
  y2<-y1[,2]
  y1<-y1[,1]
 }
 if(rank) {
  left.labels<-left.labels[order(y1)]
  right.labels<-right.labels[order(y1)]
  y1<-rank(y1)
  y2<-rank(y2)
  if(descending) {
   left.labels<-rev(left.labels)
   right.labels<-rev(right.labels)
   y1<-rev(y1)
   y2<-rev(y2)
  }
 }
 ny<-length(y1)
 plot(c(rep(1,ny),rep(2,ny)),c(rev(y1),rev(y2)),xlim=c(0,3),xlab="",ylab="",
  axes=FALSE,...)
 segments(rep(1,ny),y1,rep(2,ny),y2,col=linecol)
 text(0.8,rev(y1),left.labels,adj=1)
 text(2.2,rev(y2),right.labels,adj=0)
 par(mar=oldmar)
}
# use the above data
bump.plot(poor_1990,poor_2004,c("1990","2004"),countries,countries,
linecol=1:20,pch=16,main="Bumps plot")

Jim
#
Amazing! - a bump.plot function - so cool. I love it when I
simultaneously realize the power of R and my own limitations with R. I
must learn how to write my own functions (suggestions for a good
introduction are very welcome).

But: When I run the following
bump.plot(poor_1990,poor_2004,c("1990","2004"),countries,
linecol=1:20, pch=16, main="Bumps plot" ,rank=F)

1: It seems the data is somehow sorted/ordered perversely: Now Kina
(Danish for "China") e.g. has traded places with "Afrika syd for
sahara" (Danish for "Africa south of Sahara"), etc.

2: When I set e.g. right.labels=NULL - I get a lot of empty space. How
could I remove it?

p.s. For aesthetics I've changed a tiny part of your function to
plot(c(rep(1,ny),rep(2,ny)),c(rev(y1),rev(y2)),xlim=c(0,3),xlab="",ylab="",
axes=T,xaxt="n",yaxt="s",bty="n",...)
thereby adding a Y axis. I don't really know what's going on, but I
guessed this was where the axis magic happened.

Cheers - a very grateful Andreas (especially if someone can help with
the misplaced labels mentioned in 1 above)
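
The misplaced labels in point 1 are consistent with the posted function drawing its text at rev(y1) and rev(y2) while the labels stay in their original order: the rank branch reverses the labels as well, but rank=F skips it. The following is a minimal sketch, not Jim's function. bump_sketch and its arguments are hypothetical names, and it uses a four-region subset of the data. It keeps labels and values in the same order throughout and narrows xlim on the right, since only the left side is labelled, which is one way to address point 2.

```r
# Hypothetical helper (not the function from the thread): draws a two-column
# bumps chart and keeps every label paired with its own data point.
bump_sketch <- function(y1, y2, labels, col_names = c("1990", "2004"),
                        use_rank = FALSE, ...) {
  stopifnot(length(y1) == length(y2), length(y1) == length(labels))
  if (use_rank) {
    # rank() preserves input order, so labels[i] still belongs to y1[i], y2[i]
    y1 <- rank(y1)
    y2 <- rank(y2)
  }
  ny <- length(y1)
  oldpar <- par(mar = c(2, 2, 4, 2))
  on.exit(par(oldpar))
  # xlim stops just right of the second column: no right-hand labels are drawn
  plot(rep(1:2, each = ny), c(y1, y2), xlim = c(0, 2.2),
       xlab = "", ylab = "", axes = FALSE, ...)
  segments(rep(1, ny), y1, rep(2, ny), y2, col = seq_len(ny))
  text(0.9, y1, labels, adj = 1)        # same order as y1, no rev()
  mtext(col_names, side = 3, at = 1:2)  # column headings above the points
  invisible(data.frame(label = labels, left = y1, right = y2))
}

# Illustrative subset of the thread's data (four of the nine regions)
regions   <- c("U-lande", "Afrika syd for sahara", "Kina", "Brasilien")
poor_1990 <- c(28.7, 46.7, 33, 14)
poor_2004 <- c(18.1, 41.1, 9.9, 7.5)

pos <- bump_sketch(poor_1990, poor_2004, regions, pch = 16, main = "Bumps plot")
```

The function returns the plotted positions invisibly, so pos can be inspected to confirm that each label sits next to its own value.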
#
Or just add the text layer separately:

qplot(year, value, data = data, geom = "line", group = countries) +
  geom_text(aes(label = countries), subset = .(year == 1990),
    hjust = 1, size = 3, lineheight = 1)

Hadley
#
THX a lot!
The subset did not work for me, but this does:
subset(data,year == 1990)

Andreas
#
I am sorry - but maybe someone will help me with the final puzzle. How
to remove the legend from the qplot?

I can google my way to something like
sc <- scale_fill_continuous()
sc@legend <- FALSE

but
qplot(year, value, data = data, xlab="År",ylab="% i ekstrem fattigdom",
 geom = "line", group = Lande,col=Lande) + sc +
 geom_text(aes(label = Lande), subset(data,year == 1990),
   hjust = 0.5,vjust=0, size = 3, lineheight = 1)

does not work. Is there no simple way to just set legend=F?
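
For reference, a minimal sketch of one way to drop the legend, assuming a current ggplot2 release rather than the 2009 version used in this thread: theme(legend.position = "none") is added to the plot. The example uses ggplot() instead of qplot() and a two-region subset of the data. The Lande column name follows the call above; everything else is illustrative.

```r
library(ggplot2)

# Illustrative two-region subset in long format
data <- data.frame(
  Lande = rep(c("Kina", "Brasilien"), times = 2),
  year  = factor(rep(c("1990", "2004"), each = 2)),
  value = c(33, 14, 9.9, 7.5)
)

p <- ggplot(data, aes(x = year, y = value, group = Lande, colour = Lande)) +
  geom_line() +
  # label each line once, at the 1990 end
  geom_text(data = subset(data, year == "1990"), aes(label = Lande),
            hjust = 1, size = 3) +
  # suppress the colour legend; the text labels identify the lines
  theme(legend.position = "none")

print(p)
```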

On Mon, Apr 27, 2009 at 8:53 PM, Andreas Christoffersen
<achristoffersen at gmail.com> wrote: