Skip to content

Minimum Spanning Tree

10 messages · Gábor Csárdi, jpearl01

#
Hi all, I'm very new to R and read a few tutorials, however I'm having
difficulty trying to figure out how to plot a minimum spanning tree.  I have
a csv file that contains an n-by-n matrix of distances between strains of
bacteria called matrix.csv.

Looks like:
id,strain1, strain2,strain3
strain1,0,.2,.8
strain2,.3,0,.7
strain3,.4,.6,0

I've been messing around with some information I've found on the web that
prints out an mst, however I think it does it with random values, instead of
values provided in a dataset (like from my csv file).  Here's what I tried
looks like:
x <- runif(100)
y <- runif(100)
nearest_neighbour <- function (x, y, d=dist(cbind(x,y)), ...) {
  n <- length(x)
  stopifnot(length(x) == length(y))
  d <- as.matrix(d)
  stopifnot( dim(d)[1] == dim(d)[2] )
  stopifnot( length(x) == dim(d)[1] )
  i <- 1:n 
  j <- apply(d, 2, function (a) order(a)[2])
  segments(x[i], y[i], x[j], y[j], ...)
}
plot(x, y,
     main="Nearest neighbour graph",
     xlab = "", ylab = "")
nearest_neighbour(x, y)

This gets the Nearest Neighbors, and then:

plot(x, y,
     main = "Minimum spanning tree",
     xlab = "", ylab = "")
nearest_neighbour(x, y, lwd=10, col="grey")
points(x,y)
library(ape)
r <- mst(dist(cbind(x, y)))
i <- which(r==1, arr.ind=TRUE )
segments(x[i[,1]], y[i[,1]], x[i[,2]], y[i[,2]],
         lwd = 2, col = "blue")

This prints the mst.  This would be perfect for me!  However I don't know
how to make this use my dataset... Any help (including links to helpful
tutorials!) would be awesome,

Thanks!
~josh
#
Josh, I would recommend to use a package that supports networks, e.g.
igraph, but there are others as well.

You can read in the data using 'read.csv()', transform it to a matrix
with 'as.matrix()', and then create an igraph object from it with
'graph.adjacency()'.

Then call 'minimum.spanning.tree()' to calculate the tree and 'plot()'
to plot it. E.g. for your example file:

library(igraph)
D <- read.csv("/tmp/matrix.csv")
D <- D[,-1]       # we don't need the first column
G <- graph.adjacency(as.matrix(D), weighted=TRUE)

## Some graphical parameters
V(G)$label <- V(G)$name
V(G)$shape <- "rectangle"
V(G)$color <- "white"
V(G)$size <- 40

## MST and plot
mst <- minimum.spanning.tree(G)
lay <- layout.reingold.tilford(G, mode="all")
plot(mst, layout=lay)

Best,
Gabor
On Tue, Apr 7, 2009 at 8:00 PM, jpearl01 <joshearl1 at hotmail.com> wrote:

  
    
#
There was an error in the file... an extraneous comma.  That's taken care of. 
however, my tree prints out an image that doesn't seem like a mst.  Attached
is the csv file I used... 
http://www.nabble.com/file/p22938299/sp_matrix.csv sp_matrix.csv 

I'd like it to look something like the image file also attached...
http://www.nabble.com/file/p22938299/2006-08-27_MST.png 2006-08-27_MST.png 

Is there a different layout that would accomplish that?  Or if not that
exactly, one that would help make the results a little clearer?

Thanks for all the help!
~josh
#
On Tue, Apr 7, 2009 at 11:06 PM, jpearl01 <joshearl1 at hotmail.com> wrote:
Well, it looks definitely a tree to me.
I am not sure what you mean. Of course you can plot it using different
layouts, e.g. with layout.reingold.tilford (after choosing the root
vertex in some way) and then it looks like a usual tree plot, but why
would that be any better?

Unless there is some external information about the graph (e.g.
spatial positions of the nodes, or a distinguished root vertex), the
layout on the image is just as good as the others.

Gabor

  
    
#
I'd like to be able to distinguish between the nodes better.  For example
the image that I get looks like:
http://www.nabble.com/file/p22954099/mst.gif mst.gif 

The tree nodes are just too close together to be able to read.  I think the
problem might be that some distances between the nodes are *much* smaller
than other edges.    The graph does not need to be directed, since the
distance metric does not imply a direction. The most important part is just
being able to see and differentiate between the nodes, and right now they
seem to be all lumped together.  How can I make it more readable?

Thanks,
~josh
#
On Wed, Apr 8, 2009 at 6:12 PM, jpearl01 <joshearl1 at hotmail.com> wrote:
Make the graph undirected first and then choose the right plotting
parameters. E.g. the following works fine for me:

set.seed(2)
g <- erdos.renyi.game(100, 300, type="gnm", directed=TRUE)
E(g)$weight <- runif(ecount(g))
mst <- minimum.spanning.tree(g)

mst <- simplify(as.undirected(mst))
lay <- layout.reingold.tilford(mst, root=which.max(degree(mst))-1)
plot(mst, layout=lay, vertex.size=5, asp=FALSE, vertex.color=NA,
       vertex.frame.color=NA)

G.

  
    
#
Make the graph undirected first and then choose the right plotting
parameters. E.g. the following works fine for me:

set.seed(2)
g <- erdos.renyi.game(100, 300, type="gnm", directed=TRUE)
E(g)$weight <- runif(ecount(g))
mst <- minimum.spanning.tree(g)

mst <- simplify(as.undirected(mst))
lay <- layout.reingold.tilford(mst, root=which.max(degree(mst))-1)
plot(mst, layout=lay, vertex.size=5, asp=FALSE, vertex.color=NA,
       vertex.frame.color=NA)

G.


Thanks for all your help Gabor,  However, I'm unable to get a display in R
which is at all readable.  I'm not sure why that is (if the data is
generated randomly, like in your example the tree builds/reads fine...
however on my dataset whichever node is picked as the root just completely
over powers the rest of the nodes which all clump together in a manner that
is unreadable).  My dataset is:
http://www.nabble.com/file/p22957493/sp_matrix.csv sp_matrix.csv 
If you would like to take a look.  However, as it stands I think I'll have
to find another option.  Thanks again for all your efforts.

~josh
#
Actually,

library(igraph)

tab <- read.csv("http://www.nabble.com/file/p22957493/sp_matrix.csv")
tab <- tab[,-1]

g <- graph.adjacency(as.matrix(tab), weighted=TRUE)
V(g)$label <- V(g)$name

mst <- as.undirected(minimum.spanning.tree(g))

lay <- layout.reingold.tilford(mst, root=which.max(degree(mst))-1)
lay <- cbind(lay[,2], lay[,1])    # rotate
x11(width=15, height=8)
plot(mst, layout=lay, vertex.size=25, vertex.size2=10,
     vertex.shape="rectangle", asp=FALSE,
     vertex.label.cex=0.7, vertex.color="white")

works relatively well for me on your graph. It doesn't for you?

Best,
Gabor
On Wed, Apr 8, 2009 at 9:14 PM, jpearl01 <joshearl1 at hotmail.com> wrote:

  
    
#
That's like a miracle!  The only thing that would make this graph perfect is
if the lengths of the edges were in the same ratio as the actual edge
lengths from the matrix.  Is it possible to alter that?

Thank you!!
~josh



Actually,

library(igraph)

tab <- read.csv("http://www.nabble.com/file/p22957493/sp_matrix.csv")
tab <- tab[,-1]

g <- graph.adjacency(as.matrix(tab), weighted=TRUE)
V(g)$label <- V(g)$name

mst <- as.undirected(minimum.spanning.tree(g))

lay <- layout.reingold.tilford(mst, root=which.max(degree(mst))-1)
lay <- cbind(lay[,2], lay[,1])    # rotate
x11(width=15, height=8)
plot(mst, layout=lay, vertex.size=25, vertex.size2=10,
     vertex.shape="rectangle", asp=FALSE,
     vertex.label.cex=0.7, vertex.color="white")

works relatively well for me on your graph. It doesn't for you?

Best,
Gabor
#
On Wed, Apr 8, 2009 at 10:20 PM, jpearl01 <joshearl1 at hotmail.com> wrote:
Not really. The thing is that the nodes are ordered into layers based
on the distance (in steps) from the root. This is how the
Reingold-Tilford layouting algorithm works. If you start changing the
lengths of the edges, then the nodes could move on top of each other.

You would need to modify the algorithm itself for this, I don't know
how difficult that would be.

What you can do is modifying the width of the edges, I know that is
not as good, but it is still something:

library(igraph)

tab <- read.csv("http://www.nabble.com/file/p22957493/sp_matrix.csv")
tab <- tab[,-1]

g <- graph.adjacency(as.matrix(tab), weighted=TRUE)
V(g)$label <- V(g)$name

mst <- minimum.spanning.tree(g)

rescale <- function(x,from,to) {    # linearly rescale a vector
  r <- range(x)
  (x-r[1]) / (r[2]-r[1]) * (to-from) + from
}

E(mst)$width <- rescale(E(mst)$weight,1,5)   # width between 1 and 5

lay <- layout.reingold.tilford(mst, root=which.max(degree(mst))-1)
lay <- cbind(lay[,2], lay[,1])    # rotate
x11(width=15, height=8)
plot(mst, layout=lay, vertex.size=25, vertex.size2=10,
     vertex.shape="crectangle", asp=FALSE,
     vertex.label.cex=0.7, vertex.color="white", edge.arrow.mode=0)

G.
[...]