Convert phylogenetic tree to binary matrix

4 messages · vanderlei52, Ben Bolker, Alexander Senger

#
I need to create a binary matrix with all nodes of a phylogenetic tree and the
presence of each taxon in its respective nodes.

Example:

library(ape)
y <- read.tree(text = "(E,((H,I)D,(F,G)C)B)A;")
y
plot(y, show.node.label = TRUE)

I need to create a binary matrix as follows:

	A	B	C	D
G	1	1	1	0
F	1	1	1	0
I	1	1	0	1
H	1	1	0	1
E	1	0	0	0

Could somebody help me solve this problem?

Thanks,


Vanderlei Debastiani

--
View this message in context: http://r.789695.n4.nabble.com/Convert-filogenetic-tree-to-binary-matrix-tp3478961p3478961.html
Sent from the R help mailing list archive at Nabble.com.
#
vanderlei52 <vanderleidebastianimach <at> yahoo.com.br> writes:
I would suggest that you try this question on the r-sig-phylo mailing
list instead.  The phylobase package has an ancestors() function that
could help you put together a solution, but there may well be a quicker,
easier way.

   Ben Bolker
1 day later
#
Hi Ben,

Thank you for your help.

I asked the same question on the r-sig-phylo mailing list. Liam Revell gave
the following solution:

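# tree is an ape "phylo" object with node labels, e.g. the y from the first message
temp <- prop.part(tree)
X <- matrix(0, nrow = length(tree$tip.label), ncol = length(temp),
            dimnames = list(tree$tip.label, tree$node.label))
for (i in 1:ncol(X)) X[temp[[i]], i] <- 1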
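
For reference, here is a minimal sketch of that solution applied to the example tree from the first message. It assumes the ape package is installed and that prop.part() returns the clades in the same order as tree$node.label; the final reordering step is only there to match the layout of the requested matrix and is not part of the original solution.

```r
library(ape)  # assumed to be installed; provides read.tree() and prop.part()

# Tree from the original post; node labels come from the Newick string
tree <- read.tree(text = "(E,((H,I)D,(F,G)C)B)A;")

# One entry per internal node, each holding the indices of its descendant tips
clades <- prop.part(tree)

# Rows are tips, columns are internal nodes; start with all zeros
X <- matrix(0, nrow = length(tree$tip.label), ncol = length(clades),
            dimnames = list(tree$tip.label, tree$node.label))

# Mark every tip that descends from node i
for (i in seq_along(clades)) X[clades[[i]], i] <- 1

# Reorder rows and columns to match the layout requested in the question
X <- X[c("G", "F", "I", "H", "E"), c("A", "B", "C", "D")]
print(X)
```

If the tree has no node labels, tree$node.label is NULL and the columns are left unnamed, so labels would have to be assigned first.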

Vanderlei


#
Hello expeRts,


here is something which strikes me as kind of odd and I would like to ask for some enlightenment:

First let's do this:

tkern <- kernel("modified.daniell", c(5,5))
test <- rep(1,1000000)
system.time(kernapply(test,tkern))
        user      system     elapsed
       1.100       0.040       1.136

That was easy. Now this:

test <- rep(1,1100000)
system.time(kernapply(test,tkern))
        user      system     elapsed
        1.40        0.02        1.43

Still fine. Now this:

test <- rep(1,1110000)
system.time(kernapply(test,tkern))
        user      system     elapsed
       1.390       0.020       1.409

Ok, by now it seems boring. But wait:

test <- rep(1,1110300)
system.time(kernapply(test,tkern))
        user      system     elapsed
      12.270       0.030      12.319

There is a sudden - and repeatable! - jump in the time needed to execute kernapply. At least from a 
naive point of view there should not be much difference between applying a kernel to a vector 
1110000 or 1110300 entries long. But maybe there is some limit here?

So I tried this:

test <- rep(1,1110400)
system.time(kernapply(test,tkern))
        user      system     elapsed
        1.96        0.01        1.97

which doesn't fit into the pattern. But the best thing is still to come. When I try this

test <- rep(1,1110308)
system.time(kernapply(test,tkern))

then the computer starts to run and keeps going for more than 15 minutes, at which point I normally
kill the process. As noted above, this behaviour is repeatable and occurs every time I issue these commands.

I really would like to know if there is some magic to the number 1110308 I'm not aware of.


Last but not least, here is my

sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu

locale:
  [1] LC_CTYPE=de_DE.utf8       LC_NUMERIC=C
  [3] LC_TIME=de_DE.utf8        LC_COLLATE=de_DE.utf8
  [5] LC_MONETARY=C             LC_MESSAGES=de_DE.utf8
  [7] LC_PAPER=de_DE.utf8       LC_NAME=C
  [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.1


Thank you,

Alex
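
A possible explanation, offered as an assumption rather than a confirmed diagnosis: for a plain vector, kernapply() computes the convolution with fft(), and the fft() help page notes that the transform is fastest when the length of the series is highly composite. Under that assumption, the timings should track the largest prime factor of the vector length rather than the length itself. The sketch below checks this for the lengths used above; largest_prime_factor is a hypothetical helper written for this illustration, not a function from R or any package.

```r
# Largest prime factor of n by trial division (fine for numbers of this size).
# Hypothetical helper for illustration only.
largest_prime_factor <- function(n) {
  largest <- 1
  p <- 2
  while (p * p <= n) {
    while (n %% p == 0) {
      largest <- p   # p divides n, so it is a prime factor
      n <- n / p
    }
    p <- p + 1
  }
  if (n > 1) largest <- n  # whatever remains is itself prime
  largest
}

# Vector lengths tried in the message above
lens <- c(1000000, 1100000, 1110000, 1110300, 1110400, 1110308)
res <- data.frame(len = lens,
                  largest_prime_factor = sapply(lens, largest_prime_factor))
print(res)
```

Lengths whose largest prime factor is small should correspond to the fast runs, and a length with a very large prime factor would be the natural candidate for the run that never finishes.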