[Bioc-devel] question on annotations and data.frames

Hi Andreas,
Dear mailing list,
let's suppose I have a data.frame full of annotation data. In the first
column I have the identifiers present in the data. Each of the other columns
holds a different annotation for that identifier column.
However, some rows have no 1:1 mapping but, say, a 1:3 mapping,
like this:

ID     Symbol
1     Bla
2     Foo
3     XYZ // xyz /// xyz01
4     abc

I want to "stretch" the lines that have multiple annotation tags. I know I
can split them with "strsplit", which gets me here:

ID     Symbol
1     Bla
2     Foo
3     "XYZ" "xyz" "xyz01"
4     abc

But how can I get to this:

ID     Symbol
1     Bla
2     Foo
3     XYZ
3     xyz
3     xyz01
4     abc

This isn't particularly elegant, but might get you what you want:

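# Example data: the second row carries three annotations in one cell
test <- data.frame(A=c("x", "y", "z"), B=c("X", "Y1 // Y2 // Y3", "Z"),
                   stringsAsFactors=FALSE)

# Split each cell on the literal separator; one character vector per row
test.split <- strsplit(test$B, " // ", fixed=TRUE)

# Repeat each row once per annotation it carries
test2 <- test[rep(seq_len(nrow(test)), times=sapply(test.split, length)),]

# Overwrite the column with the flattened annotations
test2$B <- unlist(test.split)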
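
If the separators are not uniform (the example in the question mixes "//" and "///"), the same idea can be wrapped in a small helper that splits on a regular expression instead of a fixed string. This is only a sketch under assumptions: the name stretch_rows and its arguments are made up for illustration and do not come from any package, and the column is assumed to hold character data with no empty entries (an empty string would drop its row).

```r
# Hypothetical helper (not from any package): repeat each row once per
# annotation found in column `col`, splitting on a regular expression.
stretch_rows <- function(df, col, sep = "[[:space:]]*/{2,3}[[:space:]]*") {
  # One character vector of annotations per row
  parts <- strsplit(as.character(df[[col]]), sep)
  # Repeat each row index as many times as it has annotations
  out <- df[rep(seq_len(nrow(df)), times = lengths(parts)), , drop = FALSE]
  # Replace the column with the flattened annotations
  out[[col]] <- unlist(parts, use.names = FALSE)
  # Reset row names to 1..n
  rownames(out) <- NULL
  out
}

# Data modelled on the example in the question
annot <- data.frame(ID = 1:4,
                    Symbol = c("Bla", "Foo", "XYZ // xyz /// xyz01", "abc"),
                    stringsAsFactors = FALSE)
stretched <- stretch_rows(annot, "Symbol")
```

The default pattern accepts two or three slashes with optional surrounding whitespace; pass a different `sep` if your annotation file uses another delimiter.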

HTH
\Heidi
Your help will be really appreciated!

Thanks in advance, Andreas


_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Hi Andreas,
Here is one way:

> df <- data.frame(ID = 1:4,
+                  Symbol = I(c("Bla","Foo","XYZ // xyz // xyz01", "abc")))

> lst <- tapply(1:nrow(df), df$ID, function(x) df[x,2])
> lst <- lapply(lst, function(x) strsplit(x, " // ")[[1]])
> newdf <- data.frame(ID = rep(df[,1], sapply(lst, length)),
+                     Symbol = unlist(lst))
> newdf
   ID Symbol
1   1    Bla
2   2    Foo
31  3    XYZ
32  3    xyz
33  3  xyz01
4   4    abc
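
The row names 31, 32 and 33 in that output are not a typo: unlist() appends a running index to the name of every list element that holds more than one value, and data.frame() then picks those names up as row names. A small sketch of the effect, using a made-up list rather than the data above, together with one way to avoid it:

```r
# Made-up list for illustration, named by ID the way tapply() names its result
lst <- list(`1` = "Bla", `3` = c("XYZ", "xyz", "xyz01"))

with_names <- unlist(lst)                        # names: "1" "31" "32" "33"
without_names <- unlist(lst, use.names = FALSE)  # plain, unnamed character vector

# With the names dropped, data.frame() falls back to row names 1..n
newdf <- data.frame(ID = rep(c(1, 3), lengths(lst)), Symbol = without_names)
```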

Best,

Jim


James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826