[Bioc-devel] question on annotations and data.frames
4 messages · Heidi Dvinge, James W. MacDonald, Andreas Heider
Hi Andreas,
Dear mailing list, let's suppose I have a data.frame full of annotation data. The first column holds the identifiers present in the data, and each of the other columns holds a different annotation for those identifiers. However, some rows have no 1:1 mapping but, say, a 1:3 mapping, something like this:

ID Symbol
1  Bla
2  Foo
3  XYZ // xyz /// xyz01
4  abc

I want to "stretch" the rows that have multiple annotation tags. I know I can split them with "strsplit", which gets me here:

ID Symbol
1  Bla
2  Foo
3  "XYZ" "xyz" "xyz01"
4  abc

But how can I get to this:

ID Symbol
1  Bla
2  Foo
3  XYZ
3  xyz
3  xyz01
4  abc
This isn't particularly elegant, but might get you what you want:
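# Example data: the second row maps to three annotations
test <- data.frame(A=c("x", "y", "z"), B=c("X", "Y1 // Y2 // Y3", "Z"),
                   stringsAsFactors=FALSE)
# Split each entry of B into its individual annotations
test.split <- strsplit(test$B, " // ", fixed=TRUE)
# Repeat each row once per annotation, then fill in the split values
test2 <- test[rep(1:nrow(test), times=sapply(test.split, length)),]
test2$B <- unlist(test.split)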
HTH
\Heidi
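The same idea can be wrapped in a small function. What follows is only a sketch: the name stretch_rows and its arguments are hypothetical and not part of the thread, and it assumes the separator is "//" or "///" surrounded by optional whitespace, so that the mixed separators in the original example are both handled. It also resets the row names, which otherwise come out as "3", "3.1", "3.2" after the row repetition.

```r
# Hypothetical helper (not from the thread): repeat each row once per
# annotation found in column `col`, splitting on "//" or "///".
stretch_rows <- function(df, col, sep = "\\s*/{2,3}\\s*") {
  # One character vector of annotations per input row
  pieces <- strsplit(as.character(df[[col]]), sep, perl = TRUE)
  # Number of annotations per row, used as the repeat count
  n <- lengths(pieces)
  out <- df[rep(seq_len(nrow(df)), times = n), , drop = FALSE]
  out[[col]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

annot <- data.frame(ID = 1:4,
                    Symbol = c("Bla", "Foo", "XYZ // xyz /// xyz01", "abc"),
                    stringsAsFactors = FALSE)
stretched <- stretch_rows(annot, "Symbol")
print(stretched)
```

Any further annotation columns in the data.frame are simply repeated along with the identifier, since whole rows are duplicated.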
Your help will be really appreciated! Thanks in advance, Andreas
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Hi Andreas,
On 8/23/2011 8:12 AM, Andreas Heider wrote:
Dear mailing list, let's suppose I have a data.frame full of annotation data. The first column holds the identifiers present in the data, and each of the other columns holds a different annotation for those identifiers. However, some rows have no 1:1 mapping but, say, a 1:3 mapping, something like this:

ID Symbol
1  Bla
2  Foo
3  XYZ // xyz /// xyz01
4  abc

I want to "stretch" the rows that have multiple annotation tags. I know I can split them with "strsplit", which gets me here:

ID Symbol
1  Bla
2  Foo
3  "XYZ" "xyz" "xyz01"
4  abc

But how can I get to this:

ID Symbol
1  Bla
2  Foo
3  XYZ
3  xyz
3  xyz01
4  abc

Your help will be really appreciated!
Here is one way:
> df <- data.frame(ID = 1:4,
+                  Symbol = I(c("Bla","Foo","XYZ // xyz // xyz01", "abc")))
> lst <- tapply(1:nrow(df), df$ID, function(x) df[x,2])
> lst <- lapply(lst, function(x) strsplit(x, " // ")[[1]])
> newdf <- data.frame(ID = rep(df[,1], sapply(lst, length)),
+                     Symbol = unlist(lst))
> newdf
   ID Symbol
1   1    Bla
2   2    Foo
31  3    XYZ
32  3    xyz
33  3  xyz01
4   4    abc
Best,
Jim
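One edge case is worth checking with either approach above, since both take the repeat count from the length of each strsplit() result: an empty string splits to character(0), so that row gets a count of 0 and silently disappears. The sketch below uses made-up example data and is only one possible way to handle this, by padding empty results with NA so that every identifier is kept.

```r
# Made-up example: row 2 has an empty annotation, row 4 is missing (NA)
symbols <- c("Bla", "", "Y1 // Y2", NA)
pieces <- strsplit(symbols, " // ", fixed = TRUE)

# The empty string yields character(0), i.e. a repeat count of 0
n_before <- lengths(pieces)
print(n_before)

# Replace zero-length results with NA so every input row survives
pieces[lengths(pieces) == 0] <- list(NA_character_)
kept <- data.frame(ID = rep(seq_along(symbols), times = lengths(pieces)),
                   Symbol = unlist(pieces),
                   stringsAsFactors = FALSE)
print(kept)
```

Whether an identifier without any annotation should be kept as NA or dropped depends on how the table is used afterwards.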
Thanks in advance, Andreas
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826