how to separate char and num within a variable
On 02/05/2009 05:20 PM, Bill Hyman wrote:
Hi all,

I read in a column whose values look like "chr1:000889594-000889638", and I need to break each value into three columns: "chr1:", "000889594" and "000889638". How should I do this in R? Thanks a lot for your suggestions!
See ?strsplit

Vec <- "chr1:000889594-000889638"

Vec
[1] "chr1:000889594-000889638"

# Use a regular expression, defining the 'split' character
# as either ":" or "-", where the vertical bar means 'or':
strsplit(Vec, split = ":|-")
[[1]]
[1] "chr1" "000889594" "000889638"

Note that the split characters are not retained in the result, so the first element is "chr1" rather than "chr1:".

Let's presume that you have a column in a data frame of the original data and wish to split it into 3 columns:

DF <- data.frame(Col = rep(Vec, 10))
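DF
                        Col
1  chr1:000889594-000889638
2  chr1:000889594-000889638
3  chr1:000889594-000889638
4  chr1:000889594-000889638
5  chr1:000889594-000889638
6  chr1:000889594-000889638
7  chr1:000889594-000889638
8  chr1:000889594-000889638
9  chr1:000889594-000889638
10 chr1:000889594-000889638

Note that in R versions before 4.0.0, data.frame() converts character vectors to factors by default (stringsAsFactors = TRUE), so 'Col' will be a factor, whereas strsplit() expects a character vector. Thus we coerce with as.character() (harmless if 'Col' is already character) and use do.call() to pass each element of the list returned by strsplit() as a separate argument to rbind(), which stacks them into a character matrix: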
do.call(rbind, strsplit(as.character(DF$Col), split = ":|-"))
      [,1]   [,2]        [,3]
 [1,] "chr1" "000889594" "000889638"
 [2,] "chr1" "000889594" "000889638"
 [3,] "chr1" "000889594" "000889638"
 [4,] "chr1" "000889594" "000889638"
 [5,] "chr1" "000889594" "000889638"
 [6,] "chr1" "000889594" "000889638"
 [7,] "chr1" "000889594" "000889638"
 [8,] "chr1" "000889594" "000889638"
 [9,] "chr1" "000889594" "000889638"
[10,] "chr1" "000889594" "000889638"

See ?regex, ?do.call and ?rbind for more information.

HTH,

Marc Schwartz
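The original question asked for the first column to keep the colon ("chr1:"), which the strsplit() approach drops. A minimal sketch, assuming every value has the form name:digits-digits, is to use sub() with a regular-expression capture group to pull out each piece separately and collect the pieces into a data frame. The vector x and its second value are made-up illustration data, not from the original post.

```r
# Illustrative data (hypothetical): values of the form "name:start-end"
x <- c("chr1:000889594-000889638", "chr12:001000000-001000250")

# sub() replaces the whole match with the text captured inside (...)
chr   <- sub("^([^:]+:).*$", "\\1", x)          # everything up to and including ":"
start <- sub("^[^:]+:([0-9]+)-.*$", "\\1", x)   # digits between ":" and "-"
end   <- sub("^.*-([0-9]+)$", "\\1", x)         # digits after the last "-"

# Keep the pieces as character so the zero padding survives
res <- data.frame(chr = chr, start = start, end = end,
                  stringsAsFactors = FALSE)
print(res)

# Numeric versions, if arithmetic is needed (leading zeros are dropped)
res$start_num <- as.numeric(res$start)
res$end_num   <- as.numeric(res$end)
```

One caveat of this design: sub() returns a value unchanged when the pattern does not match, so malformed entries pass through as-is; checking with grepl() first may be worthwhile if the input is not guaranteed to be well formed. Also, as.numeric() turns "000889594" into 889594, so keep the character columns if the zero-padded form is needed.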