how to separate char and num within a variable

5 messages · Bill Hyman, Wacek Kusnierczyk, Marc Schwartz

#
Hi all,

I read in a column whose entries look like "chr1:000889594-000889638", and I need to break each entry into three columns: "chr1:", "000889594" and "000889638". How can I do this in R? Thanks a lot for your suggestions!

Bill
#
Bill Hyman wrote:
If strings is your vector of strings, this should do it (assuming the
format is the same across all entries):

strsplit(strings, split=':|-')
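The assumption that the format is stable can be checked before splitting. The following is only a sketch: the example vector and the pattern (a name, a colon, digits, a hyphen, digits) are illustrative assumptions, not part of the original reply.

```r
# Hypothetical input; the second entry is deliberately malformed
strings <- c("chr1:000889594-000889638", "chr2_000000100_000000200")

# Assumed format: a name without ':' or '-', then ':', digits, '-', digits
ok <- grepl("^[^:-]+:[0-9]+-[0-9]+$", strings)

# Split only the entries that match the assumed format
parts <- strsplit(strings[ok], split = ":|-")
```

Entries where ok is FALSE can then be inspected by hand instead of silently producing pieces of the wrong length.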

vQ
#
on 02/05/2009 05:20 PM Bill Hyman wrote:
See ?strsplit

Vec <- "chr1:000889594-000889638"
Vec
[1] "chr1:000889594-000889638"

# Use a regular expression, defining the 'split' character
# as either ":" or "-", where the vertical bar means 'or':
strsplit(Vec, ":|-")
[[1]]
[1] "chr1"      "000889594" "000889638"


Note that the split characters are not retained in the result.
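The original question asked for "chr1:" with the colon kept. One simple way to get that, sketched here on the assumption that the colon belongs only to the first piece, is to paste it back on after splitting (the name parts is illustrative):

```r
Vec <- "chr1:000889594-000889638"

# Split on ':' or '-', then take the single list element
parts <- strsplit(Vec, ":|-")[[1]]

# Re-attach the colon to the first piece only
parts[1] <- paste0(parts[1], ":")
parts
```

This avoids relying on lookaround patterns in strsplit(), whose handling of zero-length matches can be surprising.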

Let's presume that you have a column in a data frame of the original
data and wish to split it into 3 columns:

DF <- data.frame(Col = rep(Vec, 10))
DF
                        Col
1  chr1:000889594-000889638
2  chr1:000889594-000889638
3  chr1:000889594-000889638
4  chr1:000889594-000889638
5  chr1:000889594-000889638
6  chr1:000889594-000889638
7  chr1:000889594-000889638
8  chr1:000889594-000889638
9  chr1:000889594-000889638
10 chr1:000889594-000889638

Note that by default, 'Col' will be a factor and strsplit() expects a
character vector, so we coerce it with as.character() and use do.call()
to create a character matrix, via rbind(), from the result:

do.call(rbind, strsplit(as.character(DF$Col), ":|-"))
      [,1]   [,2]        [,3]
 [1,] "chr1" "000889594" "000889638"
 [2,] "chr1" "000889594" "000889638"
 [3,] "chr1" "000889594" "000889638"
 [4,] "chr1" "000889594" "000889638"
 [5,] "chr1" "000889594" "000889638"
 [6,] "chr1" "000889594" "000889638"
 [7,] "chr1" "000889594" "000889638"
 [8,] "chr1" "000889594" "000889638"
 [9,] "chr1" "000889594" "000889638"
[10,] "chr1" "000889594" "000889638"
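To go from the character matrix to three actual columns, the pieces can be assigned back to the data frame. This is a sketch under assumptions: the column names chr, start and end are hypothetical, and converting the positions to numeric is a choice that drops the leading zeros. Since R 4.0.0, data.frame() no longer creates factors by default, but as.character() is harmless on a column that is already character.

```r
Vec <- "chr1:000889594-000889638"
DF <- data.frame(Col = rep(Vec, 10))

# Works whether 'Col' is a factor (older R) or character (R >= 4.0.0)
mat <- do.call(rbind, strsplit(as.character(DF$Col), ":|-"))

# Hypothetical column names; as.numeric() drops the leading zeros
DF$chr   <- mat[, 1]
DF$start <- as.numeric(mat[, 2])
DF$end   <- as.numeric(mat[, 3])
```

If the leading zeros matter, the second and third columns can be left as character instead of being converted.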


See ?regex, ?do.call and ?rbind for more information.

HTH,

Marc Schwartz
#
Thx a lot! It works



#
Thx a lot!


