
Splitting a character variable into a numeric one and a character one?

9 messages · Frank Duan, Marc Schwartz (via MN), Barry Rowlingson +1 more

#
strapply in package gsubfn can do that:


library(gsubfn)
s <- c("123abc", "12cd34", "1e23")

out <- strapply(s, "^([[:digit:]]+)(.*)", c)
out <- do.call(rbind, out) # as a matrix

data.frame(x = out[,1], num = as.numeric(out[,2]), char = out[,3]) # as a data.frame
On 9/25/06, Frank Duan <fhduan at gmail.com> wrote:
#
On Mon, 2006-09-25 at 11:04 -0500, Frank Duan wrote:
Something like this using gsub() should work I think:
V1
1 123abc
2 12cd34
3   1e23


# Replace letters and any following chars with ""
DF$V2 <- gsub("[A-Za-Z]+.*", "", DF$V1)


# Replace any initial numbers with ""
DF$V3 <- gsub("^[0-9]+", "", DF$V1)
V1  V2   V3
1 123abc 123  abc
2 12cd34  12 cd34
3   1e23   1  e23

See ?gsub and ?regex for more information.

HTH,

Marc Schwartz
#
On Mon, 2006-09-25 at 11:30 -0500, Marc Schwartz (via MN) wrote:
Quick typo correction here. It should be:

DF$V2 <- gsub("[A-Za-z]+.*", "", DF$V1)

The second 'z' should be lower case.
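
For anyone who wants to run the corrected version end to end, here is a minimal sketch. The construction of DF and the as.numeric() conversion are assumptions added for illustration; they were not in the original post.

```r
# Rebuild the example data frame (assumed construction);
# stringsAsFactors = FALSE keeps V1 as a character column
DF <- data.frame(V1 = c("123abc", "12cd34", "1e23"), stringsAsFactors = FALSE)

# Remove the first letter and everything after it, leaving the leading digits,
# then convert to numeric (the conversion is an addition to the posted code)
DF$V2 <- as.numeric(gsub("[A-Za-z]+.*", "", DF$V1))

# Remove the leading digits, leaving the character part
DF$V3 <- gsub("^[0-9]+", "", DF$V1)

DF
```

Note that both patterns assume each string starts with digits; a string that starts with a letter would give an empty V2, which as.numeric() turns into NA.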


Marc
#
Here is one more solution:

library(gsubfn)
s <- c("123abc", "12cd34", "1e23")

out <- gsubfn("^([[:digit:]]+)(.*)", paste, s, backref = -2)
read.table(textConnection(out))

It assumes there are no spaces in the strings.  If
there are then choose a sep= that does not appear
and do this:

sep = ","
f <- function(x, y) paste(x, y, sep = sep)
out <- gsubfn("^([[:digit:]]+)(.*)", f, s, backref = -2)
read.table(textConnection(out), sep = sep)
On 9/25/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
#
And here is a third solution not using package gsubfn:

s <- c("123abc", "12cd34", "1e23")
out <- gsub("^(([[:digit:]]+)(.*))", "\\1 \\2 \\3", s)
read.table(textConnection(out), as.is = TRUE)

Again, if spaces appear in the input string choose a character
not appearing, such as comma, and do it like this:

s <- c("123abc", "12cd34", "1e23")
out <- gsub("^(([[:digit:]]+)(.*))", "\\1,\\2,\\3", s)
read.table(textConnection(out), sep = ",", as.is = TRUE)
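
To illustrate the separator point, here is a small sketch with inputs that do contain spaces. The input strings are hypothetical, only the two captured groups are kept (not the whole match), and strip.white = TRUE is an added choice to trim the blank that would otherwise start the second field.

```r
# Hypothetical inputs containing spaces (not from the original post)
s <- c("123 abc", "12cd 34", "1e23")

# Join the two captured groups with a comma, which does not occur in s
out <- gsub("^([[:digit:]]+)(.*)", "\\1,\\2", s)

# Read the comma-separated pieces back; strip.white = TRUE trims
# leading and trailing blanks from the character field
res <- read.table(textConnection(out), sep = ",", as.is = TRUE,
                  strip.white = TRUE)
res
```

With the default whitespace separator, "12cd 34" would be split into three fields instead of two, which is why a separator absent from the data is needed.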
On 9/25/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
#
My first thought on this was to apply the regexp "^([0-9]*)(.*)$" and
get the two parts out. But I don't see a way to get both parenthesised
matches out in one go.

In Python you just do:

  >>> re.findall('^([0-9]*)(.*)$',"123abc")
  [('123', 'abc')]

  >>> re.findall('^([0-9]*)(.*)$',"1e12")
  [('1', 'e12')]

In R you can get at the groups by using gsub with a backreference:

  > r="^([0-9]*)(.*)$"
  > gsub(r,"\\1","123abc")
  [1] "123"

But I don't see a way of getting the two values out except as part of
one string in gsub - which is right back where you started - or by
calling gsub twice.
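
As a sketch of a one-pass alternative, assuming an R version that provides regexec() in base (it may not be available in the release current when this thread was written), regmatches() can return the full match and both groups together. The variable names m, parts and res are illustrative.

```r
s <- c("123abc", "12cd34", "1e23")
r <- "^([0-9]*)(.*)$"

# regexec() records the positions of the full match and of each group;
# regmatches() extracts them as one character vector per input string
m <- regmatches(s, regexec(r, s))

# Bind into a matrix: column 1 is the full match, columns 2 and 3 the groups
parts <- do.call(rbind, m)

res <- data.frame(num = as.numeric(parts[, 2]), char = parts[, 3],
                  stringsAsFactors = FALSE)
res
```

This plays roughly the role of re.findall in the Python example: one call yields both groups for every element of s.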

Barry
#
Here is a slight simplification of the strapply solution, using simplify = TRUE:

library(gsubfn)
s <- c("123abc", "12cd34", "1e23")

out <- t(strapply(s, "^([[:digit:]]+)(.*)", c, simplify = TRUE)) # matrix
data.frame(x = out[,1], num = as.numeric(out[,2]), char = out[,3])
On 9/25/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote: