Splitting a character variable into a numeric one and a character one?

strapply in package gsubfn can do that:

library(gsubfn)
s <- c("123abc", "12cd34", "1e23")

out <- strapply(s, "^([[:digit:]]+)(.*)", c)
out <- do.call(rbind, out) # as a matrix

data.frame(x = out[,1], num = as.numeric(out[,2]), char = out[,3]) # as a data.frame
Hi All,

I have a data set with a variable like this:

Column 1

"123abc"
"12cd34"
"1e23"
...

Now I want to do an operation that can split it into two variables:

Column 1        Column 2        Column 3

"123abc"        123             "abc"
"12cd34"        12              "cd34"
"1e23"          1               "e23"
...

So basically, I want to split the original variable into a numeric one and a
character one, with the split falling at the first non-digit character in
Column 1.

I searched the forum with the keywords "strsplit" and "substr", but still can't
solve this problem. Can anyone give me some hints?

Thanks in advance,

FD

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Something like this using gsub() should work I think:
DF
      V1
1 123abc
2 12cd34
3   1e23

# Replace letters and any following chars with ""
DF$V2 <- gsub("[A-Za-Z]+.*", "", DF$V1)

# Replace any initial numbers with ""
DF$V3 <- gsub("^[0-9]+", "", DF$V1)
DF
      V1  V2   V3
1 123abc 123  abc
2 12cd34  12 cd34
3   1e23   1  e23

See ?gsub and ?regex for more information.

HTH,

Marc Schwartz
On Mon, 2006-09-25 at 11:04 -0500, Frank Duan wrote:
[original question, quoted in full above]

Something like this using gsub() should work I think:

# Replace letters and any following chars with ""
DF$V2 <- gsub("[A-Za-Z]+.*", "", DF$V1)
Quick typo correction here. It should be:

DF$V2 <- gsub("[A-Za-z]+.*", "", DF$V1)

The second 'z' should be lower case.

Marc
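
A self-contained sketch of the corrected gsub() approach, for anyone who wants to paste it in directly. The construction of DF and the as.numeric() conversion are assumptions added here (the posts above show DF only as printed output and leave V2 as character). It also assumes the non-numeric part always starts with an ASCII letter:

```r
# Assumed input: the posts only show DF printed, not how it was built
DF <- data.frame(V1 = c("123abc", "12cd34", "1e23"),
                 stringsAsFactors = FALSE)

# Drop the first letter and everything after it, then convert to numeric
DF$V2 <- as.numeric(gsub("[A-Za-z]+.*", "", DF$V1))

# Drop the leading digits, keeping the remainder as character
DF$V3 <- gsub("^[0-9]+", "", DF$V1)

DF
```

If a remainder can start with something other than a letter (an underscore, say), the first pattern would leave it attached to the digits and as.numeric() would return NA for that row.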
Here is one more solution:

library(gsubfn)
s <- c("123abc", "12cd34", "1e23")

out <- gsubfn("^([[:digit:]]+)(.*)", paste, s, backref = -2)
read.table(textConnection(out))

It assumes there are no spaces in the strings.  If
there are then choose a sep= that does not appear
and do this:

sep = ","
f <- function(x, y) paste(x, y, sep = sep)
out <- gsubfn("^([[:digit:]]+)(.*)", f, s, backref = -2)
read.table(textConnection(out), sep = sep)

And here is a third solution not using package gsubfn:

s <- c("123abc", "12cd34", "1e23")
out <- gsub("^(([[:digit:]]+)(.*))", "\\1 \\2 \\3", s)
read.table(textConnection(out), as.is = TRUE)

Again, if spaces appear in the input string choose a character
not appearing, such as comma, and do it like this:

s <- c("123abc", "12cd34", "1e23")
out <- gsub("^(([[:digit:]]+)(.*))", "\\1,\\2,\\3", s)
read.table(textConnection(out), sep = ",", as.is = TRUE)
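
One caveat with read.table() here: it guesses column types, so if every remainder happened to look like a number or a logical value, the whole column would be converted. A hedged sketch that pins the types with colClasses instead; the column names x, num and char are borrowed from the strapply example, and the explicit close() is only tidiness:

```r
s <- c("123abc", "12cd34", "1e23")
out <- gsub("^(([[:digit:]]+)(.*))", "\\1,\\2,\\3", s)

# Open the text connection explicitly so it can be closed afterwards
con <- textConnection(out)
res <- read.table(con, sep = ",",
                  col.names  = c("x", "num", "char"),
                  colClasses = c("character", "numeric", "character"))
close(con)

str(res)
```

With colClasses given, as.is is no longer needed, since no column is left for read.table() to guess.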

On 9/25/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
[strapply solution, quoted in full above]

On 9/25/06, Frank Duan <fhduan at gmail.com> wrote:
[original question, quoted in full above]

My first thought on this was to apply the regexp "^([0-9]*)(.*)$" and
get the two parts out. But I don't see a way to get both matches in
parentheses out in one go.

In Python you just do:

  >>> re.findall('^([0-9]*)(.*)$',"123abc")
  [('123', 'abc')]

  >>> re.findall('^([0-9]*)(.*)$',"1e12")
  [('1', 'e12')]

In R you can get the groups by doing gsub on them:

  > r="^([0-9]*)(.*)$"
  > gsub(r,"\\1","123abc")
  [1] "123"

But I don't see a way of getting the two values out except as part of
one string in gsub - which is right back where you started - or doing
gsub twice.

Barry
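
For what it is worth, later versions of base R can return the whole match and both groups from a single call, via regexec() and regmatches(). As far as I know those functions postdate this thread, so treat this as a sketch for a reasonably recent R rather than something that was available at the time. It reuses the regexp from the message above; the data frame column names are borrowed from the strapply example:

```r
r <- "^([0-9]*)(.*)$"
s <- c("123abc", "12cd34", "1e23")

# Each list element is c(whole match, group 1, group 2)
m <- regmatches(s, regexec(r, s))

# Stack the per-string vectors into a 3-column character matrix
parts <- do.call(rbind, m)

res <- data.frame(x    = parts[, 1],
                  num  = as.numeric(parts[, 2]),
                  char = parts[, 3],
                  stringsAsFactors = FALSE)
res
```

Because the first group uses `*`, a string with no leading digits still matches; its numeric part is the empty string, which as.numeric() turns into NA.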
Here is a slight simplification of the strapply solution, using simplify = TRUE:

library(gsubfn)
s <- c("123abc", "12cd34", "1e23")

out <- t(strapply(s, "^([[:digit:]]+)(.*)", c, simplify = TRUE)) # matrix
data.frame(x = out[,1], num = as.numeric(out[,2]), char = out[,3])