Splitting a character variable into a numeric one and a character one?
9 messages · Frank Duan, Marc Schwartz (via MN), Barry Rowlingson +1 more
strapply in package gsubfn can do that:
library(gsubfn)
s <- c("123abc", "12cd34", "1e23")
out <- strapply(s, "^([[:digit:]]+)(.*)", c)
out <- do.call(rbind, out) # as a matrix
data.frame(x = out[,1], num = as.numeric(out[,2]), char = out[,3]) # as a data.frame
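Since the original question mentions substr(), here is a minimal base-R sketch (not part of the thread) that splits by position rather than by backreferences: regexpr() finds the run of leading digits, its "match.length" attribute gives the length of that run, and substr()/substring() cut each string there. It assumes the numeric part is a run of ASCII digits at the start of each string; the names n, num and char are illustrative only.

```r
s <- c("123abc", "12cd34", "1e23")

# Length of the leading digit run in each string; regexpr() reports -1
# in "match.length" when a string does not start with a digit.
n <- attr(regexpr("^[0-9]+", s), "match.length")
n[n < 0] <- 0  # no leading digits: treat the numeric part as empty

num  <- as.numeric(substr(s, 1, n))  # characters 1..n, converted to numbers
char <- substring(s, n + 1)          # everything after the digit run

data.frame(x = s, num = num, char = char, stringsAsFactors = FALSE)
```

Working from match lengths avoids building and re-parsing an intermediate string, at the cost of a second pass with substring().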
On 9/25/06, Frank Duan <fhduan at gmail.com> wrote:
Hi All,
I have a data set with a variable like this:
Column 1
"123abc"
"12cd34"
"1e23"
...
Now I want to do an operation that can split it into two variables:
Column 1 Column 2 Column 3
"123abc" 123 "abc"
"12cd34" 12 "cd34"
"1e23" 1 "e23"
...
So basically, I want to split the original variable into a numeric one and a
character one, where the split point is the first non-numeric character in
Column 1.
I searched the forum with the keywords "strsplit" and "substr", but still
can't solve this problem. Can anyone give me some hints?
Thanks in advance,
FD
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Something like this using gsub() should work, I think:
DF
      V1
1 123abc
2 12cd34
3   1e23
# Replace letters and any following chars with ""
DF$V2 <- gsub("[A-Za-Z]+.*", "", DF$V1)
# Replace any initial numbers with ""
DF$V3 <- gsub("^[0-9]+", "", DF$V1)
DF
      V1  V2   V3
1 123abc 123  abc
2 12cd34  12 cd34
3   1e23   1  e23
See ?gsub and ?regex for more information.
HTH,
Marc Schwartz
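The gsub() calls above leave V2 as a character column, while the question asks for a numeric one. A minimal variation (not from the thread, and assuming DF holds a single character column V1 as printed above) wraps the first step in as.numeric(). It also uses the negated class [^0-9] instead of a letter class, so for a string such as "12-ab" the numeric part becomes 12 rather than "12-". Because each anchored pattern can match at most once per string, sub() is sufficient.

```r
# Example data (assumed to match the DF printed in the message)
DF <- data.frame(V1 = c("123abc", "12cd34", "1e23"), stringsAsFactors = FALSE)

# Numeric part: drop everything from the first non-digit onwards, then convert
DF$V2 <- as.numeric(sub("[^0-9].*$", "", DF$V1))

# Character part: drop the leading run of digits
DF$V3 <- sub("^[0-9]+", "", DF$V1)

str(DF)
```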
Quick typo correction to the first gsub() call in my previous message (the line that creates DF$V2). It should be:
DF$V2 <- gsub("[A-Za-z]+.*", "", DF$V1)
The second 'z' should be lower case.
Marc
Here is one more solution:
library(gsubfn)
s <- c("123abc", "12cd34", "1e23")
out <- gsubfn("^([[:digit:]]+)(.*)", paste, s, backref = -2)
read.table(textConnection(out))
It assumes there are no spaces in the strings. If there are, choose a
sep that does not appear in them and do this:
sep <- ","
f <- function(x, y) paste(x, y, sep = sep)
out <- gsubfn("^([[:digit:]]+)(.*)", f, s, backref = -2)
read.table(textConnection(out), sep = sep)
And here is a third solution not using package gsubfn:
s <- c("123abc", "12cd34", "1e23")
out <- gsub("^(([[:digit:]]+)(.*))", "\\1 \\2 \\3", s)
read.table(textConnection(out), as.is = TRUE)
Again, if the input strings contain spaces, choose a separator that does
not appear in them, such as a comma, and do it like this:
s <- c("123abc", "12cd34", "1e23")
out <- gsub("^(([[:digit:]]+)(.*))", "\\1,\\2,\\3", s)
read.table(textConnection(out), sep = ",", as.is = TRUE)
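In later versions of R the separator problem can be avoided altogether with utils::strcapture() (added in R 3.4.0), which applies a pattern and converts each captured group to the type of the corresponding column in a prototype data frame, so no intermediate delimited string is built. This is a sketch that was not part of the thread; the column names num and char are illustrative.

```r
s <- c("123abc", "12cd34", "1e23")

# Prototype: one column per parenthesised group, giving the target types.
# stringsAsFactors = FALSE keeps 'char' a character column on R < 4.0.
proto <- data.frame(num = numeric(), char = character(),
                    stringsAsFactors = FALSE)

# Group 1 is converted to numeric, group 2 kept as character; one row per string
out <- utils::strcapture("^([[:digit:]]+)(.*)$", s, proto)
out
```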
My first thought on this was to apply the regexp "^([0-9]*)(.*)$" and
get the two parts out. But I don't see a way to get both parenthesized
matches out in one go.
In Python you just do:
>>> re.findall('^([0-9]*)(.*)$',"123abc")
[('123', 'abc')]
>>> re.findall('^([0-9]*)(.*)$',"1e12")
[('1', 'e12')]
In R you can get at the groups with gsub():
> r="^([0-9]*)(.*)$"
> gsub(r,"\\1","123abc")
[1] "123"
But I don't see a way of getting the two values out except as part of
one string in gsub() (which is right back where you started) or by
calling gsub() twice.
Barry
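R versions released after this thread added a counterpart to the Python re.findall() call above: regexec() (R 2.14.0) returns the positions of the whole match and of every parenthesised group, and regmatches() turns those positions into strings, so both groups come out of one call. A minimal sketch using Barry's pattern (the names parts and m are illustrative):

```r
s <- c("123abc", "12cd34", "1e23", "1e12")
r <- "^([0-9]*)(.*)$"

# One list element per string: c(whole match, group 1, group 2)
parts <- regmatches(s, regexec(r, s))

# The pattern always matches (both groups may be empty), so every element
# has length 3 and the list can be stacked into a matrix; drop the
# whole-match column and keep the two groups.
m <- do.call(rbind, parts)[, -1, drop = FALSE]
colnames(m) <- c("num", "char")
m
```

as.numeric(m[, "num"]) then gives the numeric column the original question asked for.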
Here is a slight simplification of the strapply solution using simplify = TRUE:
library(gsubfn)
s <- c("123abc", "12cd34", "1e23")
out <- t(strapply(s, "^([[:digit:]]+)(.*)", c, simplify = TRUE)) # matrix
data.frame(x = out[,1], num = as.numeric(out[,2]), char = out[,3])