split a character variable into several character variable by a character
Good observation, Bill!
Adrian
On Friday 10 April 2009, William Dunlap wrote:
strsplit() is the way to do it, but if your putative character strings come from a data.frame you need to make sure they are really character strings and not factors (at least in R 2.8.1).
> d<-data.frame(name=c("Bill Dunlap", "First Last"), num=1:2)
> d
         name num
1 Bill Dunlap   1
2  First Last   2
> sapply(d,class)
     name       num
 "factor" "integer"
> strsplit(d$name, " ")
Error in strsplit(d$name, " ") : non-character argument
> strsplit(as.character(d$name), " ")
[[1]]
[1] "Bill"   "Dunlap"

[[2]]
[1] "First" "Last"

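When a data frame has several factor columns, the same conversion can be applied to all of them at once. The following is only a sketch, assuming the example data above. In R 4.0.0 and later, data.frame() defaults to stringsAsFactors = FALSE, so the argument is set explicitly here to reproduce the factor behaviour; the name is_fac is an illustrative choice, not something from the original post.

```r
# Example data as above; stringsAsFactors = TRUE reproduces the old default.
d <- data.frame(name = c("Bill Dunlap", "First Last"), num = 1:2,
                stringsAsFactors = TRUE)

# Find the factor columns and convert only those to character.
is_fac <- sapply(d, is.factor)
d[is_fac] <- lapply(d[is_fac], as.character)

# strsplit() now accepts the column; fixed = TRUE treats " " literally.
parts <- strsplit(d$name, " ", fixed = TRUE)
```

Non-factor columns such as num are left untouched by this approach.
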
> d1<-data.frame(stringsAsFactors=FALSE,name=c("Bill Dunlap", "First Last"), num=1:2)
> sapply(d1,class)
       name         num
"character"   "integer"
> strsplit(d1$name, " ")
[[1]]
[1] "Bill"   "Dunlap"

[[2]]
[1] "First" "Last"

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

------------------------------------------------------------------------

[R] split a character variable into several character variable by a character
Adrian Dusa dusa.adrian at gmail.com
Fri Apr 10 15:48:53 CEST 2009

Dear Mao Jianfeng,

"r-help-owner" is not the place for help, but:
r-help at r-project.org
(CC-ed here)

In any case, strsplit() does the job, i.e.:
unlist(strsplit("BCPy01-01", "-"))
[1] "BCPy01" "01"

You can work with the whole variable, like:
splitpop <- strsplit(df1$popcode, "-")

then access the first part with
unlist(lapply(splitpop, "[", 1))
 [1] "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01"
 [9] "BCPy01" "BCPy01"

and the second part with
unlist(lapply(splitpop, "[", 2))
 [1] "01" "01" "01" "02" "02" "02" "02" "02" "02" "03"

hth,
Adrian

On Friday 10 April 2009, Mao Jianfeng wrote:
Dear R-listers,
I have a dataframe like the one below, and I want to split a character
variable ("popcode" or "codetot") into several new variables. For example,
split "BCPy01-01" (popcode[1]) into "BCPy01" and "01". I need to know how
to do that. I have tried the strsplit() and substring() functions, but I
still cannot perform the splitting.
It always helps to see exactly what you tried and a description of how the results differ from what you wanted to get.
Any advice? Thanks in advance.

df1:
popcode    codetot        p3need
BCPy01-01  BCPy01-01-1  100.0000
BCPy01-01  BCPy01-01-2  100.0000
BCPy01-01  BCPy01-01-3  100.0000
BCPy01-02  BCPy01-02-1   92.5926
BCPy01-02  BCPy01-02-1  100.0000
BCPy01-02  BCPy01-02-2   92.5926
BCPy01-02  BCPy01-02-2  100.0000
BCPy01-02  BCPy01-02-3   92.5926
BCPy01-02  BCPy01-02-3  100.0000
BCPy01-03  BCPy01-03-1  100.0000

Regards,
Mao Jian-feng

Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
+40 21 3120210 / int.101
Fax: +40 21 3158391
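
For completeness, the strsplit() approach from this thread can be carried through to new columns in the data frame. This is only a sketch under stated assumptions: df1 below is a three-row subset of the posted data built with stringsAsFactors = FALSE, and the names pop, plot and codeparts are hypothetical, chosen for illustration.

```r
# Small subset of the posted data (assumed to be character, not factor).
df1 <- data.frame(
  popcode = c("BCPy01-01", "BCPy01-02", "BCPy01-03"),
  codetot = c("BCPy01-01-1", "BCPy01-02-1", "BCPy01-03-1"),
  stringsAsFactors = FALSE
)

# Split popcode on "-" and store each piece as a new column.
splitpop <- strsplit(df1$popcode, "-", fixed = TRUE)
df1$pop  <- sapply(splitpop, "[", 1)   # part before the dash
df1$plot <- sapply(splitpop, "[", 2)   # part after the dash

# codetot has three pieces per value; bind them into a character matrix
# with one row per observation and one column per piece.
codeparts <- do.call(rbind, strsplit(df1$codetot, "-", fixed = TRUE))
```

The rbind step assumes every value splits into the same number of pieces; values with a different number of dashes would need to be handled separately.
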