splitting a character field in R

6 messages · ManuelPerera-Chang@fmc-ag.com, Gabor Grothendieck, Dimitris Rizopoulos +2 more

#
Hi Jim,

Thanks for your post. I was aware of strsplit, but really could not find
out how I could use it.

I tried it as in your example ...

A<-c(1,2,3)
B<-c("dgabcrt","fgrtabc","sabcuuu")
C<-strsplit(B,"abc")
[[1]]
[1] "dg" "rt"

[[2]]
[1] "fgrt"

[[3]]
[1] "s"   "uuu"

This looks promising, but here C is a list with three elements. How do I
create the two vectors I need from it, that is

("dg","fgrt", "s") and ("rt","","uuu")

(or how to get access to the substrings "rt" or "uuu").

Greetings

Manuel



                                                                                                                                 
From:     jim holtman <jholtman at gmail.com>
To:       "ManuelPerera-Chang at fmc-ag.com" <ManuelPerera-Chang at fmc-ag.com>
cc:       r-help at stat.math.ethz.ch
Subject:  Re: [R] splitting a character field in R
Date:     28.10.2005 16:00
[[1]]
[1] "df" "xy"
On 10/28/05, ManuelPerera-Chang at fmc-ag.com <ManuelPerera-Chang at fmc-ag.com >
wrote:




      Dear R users,

      I have a dataframe with one character field, and I would like to
      create two new fields (columns) in my dataset by splitting the
      existing character field into two using an existing substring.

      ... something that in SAS I could solve e.g. by combining substr
      (which I am aware exists in R) and "index" for determining the
      position of the pattern within the string.
      e.g. if my dataframe is ...
      A     B
      1     dgabcrt
      2     fgrtabc
      3     sabcuuu

      Then by splitting by substring "abc" I would get ...

      A     B           B1    B2
      1     dgabcrt     dg    rt
      2     fgrtabc     fgrt
      3     sabcuuu     s     uuu

      Do you know how to do this basic string (dataframe) manipulation in R?

      Saludos,

      Manuel

      ______________________________________________
      R-help at stat.math.ethz.ch mailing list
      https://stat.ethz.ch/mailman/listinfo/r-help
      PLEASE do read the posting guide!
      http://www.R-project.org/posting-guide.html



--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?
#
You could use:

data.frame(First = sub("abc.*", "", B), Second = sub(".*abc", "", B))

or if you want to prevent conversion to factors:

data.frame(First = I(sub("abc.*", "", B)), Second = I(sub(".*abc", "", B)))
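A minimal sketch of the same sub() idea applied directly to the data frame from the original question. The names df, B1 and B2 are chosen here for illustration only, and stringsAsFactors = FALSE is used as an alternative to I() for keeping character columns (it is the default from R 4.0.0 on).

```r
# Example data from the original question; 'df' is an illustrative name.
df <- data.frame(A = 1:3,
                 B = c("dgabcrt", "fgrtabc", "sabcuuu"),
                 stringsAsFactors = FALSE)

# Delete the separator and everything after it: the part before "abc".
df$B1 <- sub("abc.*", "", df$B)

# Delete everything up to and including the separator: the part after "abc".
df$B2 <- sub(".*abc", "", df$B)

df
```

Note that sub() returns its input unchanged when the pattern does not match, so a value without "abc" would end up whole in both B1 and B2.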

On 10/28/05, ManuelPerera-Chang at fmc-ag.com
<ManuelPerera-Chang at fmc-ag.com> wrote:
#
try the following:

A <- c("dgabcrt", "fgrtabc", "sabcuuu")
B <- strsplit(A, "abc")

x1 <- sapply(B, "[", 1); x1[is.na(x1)] <- ""
x2 <- sapply(B, "[", 2); x2[is.na(x2)] <- ""
x1
x2
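The same steps can be wrapped into a small helper and collected in a data frame. This is only a sketch: the helper name piece and the column names B1 and B2 are made up for illustration, and fixed = TRUE is an added assumption that the separator should be treated as a literal string rather than a regular expression.

```r
B <- c("dgabcrt", "fgrtabc", "sabcuuu")
parts <- strsplit(B, "abc", fixed = TRUE)

# Hypothetical helper: i-th piece of each list element, "" where it is absent.
piece <- function(lst, i) {
  out <- vapply(lst, function(p) p[i], character(1))
  out[is.na(out)] <- ""
  out
}

res <- data.frame(B  = B,
                  B1 = piece(parts, 1),
                  B2 = piece(parts, 2),
                  stringsAsFactors = FALSE)
res
```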

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm



----- Original Message ----- 
From: <ManuelPerera-Chang at fmc-ag.com>
To: <jholtman at gmail.com>
Cc: <r-help at stat.math.ethz.ch>
Sent: Friday, October 28, 2005 4:38 PM
Subject: Re: [R] splitting a character field in R
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
#
On 28 Oct 2005, ManuelPerera-Chang at fmc-ag.com wrote:

            
To convert a list to a matrix
     [,1]   [,2]  
[1,] "dg"   "rt"  
[2,] "fgrt" "fgrt"
[3,] "s"    "uuu" 

is a good trick.  However, it doesn't work as intended here since the elements
of C have different lengths.  It would be nice to have 'rbind' and 'cbind'
allow different ways of padding the vectors to the same length.

One workaround is:
     [,1] [,2]   [,3] 
[1,] "dg" "fgrt" "s"  
[2,] "rt" NA     "uuu"

You may convert the NA's to "" if you want.
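The calls that produced the two matrices above are missing from the archived message. The sketch below reproduces the same output, assuming C is the strsplit result from earlier in the thread; the exact expressions Michael used are not known, and the names m1, m2 and m3 are illustrative.

```r
B <- c("dgabcrt", "fgrtabc", "sabcuuu")
C <- strsplit(B, "abc")

# rbind recycles the length-1 second element, so row 2 becomes "fgrt" "fgrt".
m1 <- do.call(rbind, C)

# Indexing each element with 1:2 pads the short element with NA instead.
# The result has one column per input string.
m2 <- sapply(C, "[", 1:2)

# Transpose to one row per input string and replace NA with "".
m3 <- t(m2)
m3[is.na(m3)] <- ""
m3
```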

Michael
#
Here is one additional solution:

read.table(textConnection(sub("abc", " ", B)), fill = TRUE)

It also works if there are more than 2 fields, provided gsub is used
in place of sub so that every occurrence of "abc" is replaced. If
there can be spaces in the lines then the sub should be modified to
translate "abc" to some unique character not appearing in the lines,
and sep= should be added to the read.table call. Also, as.is=TRUE can
be added to the read.table call if it is desired to return character
rather than factor columns, and col.names= can be added to control the
naming of the returned columns.
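A sketch of the variant described above, for values that contain spaces. It assumes the tab character never occurs in the data, so "abc" is translated to a tab and sep = "\t" is passed to read.table. The sample values and the column names B1 and B2 are illustrative.

```r
# Sample values with embedded spaces (illustrative data).
B <- c("d gabcrt", "fgrtabc", "sabcu uu")

# Assumption: "\t" never appears in the data, so it is safe as a separator.
con <- textConnection(gsub("abc", "\t", B, fixed = TRUE))
res <- read.table(con, sep = "\t", fill = TRUE, as.is = TRUE,
                  col.names = c("B1", "B2"))
close(con)  # the connection was opened by textConnection, so close it here

res
```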

This solution will also work with more than two fields.
On 10/28/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote: