Splitting a character vector.

11 messages · Rui Barradas, John Kane, Jeff Newmiller +1 more

#
I am lousy at simple regex and I have not found a solution to a simple problem.

I have a vector with some character values that I want to split.
Sample data
dd1  <-  c( "XXY (mat harry)","XXY (jim bob)", "CAMP (joe blow)", "ALP (max jack)")

Desired result
dd2  <-  data.frame( xx = c("XXY", "XXY", "CAMP", "ALP"), yy = c("mat harry", "jim bob" , "joe blow", "max jack"))

I thought I should be able to split the characters with strsplit but either I am misunderstanding the function or don't know how to escape a "(" properly in an effort to at least get   "XXY" "(mat harry)"

Any pointers would be appreciated
Thanks
John Kane
Kingston ON Canada

#
Hello,

Try the following.

open.par <- " \\("  # with a blank before '('
close.par <- "\\)"
result <- strsplit(sub(close.par, "", dd1), open.par)


Why the two '\\'? Because '(' is a regex meta-character, so it must be escaped with '\'.
But '\' is itself an escape character in R strings, so it must also be escaped.

Then choose the right way to separate the two, maybe something like

ix <- rep(c(TRUE, FALSE), length(result))
unlist(result)[ix]
unlist(result)[!ix]


Hope this helps,

Rui Barradas

On 07-07-2012 22:37, John Kane wrote:
#
Thanks Rui
It works perfectly so far on the test and real data.  

The annoying thing is that I had tried, or thought I'd tried, the open.par format and kept getting an error.

 It looks like I had failed to add the '''',  in the term. What is it doing?



John Kane
Kingston ON Canada
#
Hello,

Sorry, but I don't understand. Are you asking about the 4 single quotes? The
double quotes in open.par just open and close the pattern, which is a
character string.

Rui Barradas

On 07-07-2012 23:03, John Kane wrote:
#
Just to clarify, the regex engine wants to see a \ before the ( if it is to treat it as an ordinary character. However, the source code interpreter also treats \ as an escape character. In order to get a \ into the string, you have to escape it. So it takes two \ characters in source code to obtain one \ character in memory where the regex code can "see" it.
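
A small sketch of the two escaping levels, for anyone who wants to see them at the console. The variable name pat and the fixed = TRUE alternative are illustrations and were not part of the original messages.

```r
pat <- "\\("        # two backslashes typed in the source code
n <- nchar(pat)     # the string in memory has only 2 characters: \ and (
print(n)
writeLines(pat)     # shows the string as the regex engine receives it

# Split at a blank followed by a literal "(" using the escaped pattern
by_regex <- strsplit("XXY (mat harry)", " \\(")

# Alternative: fixed = TRUE turns off regex interpretation,
# so the "(" needs no escaping at all
by_fixed <- strsplit("XXY (mat harry)", " (", fixed = TRUE)

print(by_regex)
print(by_fixed)
```

When the delimiter is a plain literal, fixed = TRUE sidesteps the double escaping entirely.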
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
Rui Barradas <ruipbarradas at sapo.pt> wrote:

            
#
No sorry Rui,

In the expression result <- strsplit(sub(close.par, "", dd1), open.par)
there is  close.par, "", open.par

I probably am just blind but I don't understand what it is doing.



John Kane
Kingston ON Canada
#
Thanks Jeff.
I actually had that figured out after a good hour of pounding my head against the wall, but I still could not seem to get the syntax correct.  I think I misunderstand strsplit() just enough to keep making dumb mistakes.

John Kane
Kingston ON Canada
#
It's an empty character string, meant to substitute nothing for 
close.par, to get rid of it.
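
As a minimal illustration (the variable names x and cleaned are made up for this sketch), replacing a match with the empty string simply deletes it:

```r
close.par <- "\\)"            # pattern matching a literal ')'
x <- "XXY (mat harry)"

# The second argument "" is the replacement: the matched ")" is
# swapped for nothing, which removes it from the string
cleaned <- sub(close.par, "", x)
print(cleaned)
```

Note that sub() replaces only the first match in each string; gsub() would be the choice if several closing parentheses had to go.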

Rui Barradas

On 07-07-2012 23:17, John Kane wrote:
#
How totally obvious once you tell me!  I would have spent days trying to figure it out.

I think I have a total mental block on regex and their derivatives.

Thanks very much.

John Kane
Kingston ON Canada
#
On Jul 7, 2012, at 5:37 PM, John Kane wrote:

            
data.frame(xx = sub("(\\s\\(.+$)", "", dd1),
           yy = sub("(.+)(\\s\\()(.+)(\\)$)", "\\3", dd1))
    xx        yy
1  XXY mat harry
2  XXY   jim bob
3 CAMP  joe blow
4  ALP  max jack
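
The same back-reference idea can be sketched with a single pattern that captures both parts at once. This is a variation for illustration, not the code posted above; the name pattern is made up, and it assumes each string has the form "label (text)".

```r
dd1 <- c("XXY (mat harry)", "XXY (jim bob)", "CAMP (joe blow)", "ALP (max jack)")

# Group 1: everything before " ("; group 2: everything inside the parentheses
pattern <- "^(.+) \\((.+)\\)$"

dd2 <- data.frame(
  xx = sub(pattern, "\\1", dd1),   # keep only the first captured group
  yy = sub(pattern, "\\2", dd1),   # keep only the second captured group
  stringsAsFactors = FALSE
)
print(dd2)
```

A string that does not match the pattern is returned unchanged by sub(), so real data with irregular entries may need a check first.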
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
#
Works perfectly. Thank you very much indeed.

John Kane
Kingston ON Canada