Working with string
Happy to help. Your interpretation of "\\1" is correct. It returns the text matched by the first back reference in the regex, i.e. whatever matched inside the first pair of parentheses. To return more back references, use "\\2", "\\3" and so on, each referring to the next pair of parentheses in the regex.

Note the double backslash. In an R string literal, "\\" stands for a single backslash, so "\\1" passes the two characters \1 to the regex engine. That is why most regex references write it as '\1'.

For a basic introduction, see ?regex in R, which explains how regular expressions are constructed. There are also online references such as http://www.regular-expressions.info/, and a good O'Reilly book, "Mastering Regular Expressions" (http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124).

HTH,

Marc
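To make the back-reference discussion concrete, here is a small base-R sketch (an illustration, not code from the thread) using one of the example strings. It shows that "\\1" in R source is two characters, that \\1 to \\4 pull in what each pair of parentheses captured, and that the replacement text stands in for the whole match while unmatched text is kept.

```r
x <- "fdahsdfcha163517253c463278643"

# The R string literal "\\1" holds two characters: a backslash and a 1.
# That two-character sequence is what the regex engine receives.
nchar("\\1")   # 2
cat("\\1\n")   # prints \1

# Four pairs of parentheses define four capture groups:
# \\1 = leading letters, \\2 = first number, \\3 = the c/C/p/P letter,
# \\4 = second number. The anchored pattern spans the whole string,
# so the replacement text becomes the entire result.
pattern <- "^([a-z]+)([0-9]+)([cCpP])([0-9]+)$"
sub(pattern, "\\3", x)       # "c"
sub(pattern, "\\2-\\4", x)   # "163517253-463278643"

# When the pattern matches only part of the string, unmatched text is
# kept and each match is replaced (here: the digits wrapped in < >).
gsub("([0-9]+)", "<\\1>", "ab12cd34")   # "ab<12>cd<34>"
```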
On Jul 7, 2011, at 12:33 PM, Bogaso Christofer wrote:
Thanks Marc for your reply and detailed explanation. As you said, I agree that I won't gain anything really important by using the stringr package. However, I have already written a long body of code with it and do not want to change anything now. Also, its function names are more meaningful.

I have one more question. Does "\\1" mean that I want to report the selected string (instead of replacing it with something)? What other related constructs are there? Could you point me to some online references?

Thanks,

-----Original Message-----
From: Marc Schwartz [mailto:marc_schwartz at me.com]
Sent: 07 July 2011 21:54
To: Bogaso Christofer
Cc: r-help at r-project.org
Subject: Re: [R] Working with string

On Jul 7, 2011, at 11:21 AM, Bogaso Christofer wrote:
Hi there, I need to extract a relevant portion of a given string that mixes letters and digits. The strings follow this sequence: some string, then some digits, then "c/C" (or "p/P"), then another set of digits. Examples of such strings are "fdahsdfcha163517253c463278643", "fdahsdfcha163517253C463278643", "fdahsdfcha163517253P463278643", "fdahsdfcha163517253p463278643", etc. I have tried using the latest stringr package to accomplish that. Here is my attempt:
library(stringr)
str_extract("fdahsdfcha163517253c463278643", "[c]")
[1] "c"
But it seems that the above code fetches the "c" from "fdahsdfcha" only. My goal is to find what lies between the two sets of numbers: "C", "c", "P" or "p".
Can somebody show me how to do that? I would prefer stringr syntax because I am already using a lot of other functions from that package, so doing this with stringr as well would keep my code consistent.
Thanks for your help.
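The behaviour described in the question is how any regex search works: it reports the first match anywhere in the string. A base-R sketch (an illustration, not code from the thread) using regexpr() shows where a bare "c" matches, compared with a pattern that requires a digit on each side of the letter.

```r
s <- "fdahsdfcha163517253c463278643"

# A bare "c" first matches inside the leading letters ("fdahsdfcha").
as.integer(regexpr("c", s))   # 8

# Requiring a digit on both sides skips that letter. The match "3c4"
# starts at position 19, so the letter itself sits at position 20.
pos <- as.integer(regexpr("[0-9][cCpP][0-9]", s))
pos                          # 19
substr(s, pos + 1, pos + 1)  # "c"
```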
I don't use 'stringr', but you can get the desired result using ?gsub:
x <- c("fdahsdfcha163517253c463278643", "fdahsdfcha163517253C463278643",
"fdahsdfcha163517253P463278643", "fdahsdfcha163517253p463278643")
gsub(".+[0-9]+([cCpP])[0-9]+", "\\1", x)
[1] "c" "C" "P" "p"

The regex in the first argument tells gsub() to find a sequence of any characters, followed by a sequence of digits, followed by a single 'c', 'C', 'p' or 'P', finally followed by another sequence of digits. Surrounding [cCpP] with parentheses allows us to use a 'back reference' and return what was matched within the parentheses via the "\\1" in the second argument. Because the pattern matches the entire string, the whole string is replaced by that single letter.
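As an alternative sketch in base R (not part of the original reply), the letter can be extracted directly rather than replacing everything around it. regmatches() with regexpr() and Perl-style lookarounds (perl = TRUE) matches a c/C/p/P only when a digit sits on each side, so the digits are checked but not returned. This assumes every string contains such a letter: regmatches() silently drops non-matching elements, whereas the gsub() version returns a non-matching string unchanged.

```r
x <- c("fdahsdfcha163517253c463278643", "fdahsdfcha163517253C463278643",
       "fdahsdfcha163517253P463278643", "fdahsdfcha163517253p463278643")

# (?<=[0-9]) : preceded by a digit (lookbehind, not part of the match)
# [cCpP]     : the letter we want
# (?=[0-9])  : followed by a digit (lookahead, not part of the match)
flag <- regmatches(x, regexpr("(?<=[0-9])[cCpP](?=[0-9])", x, perl = TRUE))
flag  # "c" "C" "P" "p"
```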
From a brief review of the stringr manual, it looks like str_extract() supports the use of a regex for the pattern argument, but does not support the use of back references. It looks like str_replace_all() is a wrapper around gsub(), so you may want to look at that function and its examples. The syntax might be something like:

str_replace_all(x, ".+[0-9]+([cCpP])[0-9]+", "\\1")

and therefore, I am not sure what you really save by using it instead of calling gsub() directly.

HTH,

Marc Schwartz
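For the stringr route, a hedged sketch: the str_replace_all() call is the one suggested in the reply, and str_match() is an additional stringr function (not mentioned in the thread) that returns a matrix whose first column is the full match and whose later columns hold the capture groups. Current stringr is built on the stringi package rather than on gsub(), so details may differ from the 2011 version discussed here. The calls are guarded with requireNamespace() so the snippet still runs where stringr is not installed.

```r
x <- c("fdahsdfcha163517253c463278643", "fdahsdfcha163517253C463278643",
       "fdahsdfcha163517253P463278643", "fdahsdfcha163517253p463278643")

if (requireNamespace("stringr", quietly = TRUE)) {
  # Replace the whole match with capture group 1, as with gsub().
  via_replace <- stringr::str_replace_all(x, ".+[0-9]+([cCpP])[0-9]+", "\\1")

  # str_match() returns one row per input: column 1 is the full match
  # (e.g. "3c4"), column 2 is capture group 1.
  via_match <- stringr::str_match(x, "[0-9]([cCpP])[0-9]")[, 2]

  print(via_replace)  # "c" "C" "P" "p"
  print(via_match)    # "c" "C" "P" "p"
}
```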
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.