Searching for Enumerated Items using str_count() from the stringr package
On 09/29/2017 12:02 AM, Tóth Dénes wrote:
On 09/28/2017 10:25 PM, Dan Abner wrote:
Hi all, I have a large number of text strings to search for enumerated items. However, I am receiving this error message even though I thought that I had properly escaped the closing parenthesis, which is a special character:
Count<-str_count(text3,keywords)
Error in stri_count_regex(string, pattern, opts_regex = opts(pattern)) :
Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)
===
Here is example code:
text1<-"This is a list:
1) Number 1
2) Etc
3) Etc"
text2<-"This is NOT a list:
Blah, blah, blah
Blah, blah, blah"
text3<-c(text1,text2)
text3
{keywords<-c(paste(0:9,"\\)"),paste(0:9,"\\)",sep=""),
paste(0:9,"."),paste(0:9,".",sep=""),"-","*")}
You should carefully read the docs, see ?regexp. You really do not want to pass a multi-element vector as 'keywords' in this case, but instead:
stri_count_regex(text3, "[0-9]+\\) ")
or:
stri_count_regex(text3, "[[:digit:]]+\\) ")
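As a side note on the error itself, here is a minimal sketch of what probably goes wrong with the original 'keywords' vector. It assumes the stringi package is installed; the test strings and the variable names (bad_star, bad_dot, and so on) are made up for illustration. Two of the keywords are problematic as regular expressions: a lone "*" is a quantifier with nothing to repeat, and an unescaped "." matches any character.

```r
library(stringi)

# Two of the original keywords, taken on their own as regular expressions.
bad_star <- "*"   # quantifier with nothing before it: not a valid pattern
bad_dot  <- "1."  # unescaped dot: "1" followed by ANY character

# The lone "*" triggers a regex syntax error; tryCatch() captures it
# so the script keeps running.
res <- tryCatch(stri_count_regex("a * b", bad_star),
                error = function(e) "error")

# The unescaped dot also counts "1)" and "1 ", not only "1.".
n_dot <- stri_count_regex("1) 1. 1 ", bad_dot)

# Escaped versions are treated as literal characters.
n_star_ok <- stri_count_regex("a * b", "\\*")
n_dot_ok  <- stri_count_regex("1) 1. 1 ", "1\\.")

# stri_count_regex() is vectorized over both arguments, so a pattern
# vector is recycled against the strings rather than combined.
n_vec <- stri_count_regex(c("a", "b"), c("a", "b", "a", "b"))
length(n_vec)  # one result per recycled pair, not one per string
```

Even with the escaping fixed, a multi-element pattern vector would give one count per string/pattern pair, which is why a single combined pattern is the better fit here.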
Ah, now I see what you were after: enumerations are not in a standard format, so "1) " can be "1)", "1.", "1 .". In this case:
text <- "1)Hello\n2.Hi\n3 .Cheers"
keywords <- "[0-9]+(\\)| *?\\.)"
stri_count_regex(text, keywords)
Note the '|' sign in the keyword definition. It means OR in this context. So the regexp above can be read literally as: one or more digits, followed either by a closing parenthesis or by an arbitrary number of spaces (even 0) and then a dot.
HTH,
Denes
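To tie this back to the original question, here is a short sketch that applies the combined pattern to the text3 example, again assuming stringi is installed. The anchored variant at the end is not from the thread; it is an assumption added for illustration, in case numbers such as "3.14" in running text should not be counted as enumeration markers.

```r
library(stringi)

# Example data from the original post.
text1 <- "This is a list:\n1) Number 1\n2) Etc\n3) Etc"
text2 <- "This is NOT a list:\nBlah, blah, blah\nBlah, blah, blah"
text3 <- c(text1, text2)

# Single pattern: digits followed by ")" or by optional spaces and a dot.
pattern <- "[0-9]+(\\)| *?\\.)"
counts <- stri_count_regex(text3, pattern)
counts  # one count per element of text3

# Hypothetical stricter variant (an assumption, not part of the thread):
# only count markers at the start of a line, using the (?m) multiline flag.
anchored <- "(?m)^[ \\t]*[0-9]+(\\)| *\\.)"
prose <- "Pi is 3.14\n1) a"
n_loose  <- stri_count_regex(prose, pattern)   # also counts the "3." in 3.14
n_strict <- stri_count_regex(prose, anchored)  # counts only the "1)" marker
```

Whether the anchored form is appropriate depends on how the real strings are laid out; if enumerations can appear mid-line, the unanchored pattern is the safer choice.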
BTW, I do not understand why one would use the stringr package if it is just a wrapper around the stringi package.
Regards,
Denes
keywords
Count<-str_count(text3,keywords)
===
I am looking for Count<-c(3,0)
Any suggestions?
Thanks!
Dan
Dr. Tóth Dénes, ügyvezető, Kogentum Kft. Tel.: 06-30-2583723 Web: www.kogentum.hu