Searching for Enumerated Items using str_count() from the stringr package

3 messages · Dan Abner, Tóth Dénes

#
Hi all,

I have a large number of text strings that I need to search for
enumerated items. However, I am receiving the error message below, even
though I thought that I had properly escaped the special character (the
closing parenthesis):
Error in stri_count_regex(string, pattern, opts_regex = opts(pattern)) :
  Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)


===

Here is example code:


library(stringr)

text1<-"This is a list:
1) Number 1
2) Etc
3) Etc"

text2<-"This is NOT a list:
Blah, blah, blah
Blah, blah, blah"

text3<-c(text1,text2)
text3

{keywords<-c(paste(0:9,"\\)"),paste(0:9,"\\)",sep=""),
paste(0:9,"."),paste(0:9,".",sep=""),"-","*")}

keywords

Count<-str_count(text3,keywords)

===

I am looking for Count<-c(3,0)

Any suggestions?

Thanks!

Dan
#
On 09/28/2017 10:25 PM, Dan Abner wrote:
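You should read the docs carefully; see ?regexp.
str_count() pairs each string with the corresponding pattern (recycling
the shorter vector), so you do not want to pass a multi-element vector
as 'keywords' in this case. Note also that "." and "*" are regex
metacharacters; the bare "*" pattern is most likely what triggers the
syntax error. Use a single pattern instead: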

library(stringi)
stri_count_regex(text3, "[0-9]+\\) ")

or:

stri_count_regex(text3, "[[:digit:]]+\\) ")
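
A minimal sketch of why a single pattern is the better fit, assuming the stringi package is installed. stri_count_regex() matches strings and patterns element by element rather than testing every pattern against every string. The vectors x and p are made up for illustration; text3 is rebuilt from the original post.

```r
library(stringi)

# Hypothetical toy vectors: the i-th pattern is applied to the i-th string only.
x <- c("aa", "bb")
p <- c("a", "b")
paired  <- stri_count_regex(x, p)        # "a" in "aa", "b" in "bb"
swapped <- stri_count_regex(x, rev(p))   # "b" in "aa", "a" in "bb"

# text3 as defined in the original post.
text1 <- "This is a list:\n1) Number 1\n2) Etc\n3) Etc"
text2 <- "This is NOT a list:\nBlah, blah, blah\nBlah, blah, blah"
text3 <- c(text1, text2)

# One pattern, recycled over both strings: digits, a literal ")", then a space.
Count <- stri_count_regex(text3, "[0-9]+\\) ")

print(paired)   # [1] 2 2
print(swapped)  # [1] 0 0
print(Count)    # [1] 3 0
```

With a single pattern, every string is checked against the same expression, which gives the Count of c(3, 0) asked for in the original post.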

BTW, I do not see why one would use the stringr package if it is just a
wrapper around the stringi package.

Regards,
Denes

#
On 09/29/2017 12:02 AM, Tóth Dénes wrote:
Ah, now I see what you were after: enumerations are not in a standard 
format, so "1) " can be "1)", "1.", "1 .".

In this case:
library(stringi)
text <- "1)Hello\n2.Hi\n3 .Cheers"
keywords <- "[0-9]+(\\)| *?\\.)"
stri_count_regex(text, keywords)  # 3

Note the '|' sign in the keyword definition. It means OR in this
context. The regular expression above can therefore be read as: one or
more digits, followed either by a closing parenthesis or by any number
of spaces (including zero) and then a dot.
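
To make the two alternatives concrete, here is a small sketch, assuming stringi is installed. It counts each branch of the alternation separately and then tries one possible refinement that is not part of the pattern above: anchoring to the start of each line with the (?m) multiline flag, on the assumption that list markers always begin a line. The sample string is hypothetical.

```r
library(stringi)

# Hypothetical sample: a decimal number in prose plus three list markers.
sample_text <- "Pi is about 3.14\n1)Hello\n2.Hi\n3 .Cheers"

# Each branch of the alternation on its own.
n_paren <- stri_count_regex(sample_text, "[0-9]+\\)")    # digits, then ")"
n_dot   <- stri_count_regex(sample_text, "[0-9]+ *\\.")  # digits, optional spaces, "."

# The combined pattern; it also counts the "3." inside "3.14".
n_any <- stri_count_regex(sample_text, "[0-9]+(\\)| *?\\.)")

# Assumed refinement: (?m) lets ^ match at the start of every line.
n_anchored <- stri_count_regex(sample_text, "(?m)^[0-9]+(\\)| *?\\.)")

print(c(paren = n_paren, dot = n_dot, any = n_any, anchored = n_anchored))
# paren 1, dot 3, any 4, anchored 3
```

Whether the anchored variant is appropriate depends on the data: it would miss enumerations that are written inline within a sentence.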

HTH,
Denes