Discovering patterns in textual strings

"Does that help?"

No. I am not your private consultant. You need to reply to the list, which
I have cc'ed here, not just me.

I am still somewhat confused by your specifications, but others may not be.
Part of my confusion stems from your failure to provide a reproducible
example (see e.g. the posting guide linked below).  For example, I cannot
tell from your text whether the Abc and Bce strings contain one or more
spaces at the end. I shall assume they may but need not.

Anyway, here is a reproducible example and solution. It assumes that the
substrings/patterns of interest to you occur at the beginning of the
strings and may or may not be followed by one or more of ".", "_", or " "
(space), and then possibly by further text, which should be ignored.
Assuming that you are familiar with regular expressions, maybe this will
help to get you started even if I have misunderstood your specifications.
If you aren't familiar with regexes, the stringr package may provide a
gentler interface than R's raw regex functionality. Or maybe someone else
can suggest a better approach (which is another reason why you should
reply to the list, not just me).

z <- c("abc",
       "abc_def",
       "abc.def",
       "abc def",
       "abcd_ef",
       "abcd",
       "e","f")

pats <- unique(sub("^(.+)[. _]+.*", "\\1", z))
## gives:
## [1] "abc"  "abcd" "e"    "f"
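
One caveat, offered as a sketch rather than a definitive fix: the (.+) in
the pattern above is greedy, so when a string contains more than one
separator, the capture runs up to the last separator rather than the
first. Whether that matters depends on your actual data, which I have not
seen. The two extra strings in z2 below are hypothetical edge cases that I
made up for illustration, and the names z2, greedy and first_sep are mine.
If you want the prefix to stop at the first separator, one option is to
capture only non-separator characters:

```r
# Same test vector as above, plus two hypothetical edge cases:
# a string with two separators and one with two trailing spaces.
z2 <- c("abc", "abc_def", "abc.def", "abc def",
        "abcd_ef", "abcd", "e", "f",
        "abc_def_ghi", "abc  ")

# Greedy version from above: (.+) takes as much as it can, so the
# capture extends up to the LAST run of separators.
greedy <- sub("^(.+)[. _]+.*", "\\1", z2)

# Alternative: capture only characters that are not ".", "_" or space,
# so the capture stops at the FIRST separator.
first_sep <- sub("^([^. _]+)[. _]+.*", "\\1", z2)

unique(greedy)     # "abc_def_ghi" is reduced to "abc_def" here
unique(first_sep)  # "abc_def_ghi" and "abc  " both reduce to "abc"
```

Strings with no separator at all are returned unchanged by both versions,
because sub() leaves its input alone when the pattern does not match.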
