Discovering patterns in textual strings
The answer is, of course, to use regular expressions and/or libraries
built on them. However, I do not think you have defined your problem
sufficiently. Some questions I have:
1. Do possible patterns to be matched always appear at the beginning
of your strings?
2. Always together between specified separators ("_" in your
example); or one of several specified separators; or otherwise?
3. Do spaces or nonprinting characters occur in your strings?
E.g., would both of
    abc_something
    this.is_a long stringwithabcinthemiddle
be considered matching?
There are undoubtedly other possibilities that I've missed.
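To make question 1 concrete, here is a minimal base R sketch of how the answer changes what matches. The pattern "abc" and the "_" separator are only assumptions taken from the examples above; the variable names are mine.

```r
# The two example strings from question 3.
x <- c("abc_something", "this.is_a long stringwithabcinthemiddle")

# Unanchored: "abc" may occur anywhere in the string.
anywhere <- grepl("abc", x, fixed = TRUE)

# Anchored: "abc" must start the string and be followed by "_" or the end.
at_start <- grepl("^abc(_|$)", x)

print(anywhere)  # TRUE TRUE
print(at_start)  # TRUE FALSE
```

Which of the two behaviors you want is exactly what needs to be pinned down first.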
You may also find it useful to check out this CRAN "task view" for possibilities:
https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
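For what it is worth, here is one possible reading of the request, sketched in base R. It assumes that patterns always start at the beginning of the string, that "_" is the only separator, and that a "pattern" is any leading prefix shared by at least two records. All three are guesses on my part, and the helper name leading_prefixes is made up for illustration.

```r
# Records from the original question.
recs <- c("Abc_1234_kjhksh_276", "Abc", "Abc_1234_lakdofyo_324",
          "Bce_876_skdhk_*&^%*&", "Bce", "Bce_454")

# Return every separator-delimited leading prefix of one record,
# e.g. "a_b_c" gives "a", "a_b", "a_b_c". (Hypothetical helper.)
leading_prefixes <- function(s, sep = "_") {
  parts <- strsplit(s, sep, fixed = TRUE)[[1]]
  vapply(seq_along(parts),
         function(i) paste(parts[seq_len(i)], collapse = sep),
         character(1))
}

# Count how many records contain each prefix.
counts <- table(unlist(lapply(recs, leading_prefixes)))

# Keep prefixes seen in at least two records (the threshold is an assumption).
shared <- sort(names(counts)[as.vector(counts) >= 2])
print(shared)  # "Abc" "Abc_1234" "Bce"
```

Under those assumptions this reproduces the three results listed in the question, but a different answer to any of the questions above would call for a different approach.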
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Fri, May 4, 2018 at 3:25 PM, Jeff Reichman <reichmanj at sbcglobal.net> wrote:
R Help Forum
Is there an R library (or some other way) to extract unique character strings,
or repeating patterns, from textual strings? Say, for example, I have the
following records:
Abc_1234_kjhksh_276
Abc
Abc_1234_lakdofyo_324
Bce_876_skdhk_*&^%*&
Bce
Bce_454
And I would like to see the following results:
Abc
Abc_1234
Bce
Jeff Reichman