Parsing and counting expressions in .txt-files

3 messages · Alexander Nikles, Bert Gunter

#
Dear Community,



I hope I have selected the right category, as I am relatively new to the
R world. I have a fairly challenging problem. I would like R to read text
files (there are several hundred in my folder) one after another and search
each of them for specific terms. If a term is found, the program should
write a 1; if not, a 0. A second task is to extract a ten-digit number that
follows a particular keyword in each file, so that I can map the results to
it. Ideally, the program should write its output to a .txt file.



A brief example:



Keywords: "surpassed", "achieved", "very motivated"

Text1:

"Personnel number: 0123456789



The employee has surpassed the set targets and was also otherwise always
very motivated (...) "



So for this case, I would like my program to produce the following (in
rows and columns):



Personnel number;surpassed;achieved;very motivated (do not write)
0123456789;1;0;1


For each subsequent file, the program should continue analogously in lines
2, 3, 4 and so on.



Could you give a brief assessment of how to implement something like this?
How should I best get started, and has anyone already come across something
similar in R? I am grateful for any suggestions.



Thank you in advance,



Alex
#
I suggest you go through some R tutorials to learn about R's
capabilities.  Some recommendations can be found here:
https://www.rstudio.com/online-learning/#R

To answer your specific query:

?scan  ## Because you do not specify file format.
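
As a rough illustration of the reading step, here is a minimal sketch that
assumes the files are plain text with a .txt extension. It uses readLines()
rather than scan(), since each file is only needed as one block of text. The
function name read_txt_folder and its argument are made up for this example.

```r
# Hypothetical helper: read every .txt file in a folder into a named
# character vector, one element per file.
read_txt_folder <- function(dir) {
  # Full paths of all files whose names end in ".txt"
  paths <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  texts <- vapply(
    paths,
    # Join the lines of each file into a single string
    function(p) paste(readLines(p, warn = FALSE), collapse = "\n"),
    character(1)
  )
  # Name each element by its file name instead of its full path
  names(texts) <- basename(paths)
  texts
}
```

Calling read_txt_folder("path/to/folder") would then give one string per
file, ready to be searched.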

?grep  ?regexp ## to use regular expressions to find text.
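
To make the grep and regular expression hints more concrete, here is a
minimal sketch of the matching step. It assumes the number always follows
the literal label "Personnel number:" and that keywords should be matched
case-sensitively as literal strings. The helper name score_text, the two
sample texts, and the output file are invented for illustration only.

```r
keywords <- c("surpassed", "achieved", "very motivated")

# Hypothetical helper: one row per text, holding the personnel number
# and a 0/1 flag for each keyword.
score_text <- function(text, keywords) {
  # Capture the ten digits that follow the label "Personnel number:"
  m <- regmatches(
    text,
    regexec("Personnel number:[[:space:]]*([0-9]{10})", text)
  )[[1]]
  id <- if (length(m) == 2) m[2] else NA_character_
  # fixed = TRUE treats each keyword as a literal string, not a regex
  flags <- vapply(
    keywords,
    function(k) as.integer(grepl(k, text, fixed = TRUE)),
    integer(1)
  )
  data.frame(id = id, t(flags), check.names = FALSE, stringsAsFactors = FALSE)
}

# Made-up sample texts standing in for the contents of two files
texts <- c(
  "Personnel number: 0123456789\n\nThe employee has surpassed the set targets and was always very motivated",
  "Personnel number: 9876543210\n\nAll targets were achieved."
)

# One row per text, stacked into a single data frame
result <- do.call(rbind, lapply(texts, score_text, keywords = keywords))

# Write semicolon-separated rows without a header line
out <- tempfile(fileext = ".txt")
write.table(result, out, sep = ";", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The number is kept as a character string so that the leading zero survives.
Setting col.names = TRUE would also write the header row, if that is wanted.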

R may not be the best tool for this task, however. Or certain R
packages may be better than the basic R tools. Try searching on the
rseek.org site to see what might be available if you do not receive
suggestions here.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Wed, Apr 20, 2016 at 9:07 AM, Alexander Nikles <24790 at novasbe.pt> wrote:
#
also check out this CRAN task view:

https://cran.r-project.org/web/views/NaturalLanguageProcessing.html

Cheers,
Bert

