count number of stop words in R

I am unfamiliar with the tm package, but using basic regex tools, is
this what you want:

test <- "Mhm . Alright . There's um a young boy that's getting a
cookie jar . And it he's uh in bad shape because uh the thing is
falling over . And in the picture the mother is washing dishes and
doesn't see it . And so is the the water is overflowing in the sink .
And the dishes might get falled over if you don't fell fall over there
there if you don't get it . And it there it's a picture of a kitchen
window . And the curtains are very uh distinct . But the water is
still flowing ."

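## strsplit() returns a list whose only component is a vector of the words.
## Splitting on any run of whitespace (not just " ") also breaks at the
## line breaks embedded in test.
out <- strsplit(test, "[[:space:]]+")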

stopw <- c("a","the") ## or whatever they are

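## Anchor the pattern (^...$) so only whole words match; the unanchored
## "a|the" would also count any word merely containing "a" or "the",
## e.g. "that's" or "there".
sum(grepl(paste0("^(", paste(stopw, collapse = "|"), ")$"), out[[1]]))

## To also count ".", a regex special character, match it literally:
sum(grepl(".", out[[1]], fixed = TRUE))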
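If a tally per stop word is more useful than a single total, something like the following might do. It is only a sketch under a few assumptions not in the post: count_stopwords() is a hypothetical helper name, words are taken to be separated by whitespace, and matching is case-insensitive (so "The" counts as "the"). Punctuation attached to a word (e.g. "there's") is not stripped, so such tokens never match.

```r
# Hypothetical helper (not from the original post): counts stop words by
# whole-token, case-insensitive match and returns a tally per stop word.
count_stopwords <- function(text, stopwords) {
  # Split on any run of whitespace (spaces, tabs, line breaks)
  words <- unlist(strsplit(text, "[[:space:]]+"))
  # Drop empty tokens and lower-case so "The" and "the" are counted together
  words <- tolower(words[nzchar(words)])
  stopwords <- tolower(stopwords)
  # factor() with fixed levels keeps stop words that never occur (count 0)
  table(factor(words[words %in% stopwords], levels = stopwords))
}

counts <- count_stopwords("The boy and the cookie jar . A mess", c("a", "the"))
counts       # per-word tally: a = 1, the = 2
sum(counts)  # total stop words: 3
```

Using %in% on the tokens gives exact whole-word matches without building a regex, which sidesteps the substring problem entirely.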


If this is all nonsense, just ignore -- and sorry I couldn't help.

-- Bert




Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Mon, Jun 12, 2017 at 8:23 AM, Elahe chalabi <chalabi.elahe at yahoo.de> wrote: