Reading stopwords from a csv file

2 messages · vioravis

#
I am using the tm package to do text mining:

I have a huge list of stopwords (2000+) that are in a csv file. I read it as
follows:

stopwordlist <- read.csv("stopwords to be Removed 10042011.csv")
myStopwords <- as.character(stopwordlist$stopwords)

When I try to remove the stopwords using

tr1=tm_map(tr1,removeWords,myStopwords)

I am getting the following error:

Error in gsub(sprintf("\\b(%s)\\b", paste(words, collapse = "|")), "",  : 
  internal error in compiling regexp

However, this works fine when I define myStopwords = c(....) instead of
reading from the csv file.
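
Since the hand-typed vector works and the one read from the file does not, one way to narrow this down is to look for entries that could upset the single combined pattern that removeWords builds. The following is only a sketch in base R: inspect_stopwords is a hypothetical helper name, and the checks it makes (missing values, empty or padded entries, regex metacharacters, overall pattern length) are guesses at possible causes, not a confirmed diagnosis.

```r
# Hypothetical helper (not part of tm): summarise entries in a stopword
# vector that might cause trouble once they are pasted into one regex.
inspect_stopwords <- function(words) {
  words <- as.character(words)
  ok <- !is.na(words)
  list(
    n         = length(words),
    n_missing = sum(!ok),
    # Entries that are empty or whitespace-only after trimming.
    n_empty   = sum(ok & !nzchar(trimws(words))),
    # Entries with leading or trailing whitespace.
    n_padded  = sum(ok & words != trimws(words)),
    # Entries containing regex metacharacters such as ( ) [ ] + * ? . | \
    special   = words[ok & grepl("[][(){}+*?.|^$\\\\]", words)],
    # Length of the alternation that would go inside \\b( ... )\\b.
    pattern_chars = nchar(paste(words[ok], collapse = "|"))
  )
}

# Usage with the vector from above:
# str(inspect_stopwords(myStopwords))
```

If the hand-typed vector was much shorter than the 2000+ entries in the file, a large pattern_chars value would point at the size of the pattern rather than at the csv import itself.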

Can someone please help me to resolve this issue?

Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871697.html
Sent from the R help mailing list archive at Nabble.com.
#
The following for loop does the work, but it takes a good 30 minutes to run:

for(i in 1:length(myStopwords))
{
  currentWord <- myStopwords[i]
  tr1=tm_map(tr1,removeWords,currentWord)
}

Are there any faster alternatives? Thank you.
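
One possible middle ground between a single 2000-word pattern and one pass per word is to remove the words in chunks of a few hundred. This is only a sketch, assuming the error comes from the size of the combined regular expression, which has not been confirmed. remove_words_chunked and its chunk_size argument are hypothetical names, and the function works on a plain character vector in base R so that it can be tried without tm.

```r
# Hypothetical helper: remove stopwords from a character vector in chunks,
# so that each compiled regular expression stays small.
remove_words_chunked <- function(x, words, chunk_size = 200) {
  words <- unique(trimws(as.character(words)))
  words <- words[!is.na(words) & nzchar(words)]
  # Escape regex metacharacters so every entry is matched literally.
  words <- gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", words)
  # Split the words into groups of at most chunk_size entries.
  chunks <- split(words, ceiling(seq_along(words) / chunk_size))
  for (chunk in chunks) {
    pattern <- sprintf("\\b(%s)\\b", paste(chunk, collapse = "|"))
    x <- gsub(pattern, "", x, perl = TRUE)
  }
  x
}

# The same chunking idea applied to the corpus with removeWords
# (untested here; chunk size of 200 is an arbitrary assumption):
# chunks <- split(myStopwords, ceiling(seq_along(myStopwords) / 200))
# for (chunk in chunks) tr1 <- tm_map(tr1, removeWords, chunk)
```

With 2000+ words and chunks of 200, this makes about ten passes over the corpus instead of 2000+. Removed words leave their surrounding spaces behind, so a whitespace clean-up step may still be needed afterwards.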

Ravi



--
View this message in context: http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871864.html