I am using the tm package to do text mining:
I have a huge list of stopwords (2000+) that are in a csv file. I read it as
follows:
stopwordlist <- read.csv("stopwords to be Removed 10042011.csv")
myStopwords <- as.character(stopwordlist$stopwords)
When I try removing the stopwords using
tr1 <- tm_map(tr1, removeWords, myStopwords)
I am getting the following error:
Error in gsub(sprintf("\\b(%s)\\b", paste(words, collapse = "|")), "", :
internal error in compiling regexp
However, this works fine when I define myStopwords = c(....) instead of
reading from the csv file.
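The error message shows that all the words are joined into a single alternation pattern. A minimal sketch of that construction in base R follows, together with a check for entries that may be worth inspecting in a list read from a file (NA values, blank strings, regex metacharacters). The sample words and the helper names build_pattern and find_suspicious are assumptions for illustration, and nothing in this thread confirms that any of these is the actual cause of the error.

```r
# Hypothetical stand-in for the values read from the CSV file.
words <- c("the", "and", "of")

# Same construction that appears in the error message.
build_pattern <- function(words) {
  sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
}

pattern <- build_pattern(words)
cleaned <- gsub(pattern, "", "the cat and the hat of doom")

# Return the entries that are NA, blank, or contain a regex metacharacter.
find_suspicious <- function(words) {
  meta <- c(".", "*", "+", "?", "(", ")", "[", "]", "{", "}", "^", "$", "|", "\\")
  has_meta <- vapply(words, function(w) {
    any(vapply(meta, grepl, logical(1), x = w, fixed = TRUE))
  }, logical(1), USE.NAMES = FALSE)
  words[is.na(words) | !nzchar(trimws(words)) | has_meta]
}
```

Running find_suspicious(myStopwords) on the list read from the file would show whether it contains any such entries.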
Can someone please help me to resolve this issue?
Thank you.
Ravi
--
View this message in context: http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871697.html
Sent from the R help mailing list archive at Nabble.com.
Reading stopwords from a csv file
The following for loop does the work, but it takes a good 30 minutes to run:
for (i in seq_along(myStopwords))
{
  # remove one stopword per pass over the corpus
  currentWord <- myStopwords[i]
  tr1 <- tm_map(tr1, removeWords, currentWord)
}
Are there any faster alternatives? Thank you.
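One possible alternative is to remove the words in chunks rather than one at a time, so that each pattern stays small while the number of passes over the text drops from 2000+ to a few dozen. The following is a minimal sketch in base R on a plain character vector, not on a tm corpus. The function name remove_words_chunked, the chunk_size parameter, and the sample data are all hypothetical, and whether a given chunk size avoids the regexp error depends on the words themselves.

```r
remove_words_chunked <- function(texts, words, chunk_size = 100) {
  # Split the word list into groups of at most chunk_size words.
  chunks <- split(words, ceiling(seq_along(words) / chunk_size))
  for (chunk in chunks) {
    # Build one alternation pattern per chunk and strip those words.
    pattern <- sprintf("\\b(%s)\\b", paste(chunk, collapse = "|"))
    texts <- gsub(pattern, "", texts)
  }
  texts
}

# Hypothetical sample data.
texts <- c("the cat and the hat", "a bag of words")
stop_words <- c("the", "and", "of", "a")
result <- remove_words_chunked(texts, stop_words, chunk_size = 2)

# With tm, the same idea would call tm_map once per chunk (untested here):
#   for (chunk in chunks) tr1 <- tm_map(tr1, removeWords, chunk)
```

The removed words leave their surrounding spaces behind, so a whitespace clean-up step may still be needed afterwards.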
Ravi
--
View this message in context: http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871864.html