Message-ID: <1451051007.2249683.1429728765369.JavaMail.yahoo@mail.yahoo.com>
Date: 2015-04-22T18:52:45Z
From: Mike
Subject: Why is removeSparseTerms() not doing anything?
Here's the code and results. The corpus is the text version of a single book. (R version 3.2)
> docs <- tm_map(docs, stemDocument)
> dtm <- DocumentTermMatrix(docs)
> freq <- colSums(as.matrix(dtm))
> ord <- order(freq)
> freq[tail(ord)]
   one experi   will    can  lucid  dream
   287    312    363    452   1018   2413
> freq[head(ord)]
  abbey abdomin    abdu abraham  absent    abus
      1       1       1       1       1       1
> dim(dtm)
[1] 1 5265
> dtms <- removeSparseTerms(dtm, 0.1)
> dim(dtms)
[1] 1 5265
> dtms <- removeSparseTerms(dtm, 0.001)
> dim(dtms)
[1] 1 5265
> dtms <- removeSparseTerms(dtm, 0.9)
> dim(dtms)
[1] 1 5265
>
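A minimal sketch of one possible explanation, assuming removeSparseTerms() measures a term's sparsity as the share of documents in which the term does not occur. The toy matrices and the keep_terms() helper are hypothetical base-R stand-ins, not tm's own code. With only one document, every term in the matrix occurs in that document, so every sparsity is 0 and no threshold between 0 and 1 can drop anything.

```r
# Toy counts standing in for the real matrix: 1 document (row), 4 terms (columns).
# The term names and counts are illustrative only.
m <- matrix(c(2413, 1018, 1, 1), nrow = 1,
            dimnames = list(Docs = "book",
                            Terms = c("dream", "lucid", "abbey", "abus")))

# Assumed definition: sparsity = share of documents where the term count is zero.
term_sparsity <- colMeans(m == 0)
print(term_sparsity)

# Hypothetical helper: keep terms whose sparsity does not exceed the threshold.
keep_terms <- function(m, sparse) {
  m[, colMeans(m == 0) <= sparse, drop = FALSE]
}

# Same thresholds as in the session above; the dimensions never change.
for (s in c(0.001, 0.1, 0.9)) print(dim(keep_terms(m, s)))

# With a second (made-up) document that lacks the rare terms,
# those terms now have sparsity 0.5 and a 0.1 threshold drops them.
m2 <- rbind(m, chapter2 = c(10, 5, 0, 0))
print(dim(keep_terms(m2, 0.1)))
```

If this reading is right, splitting the book into several documents (for example one per chapter) before building the matrix would give the sparsity threshold something to act on.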