Hello.

I am trying to work with the text mining package tm. I have a directory called textsTweet1 which contains three files:

    short.txt
    myTextFile.txt
    myTextFile.csv

short.txt contains one line:

    THE CAT IN THE HAT\n

myTextFile.txt contains some tweets from Twitter. The first few lines of myTextFile.txt are:

    @oliviamunn I miss a good Yakaniku...I miss Japan...I NEED COCO EVERYBODY. I NEED TO GET ON JAPAN TIME NOW. NO SLEEP!!!SAKURA at Niigata, Japan
    http://ff.im/-29ufG19:30 [BS Japan] ????????? #50 ????????????????????RT@ kvsrinath Japan's New Flat Screens: The Eco-Friendly TV .
    http://is.gd/sIS7 #greenMold99 says: Introduction to Chiropractic and manual therapeutics when unfit.Choice of schools in Japan, and mo...
    http://i.sitesays.com/lc7Japan Said to Sell 17 Trillion Yen of Extra Bonds - Bloomberg

Actually, there were no new lines in the original file, but I inserted a new line before every occurrence of http.

I ran the following code:

    library("tm")
    my.path <- 'C:\\dataForR\\textsTweet1\\'
    my.path.csv <- 'C:\\dataForR\\textsTweet1\\myTextFile.csv'
    (ovid <- Corpus(DirSource(my.path),
                    readerControl = list(reader = readPlain, language = "la")))

Response from R:

    A text document collection with 3 text documents

    Warning message:
    In readLines(filename, encoding = encoding) :
      incomplete final line found on 'C:\dataForR\textsTweet1\/short.txt'

Then I ran the TermDocMatrix function. It is supposed to take a file and more or less count the occurrences of each word in the file. Or, as the documentation says, "Constructs a term-document matrix".

    tdm <- TermDocMatrix(ovid)
    Data(tdm)[1:2, 105:107]

    2 x 3 sparse Matrix of class "dgCMatrix"
      revealed said sakura
    1        .    .      .
    2       15   15     15

    Data(tdm)[1:21, 100:105]

    Error in intI(i, n = di[1], dn = dn[[1]]) :
      index larger than maximal 3

I don't understand why I am getting only two lines. I can see that the first line is for the short.txt file, and the second line seems to be for the whole myTextFile.txt file.

How can I get TermDocMatrix to output each row of myTextFile.txt as a separate row?

Thanks very much.

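
To make the layout I am after concrete, here is a minimal sketch in base R only (it does not use tm, and it is not how TermDocMatrix works internally). The two sample strings and the names lines, tokens, terms and counts are assumptions made up for illustration; in practice the vector of lines would presumably come from reading myTextFile.txt line by line.

```r
# Hypothetical sample data: each element stands for one line of a text file
lines <- c("THE CAT IN THE HAT",
           "Japan Said to Sell 17 Trillion Yen of Extra Bonds - Bloomberg")

# Lower-case each line and split it into words on runs of non-alphanumerics
tokens <- lapply(tolower(lines), function(x) {
  words <- strsplit(x, "[^[:alnum:]]+")[[1]]
  words[nzchar(words)]  # drop empty strings left by leading separators
})

# All distinct terms across the lines, sorted
terms <- sort(unique(unlist(tokens)))

# Count each term per line; vapply returns terms x lines, so transpose
counts <- t(vapply(tokens,
                   function(w) tabulate(match(w, terms), nbins = length(terms)),
                   integer(length(terms))))
dimnames(counts) <- list(seq_along(lines), terms)

counts[1, "the"]  # 2: "the" occurs twice in the first line
```

Here every line gets its own row and every term its own column, which is the shape I would like the term-document matrix to have instead of one row per file.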
View this message in context: http://www.nabble.com/question-about-the-Text-Mining-package-tm-tp23091573p23091573.html Sent from the R help mailing list archive at Nabble.com.