Do you mean when you encounter a new term? I would think document *length* wouldn't matter; presumably you already have a list of terms. If so, you could treat each document as a vector of term codes, then use "tabulate" to get the column for that document.

If you're using all terms that appear in any document, and you don't want to compile a list of terms first, then you might want to think about creating a sparse representation, as in the SparseM package, and using the sparse linear algebra routines there.

Just an idea, though.

Reid Huntsinger

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ryan Steckel
Sent: Thursday, March 17, 2005 6:01 PM
To: r-help at stat.math.ethz.ch
Subject: [R] TD Matrix

I'm trying to create a term document matrix where the columns are the documents, the rows are the terms in the documents, and the cells are a weight of term frequency in the document.

My problem is that the documents are all different lengths. So when I add a new document, if the document length is greater than the max document length in the matrix, I have to resize the matrix and do a cbind operation.

Does anyone know of an easier way?
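
For what it's worth, the "tabulate" suggestion in the reply might look roughly like the sketch below. This is only an illustration under assumptions: the vocabulary `vocab`, the toy documents in `docs`, and the relative-frequency weighting are all made up for the example and are not from the original posts. The point is that each column has length(vocab) rows regardless of how long the document is, so no resizing is needed.

```r
# Hypothetical toy data: a fixed vocabulary and three tokenized
# documents of different lengths (every token appears in vocab).
vocab <- c("apple", "banana", "cherry", "date")
docs <- list(
  doc1 = c("apple", "banana", "apple"),
  doc2 = c("cherry"),
  doc3 = c("banana", "date", "date", "date", "apple")
)

# Turn each document into a vector of integer term codes,
# i.e. the position of each token in vocab.
term_codes <- lapply(docs, match, table = vocab)

# tabulate() counts how often each code 1..nbins occurs, so every
# document yields a column with exactly length(vocab) entries.
td <- vapply(term_codes, tabulate, integer(length(vocab)),
             nbins = length(vocab))
rownames(td) <- vocab

# One possible weighting (an assumption, not from the thread):
# relative term frequency within each document.
tf <- sweep(td, 2, colSums(td), "/")
```

Here `td` has one row per term and one column per document, and a new document only requires one more call to tabulate() followed by cbind(); the number of rows never changes as long as the term list is fixed.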
TD Matrix
1 message · Huntsinger, Reid