Help with
4 messages · Rui Esteves, Rui Barradas, arun +1 more
Hello,

It's much easier than you think: the first two columns of the input matrix are the row and column numbers of the output matrix, so those two columns form an index matrix. Just see:

x <- scan(text = "
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
")
mat <- matrix(x, ncol = 3, byrow = TRUE)
result <- matrix(0, max(mat[, 1]), max(mat[, 2]))
result[mat[, 1:2]] <- mat[, 3]

Easy, no?

Hope this helps,

Rui Barradas

On 18-10-2012 13:44, Rui Esteves wrote:
Hi,

I downloaded a dataset from the UCI repositories named Bag of Words:
http://archive.ics.uci.edu/ml/machine-learning-databases/bag-of-words/readme.txt

The dataset is in a text file with the following structure:

---
docID1 wordID1 count
docID1 wordID2 count
docID1 wordID3 count
docID1 wordID4 count
...
docID2 wordID2 count
docID2 wordID5 count
docID2 wordID6 count
---

Here docIDx is an integer that identifies document x, wordIDy is an integer that identifies word y, and count is the number of times wordIDy appears in docIDx.

Example:

---
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
---

I would like to import the file into a matrix (not sparse) where:
the wordIDy corresponds to the column [,y]
the docIDx corresponds to the row [x,]
the value in [x,y] is the count of wordIDy in docIDx

So, for the previous example it would be:

     [,1] [,2] [,3] [,4] [,5]
[1,]    3   54   11   17    0
[2,]    5    0    0   78   20

I don't have a clue how to do this. Can someone please help me?

Thank you,
Rui
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
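The index-matrix approach from the reply above can be wrapped in a small function. This is only a sketch: the name `triplets_to_matrix` is hypothetical and does not come from the thread, and it assumes every (docID, wordID) pair occurs at most once, as in the example data. If a pair were repeated, the later count would overwrite the earlier one instead of being added to it.

```r
# Hypothetical helper (not from the thread): turn (row, column, value)
# triplets into a dense matrix by using the first two columns as an index matrix.
triplets_to_matrix <- function(trip) {
  trip <- as.matrix(trip)
  stopifnot(ncol(trip) == 3)
  idx <- trip[, 1:2, drop = FALSE]            # one (row, col) pair per line
  out <- matrix(0, nrow = max(idx[, 1]), ncol = max(idx[, 2]))
  out[idx] <- trip[, 3]                       # cells never mentioned stay 0
  out
}

# The example data from the question.
trip <- matrix(c(1, 1,  3,
                 1, 2, 54,
                 1, 3, 11,
                 1, 4, 17,
                 2, 1,  5,
                 2, 4, 78,
                 2, 5, 20), ncol = 3, byrow = TRUE)

result <- triplets_to_matrix(trip)
print(result)
```

Sizing the output from the largest row and column numbers means that IDs which never appear still get an all-zero row or column, so positions in the matrix keep matching the IDs.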
Hi,

You can also try this:

dat1 <- read.table(text = "
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
", sep = "", header = FALSE)
library(reshape2)
# reshape2 provides dcast(); cast() belongs to the older reshape package
dat2 <- dcast(dat1, V1 ~ V2, value.var = "V3")
dat2 <- dat2[, -1]             # drop the docID column
dat2[is.na(dat2)] <- 0         # pairs that never occur become 0
dat3 <- unname(as.matrix(dat2))  # drop dimnames so the columns print as [,1], [,2], ...
dat3
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    3   54   11   17    0
#[2,]    5    0    0   78   20

A.K.
Another option would be to read the data using read.table or similar to get the data into a data frame, then use the xtabs function, something like:

result <- xtabs(count ~ docID + wordID, data = mydf)
Gregory (Greg) L. Snow Ph.D. 538280 at gmail.com
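For completeness, here is a minimal sketch of the xtabs suggestion applied to the example data. The column names `docID`, `wordID` and `count` and the data frame name `mydf` follow the reply above. Converting the IDs to factors with explicit levels and building the plain matrix at the end are assumptions added here, not part of the original suggestion.

```r
# Read the example triplets into a data frame with named columns.
mydf <- read.table(text = "
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
", col.names = c("docID", "wordID", "count"))

# Assumption: give both IDs explicit factor levels 1..max so that an ID that
# never occurs still gets its own (all-zero) row or column.
mydf$docID  <- factor(mydf$docID,  levels = seq_len(max(mydf$docID)))
mydf$wordID <- factor(mydf$wordID, levels = seq_len(max(mydf$wordID)))

# Sum the counts for every docID/wordID combination; absent pairs become 0.
tab <- xtabs(count ~ docID + wordID, data = mydf)

# Strip the table class and dimnames to get a plain numeric matrix.
dense <- matrix(as.vector(tab), nrow = nrow(tab), ncol = ncol(tab))
print(dense)
```

Without the explicit factor levels, xtabs only creates rows and columns for the IDs that actually occur in the data, so a gap in the word IDs would shift the later columns. Unlike index-matrix assignment, xtabs adds up the counts of repeated docID/wordID pairs.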