Tabulating Sparse Contingency Table
Dear 'Born',

There was a thread on this recently, but I cannot seem to find it. The best suggestion (IMHO) was along these lines, which counts how often each distinct row of a data frame occurs without ever forming the full table:

    aggregate( rep(1,40), as.data.frame(diag(4)[sample(1:4,40,replace=TRUE),]), sum )

See also

    http://thread.gmane.org/gmane.comp.lang.r.general/104798/focus=104841

and if you have a really big problem and access to Unix utilities, you might consider something like this (the first column of 'dat' then holds the count for each distinct line of file.dat):

    dat <- read.table( pipe('sort file.dat | uniq -c' ) )

HTH,

Chuck

p.s. The 'netiquette' of this list is to identify yourself with an appropriate email handle or signature block.
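The aggregate() call above can be applied directly to the wide data the question describes. This is only a sketch under assumptions: the data frame `dat`, its 20 three-level columns and the five underlying row patterns are hypothetical stand-ins for real data. Because aggregate() forms groups only for combinations that actually occur, the full 3^20 array is never built.

```r
# Hypothetical stand-in for real data: 1000 observations on 20
# three-level variables, drawn from only 5 distinct row patterns,
# so the 3^20-cell contingency table is extremely sparse.
set.seed(1)
patterns <- matrix(sample(1:3, 5 * 20, replace = TRUE), nrow = 5)
dat <- as.data.frame(patterns[sample(1:5, 1000, replace = TRUE), ])

# Sum a vector of ones within each observed combination of the
# 20 columns; only combinations present in 'dat' get a row.
y <- aggregate(rep(1, nrow(dat)), by = dat, FUN = sum)
names(y)[ncol(y)] <- "Freq"

# Largest cells first, as in the question
y <- y[order(y$Freq, decreasing = TRUE), ]
```

The result has one row per distinct observed pattern (here at most 5) plus a Freq column, which is the same shape as the 'y' built with xtabs() in the question.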
On Fri, 28 Mar 2008, born.to.b.wyld at gmail.com wrote:
I have a sparse contingency table (most cells are 0):
xtabs(~.,data[,idx:(idx+4)])
, , x3 = 1, x4 = 1, x5 = 1

   x2
x1    1   2   3
  1   0   0  31
  2   0   0 112
  3   0   0  94

, , x3 = 2, x4 = 1, x5 = 1

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 3, x4 = 1, x5 = 1

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 1, x4 = 2, x5 = 1

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 2, x4 = 2, x5 = 1

   x2
x1    1   2   3
  1   0   0   0
  2   0  18   0
  3   0  27   0

, , x3 = 3, x4 = 2, x5 = 1

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 1, x4 = 3, x5 = 1

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 2, x4 = 3, x5 = 1

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 3, x4 = 3, x5 = 1

   x2
x1    1   2   3
  1   0   0   0
  2   1   0   0
  3   2   0   0

, , x3 = 1, x4 = 1, x5 = 2

   x2
x1    1   2   3
  1   0   0 142
  2   0   0 340
  3   0   0   1

, , x3 = 2, x4 = 1, x5 = 2

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 3, x4 = 1, x5 = 2

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 1, x4 = 2, x5 = 2

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 2, x4 = 2, x5 = 2

   x2
x1    1   2   3
  1   0   4   0
  2   0  41   0
  3   0   0   0

, , x3 = 3, x4 = 2, x5 = 2

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 1, x4 = 3, x5 = 2

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 2, x4 = 3, x5 = 2

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 3, x4 = 3, x5 = 2

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 1, x4 = 1, x5 = 3

   x2
x1    1   2   3
  1   0   0 173
  2   0   0   4
  3   0   0   0

, , x3 = 2, x4 = 1, x5 = 3

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 3, x4 = 1, x5 = 3

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 1, x4 = 2, x5 = 3

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 2, x4 = 2, x5 = 3

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 3, x4 = 2, x5 = 3

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 1, x4 = 3, x5 = 3

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 2, x4 = 3, x5 = 3

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

, , x3 = 3, x4 = 3, x5 = 3

   x2
x1    1   2   3
  1   0   0   0
  2   0   0   0
  3   0   0   0

Now, I can do the following to get the sparse representation 'y' of the table above:
idx <- 2
y <- as.data.frame.table(xtabs(~., data[, idx:(idx+4)]))
y <- y[y$Freq > 0, ]
z <- sort(y$Freq, decreasing = TRUE, index.return = TRUE)
y <- y[z$ix, ]
y
    x1 x2 x3 x4 x5 Freq
89   2  3  1  1  2  340
169  1  3  1  1  3  173
88   1  3  1  1  2  142
8    2  3  1  1  1  112
9    3  3  1  1  1   94
122  2  2  2  2  2   41
7    1  3  1  1  1   31
42   3  2  2  2  1   27
41   2  2  2  2  1   18
121  1  2  2  2  2    4
170  2  3  1  1  3    4
75   3  1  3  3  1    2
74   2  1  3  3  1    1
90   3  3  1  1  2    1

I am wondering whether there is an R function, or a simple R routine, that would let me build the data frame 'y' without using 'xtabs'. I need to study contingency tables of 20 (or even more) dimensions, and R is unable to store a full 3^20 contingency table (3,486,784,401 cells). But since the tables of interest are highly sparse, I figure the problem could be greatly simplified if I had something that creates the sparse representation directly.

Any help or suggestions would be greatly appreciated.

Thanks,
A
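Another way to get 'y' without xtabs() is to paste each row into a single key and tabulate the keys. This is a sketch, not a method from the thread: `y0` is simply the sparse listing printed above, expanded into 990 rows as a hypothetical stand-in for `data` (which was not posted), and the ":" separator is an arbitrary choice that assumes no level label contains ":". table() on a character vector only creates entries for keys that occur, so memory grows with the number of distinct rows rather than with 3^(number of columns).

```r
# Sparse listing from the question (x1..x5, Freq), used to rebuild a
# hypothetical stand-in for the raw data, which was not posted.
y0 <- data.frame(
  x1   = c(2, 1, 1, 2, 3, 2, 1, 3, 2, 1, 2, 3, 2, 3),
  x2   = c(3, 3, 3, 3, 3, 2, 3, 2, 2, 2, 3, 1, 1, 3),
  x3   = c(1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 1, 3, 3, 1),
  x4   = c(1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 1, 3, 3, 1),
  x5   = c(2, 3, 2, 1, 1, 2, 1, 1, 1, 2, 3, 1, 1, 2),
  Freq = c(340, 173, 142, 112, 94, 41, 31, 27, 18, 4, 4, 2, 1, 1)
)
obs <- y0[rep(seq_len(nrow(y0)), y0$Freq), 1:5]

# For comparison: the full array has 3^5 = 243 cells
full <- xtabs(~ ., data = obs)

# Sparse alternative: one key per row, then count distinct keys
key <- do.call(paste, c(obs, sep = ":"))
counts <- table(key)

# Split the keys back into integer columns and attach the counts
cells <- do.call(rbind, strsplit(names(counts), ":", fixed = TRUE))
y <- data.frame(matrix(as.integer(cells), nrow = nrow(cells)),
                Freq = as.vector(counts))
names(y) <- c(names(obs), "Freq")

# Largest cells first, as in the question
y <- y[order(y$Freq, decreasing = TRUE), ]
```

Here 'full' is only built to check the result; with 20 columns one would skip it and keep just the paste()/table() steps.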
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901