omit empty cells in crosstab?

8 messages · Dieter Menne, Peter Dalgaard, sjaffe +3 more

#
Perhaps this is a common question but I haven't been able to find the answer.

I have data with many factors, each taking many values. However, only
relatively few combinations appear in the data, i.e. have nonzero counts; in
other words, the resulting table is sparse. Say we have 10 factors, each with
10 levels. The result of table() would exceed the available memory (on a
32-bit machine). Is there any way to produce a table with the empty cells
omitted (without first producing the whole table and then removing rows)?

Thanks,
Steve
#
Hi Steve,

The general answer is yes, but the specific will depend on your
problem.  Could you provide a small reproducible example to illustrate
your problem?

Hadley
On Fri, Apr 24, 2009 at 1:19 PM, sjaffe <sjaffe at riskspan.com> wrote:

  
    
#
sjaffe <sjaffe <at> riskspan.com> writes:
It would be easier if you had a reproducible base example, but I
suggest creating ONE new factor from the pasted levels, using unique(),
and then creating a table of these.

Dieter
#
Dieter Menne wrote:
interaction(..., drop=TRUE) may be a neater way.
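
As a rough illustration of this suggestion (a sketch only, not code from the thread): the data frame `dat`, its column names, and the random seed below are made up for the example. The idea is to collapse all factors into one combined factor whose levels are only the combinations that actually occur, and then tabulate that single factor.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical data: 12 rows, 10 factor columns, each with levels 1..10.
cols <- lapply(1:10, function(i) {
  factor(sample(1:10, 12, replace = TRUE), levels = 1:10)
})
names(cols) <- paste0("X", 1:10)
dat <- as.data.frame(cols)

# One combined factor; drop = TRUE keeps only the combinations that occur.
combo  <- interaction(dat, drop = TRUE)
counts <- table(combo)

length(counts)  # at most nrow(dat) entries, rather than 10^10 cells
```

The resulting one-dimensional table has one entry per distinct row of `dat`, so its size grows with the data rather than with the product of the level counts.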
#
small example:

a <- c(1.1, 2.1, 9.1)
b <- cut(a, 0:10)        # factor with 10 levels: (0,1], ..., (9,10]
c <- data.frame(b, b)    # two identical factor columns, named b and b.1
d <- table(c)
dim(d)
## result: c(10, 10)

But only 3 of the 100 cells are non-zero.
If there were 10 columns, the table would have 10 dimensions, each of length
10, and so 10^10 elements, too many to fit in memory.
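
To put that size estimate in numbers, here is a back-of-the-envelope check. It assumes each count is stored as a 4-byte integer; the variable names are only for illustration.

```r
# Rough size of a full 10-way table with 10 levels per factor.
n_factors <- 10
n_levels  <- 10

cells <- n_levels^n_factors    # 1e10 cells in the full table
bytes <- cells * 4             # assumption: 4 bytes per integer count
gib   <- bytes / 2^30          # roughly 37 GiB

# TRUE: more cells than a signed 32-bit integer can count
cells > .Machine$integer.max
```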
Dieter Menne wrote:

  
    
#
I think the easiest way to deal with this problem is
to paste together the values, use table on those, and
then unpaste (strsplit) them back.  Using the 10 columns
with 10 levels example:
    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 Freq
1   10  8  3  3  5  9  9  6  5   2    1
2    1  1  3  6  2  7  7  7  1  10    1
3    2  6  6  8  9  7  2  8  6   3    1
4    2  7  1  2  2  4  3  4  3   2    1
5    3  5  2  3  6  3  4  7  6   7    1
6    4  1  9 10  2  9  6  1  4   1    1
7    4  2  5  8  2  2  1  4  6   3    1
8    4  8  4  8  2  9  2  3  4   1    1
9    5 10  3 10  1  2  1  9  7  10    1
10   7  5  9  6  6  5  2  5  7   2    1
11   7  6  4  4  8  3  8  8 10   6    1
12   9  2  6  2  8  7  5  4  2   1    1
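
The code that produced the output above is not preserved in this archive. The following is only a sketch of the paste / table / strsplit idea described in the message, not the original code; the data frame `dat`, the seed, and the separator are assumptions made for the example, so the counts will not match the listing above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical data: 12 rows, 10 columns, each with values in 1..10.
cols <- lapply(1:10, function(i) sample(1:10, 12, replace = TRUE))
names(cols) <- paste0("X", 1:10)
dat <- as.data.frame(cols)

# Paste each row into a single key, then tabulate the keys.
key    <- do.call(paste, c(dat, sep = " "))
counts <- table(key)

# "Unpaste": split each key back into its 10 components,
# giving one row per combination that was observed.
parts  <- do.call(rbind, strsplit(names(counts), " ", fixed = TRUE))
result <- data.frame(parts, Freq = as.vector(counts),
                     stringsAsFactors = FALSE)
names(result)[1:10] <- names(dat)
result
```

The separator must not occur inside any of the values, otherwise the split would not recover the original columns. The split values come back as character strings.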

                                        - Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu
On Fri, 24 Apr 2009, sjaffe wrote:

            
#
On Fri, Apr 24, 2009 at 3:12 PM, sjaffe <sjaffe at riskspan.com> wrote:
Here's one way with the plyr package:

library(plyr)
ddply(c, names(c), nrow)

Find more about plyr at http://had.co.nz/plyr
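
For readers without plyr, a comparable result can probably be had in base R. This is a sketch under that assumption, not part of the original reply: it rebuilds the small example from earlier in the thread, adds a helper column of ones (named Freq here purely for illustration), and sums it within each combination that is present.

```r
# The small example from earlier in the thread.
a <- c(1.1, 2.1, 9.1)
b <- cut(a, 0:10)
c <- data.frame(b, b)   # columns are named b and b.1

# Add a column of ones and sum it within each observed combination.
counts <- aggregate(Freq ~ ., data = cbind(c, Freq = 1), FUN = sum)
counts                  # one row per combination present in the data
```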

Hadley
1 day later
#
On Fri, 2009-04-24 at 13:12 -0700, sjaffe wrote:
Hi Steve

In your example only 3 cells are > 0:
         b.1
b        (0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10]
  (0,1]      0     0     0     0     0     0     0     0     0      0
  (1,2]      0     1     0     0     0     0     0     0     0      0
  (2,3]      0     0     1     0     0     0     0     0     0      0
  (3,4]      0     0     0     0     0     0     0     0     0      0
  (4,5]      0     0     0     0     0     0     0     0     0      0
  (5,6]      0     0     0     0     0     0     0     0     0      0
  (6,7]      0     0     0     0     0     0     0     0     0      0
  (7,8]      0     0     0     0     0     0     0     0     0      0
  (8,9]      0     0     0     0     0     0     0     0     0      0
  (9,10]     0     0     0     0     0     0     0     0     0      1

If you want simple code that returns only the cells > 0, use this:

 table(interaction(c, drop=TRUE))

  (1,2].(1,2]   (2,3].(2,3] (9,10].(9,10] 
            1             1             1