
Need help with table() and apply()

4 messages · Stuart Luppescu, jim holtman

#
Hello, I am having trouble getting counts of values in rows of a data
frame. I'm trying to use apply, but it's not working.

This gives a sample of the kind of data I'm working with:

rating.1 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
rating.2 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
rating.3 <- factor(sample(1:3, size=10, replace=TRUE), levels=1:4)
rating.4 <- factor(sample(2:4, size=10, replace=TRUE), levels=1:4)
rating.5 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
rating.6 <- factor(sample(1:3, size=10, replace=TRUE), levels=1:4)
rating.7 <- factor(sample(2:4, size=10, replace=TRUE), levels=1:4)
rating.8 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
rating.9 <- factor(sample(2:4, size=10, replace=TRUE), levels=1:4)
rating.10 <- factor(sample(1:3, size=10, replace=TRUE), levels=1:4)

df <- as.data.frame(cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
                          rating.5 , rating.6 , rating.7 , rating.8 ,
                          rating.9 , rating.10))

for(i in 1:10) {
  df[,i] <- factor(df[,i], levels=1:4)
}

[Aside: why does the original df have columns of class "integer" when
the original data are factors? Why is it necessary to reconvert them
into factors? Also, is it possible to do this without a for loop?]

If I do this:

apply(df[,1:10], 1, table)

I get a 4x10 array, the contents of which I do not understand.

apply(df[,1:10], 2, table)

gives 10 tables for the columns, but it leaves out factor levels which
do not occur. For example,

 rating.6 : 'table' int [1:3(1d)] 7 1 2
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr [1:3] "1" "2" "3"

lapply(df[, 1:10], table)

gives tables of the columns keeping the levels with 0 counts:

$ rating.6 : 'table' int [1:4(1d)] 7 1 2 0
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr [1:4] "1" "2" "3" "4"

But I really want tables of the rows. Do I have to write my own function
to count the numbers of values?

Thanks in advance.
#
The answer to your question as to why you had to convert back to
factors is that you "undid" the factors when you did the 'cbind' to
create the dataframe.  Here is what you should have done:
> df <- data.frame(rating.1 , rating.2 , rating.3 , rating.4 ,
+                          rating.5 , rating.6 , rating.7 , rating.8 ,
+                          rating.9 , rating.10)
> str(df)
'data.frame':   10 obs. of  10 variables:
 $ rating.1 : Factor w/ 4 levels "1","2","3","4": 4 1 2 4 3 2 4 1 2 1
 $ rating.2 : Factor w/ 4 levels "1","2","3","4": 2 3 2 3 2 2 1 3 3 3
 $ rating.3 : Factor w/ 4 levels "1","2","3","4": 3 1 1 3 2 1 3 3 1 3
 $ rating.4 : Factor w/ 4 levels "1","2","3","4": 4 2 2 2 2 4 3 3 3 4
 $ rating.5 : Factor w/ 4 levels "1","2","3","4": 1 2 2 2 1 2 3 3 4 4
 $ rating.6 : Factor w/ 4 levels "1","2","3","4": 3 2 2 1 2 2 3 3 3 2
 $ rating.7 : Factor w/ 4 levels "1","2","3","4": 3 4 2 2 4 3 4 4 4 4
 $ rating.8 : Factor w/ 4 levels "1","2","3","4": 4 1 3 1 3 1 4 4 3 3
 $ rating.9 : Factor w/ 4 levels "1","2","3","4": 4 4 2 3 2 4 3 2 3 2
 $ rating.10: Factor w/ 4 levels "1","2","3","4": 1 2 1 3 2 2 3 1 1 1

Notice that the factors are maintained.

When having problems, break up the steps and see what happens at each
one.  Here is the output of your 'cbind':
> str(cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
+                          rating.5 , rating.6 , rating.7 , rating.8 ,
+                          rating.9 , rating.10)
+ )
int [1:10, 1:10] 4 1 2 4 3 2 4 1 2 1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:10] "rating.1" "rating.2" "rating.3" "rating.4" ...

Notice that it is just an integer matrix.

Also, if you had looked at the help page for cbind, you would have seen:

In the default method, all the vectors/matrices must be atomic (see
vector) or lists. Expressions are not allowed. Language objects (such
as formulae and calls) and pairlists will be coerced to lists: other
objects (such as names and external pointers) will be included as
elements in a list result. Any classes the inputs might have are
discarded (in particular, factors are replaced by their internal
codes).

Notice the last sentence.
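
The difference between the two calls can be checked on a small example. This is only a sketch: the names f1, f2, m, d and d2 are made up for illustration and do not appear in the original post. The last step shows one possible way to do the factor conversion from the original question without a for loop.

```r
# Two small factors with the same four levels (illustrative data).
f1 <- factor(c(1, 2, 4), levels = 1:4)
f2 <- factor(c(3, 3, 1), levels = 1:4)

# cbind() discards the factor class and returns a matrix of internal codes.
m <- cbind(f1, f2)
is.matrix(m)
typeof(m)

# data.frame() keeps each column as a factor.
d <- data.frame(f1, f2)
sapply(d, is.factor)

# Converting every column of an existing data frame without a for loop:
# assigning into d2[] keeps the data frame structure.
d2 <- as.data.frame(m)
d2[] <- lapply(d2, factor, levels = 1:4)
sapply(d2, nlevels)
```

Assigning to d2[] rather than d2 matters here: a plain d2 <- lapply(...) would leave a list instead of a data frame.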

2011/11/20 Stuart Luppescu <slu at ccsr.uchicago.edu>:

  
    
#
It might be good if you told us the problem you are trying to solve.
Why do you have factors in the data frame?  Can you just have the
values?  Do you want to count the 'levels' of the factors in a row, or
do you want to count the numeric values they represent?  (In your case
they are the same, so I wonder why you need the factors.)

Here is one way of doing it to count what the 'level' values are:
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    3    2    2    1    2    1    2    2     2
[2,]    1    4    7    3    6    5    0    1    1     2
[3,]    3    1    1    4    2    1    6    5    5     3
[4,]    4    2    0    1    1    2    3    2    2     3
So tell us what you want to do, not how you want to do it.
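
The command that produced the matrix above is not preserved in this archive. As a sketch only, and not necessarily the method used in the reply, per-row counts of that shape can be obtained with tabulate() on plain integers, or with table() on a data frame of factors. The names m, d, row.counts and f.counts, and the seed, are assumptions made for this example.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
# Plain integer ratings: 10 rows (subjects) by 10 columns (ratings).
m <- matrix(sample(1:4, size = 100, replace = TRUE), nrow = 10)

# tabulate() counts occurrences of 1..nbins and keeps zero counts.
# Column j of the result holds the counts for row j of m.
row.counts <- apply(m, 1, tabulate, nbins = 4)
dim(row.counts)

# Same idea starting from a data frame of factors: apply() hands each row
# over as a character vector, so the levels are restored before table().
d <- as.data.frame(m)
d[] <- lapply(d, factor, levels = 1:4)
f.counts <- apply(d, 1, function(x) table(factor(x, levels = 1:4)))
```

Both results have one row per rating value and one column per row of the input, so each column sums to the number of ratings in that row.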

2011/11/20 jim holtman <jholtman at gmail.com>:

  
    
#
On Sun, 2011-11-20 at 17:43 -0500, jim holtman wrote:
I see. The reason I turned the original numeric into factors with 4
levels is so table() would tell me when I had 0 counts of some factor
levels. Your method works very well, and will save me the extra step of
converting to factors. Also, thanks for the explanation on cbind()
converting to numerics. I appreciate the help.
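
The zero-count behaviour discussed in this thread can be seen on a single vector. apply() first converts the data frame to a matrix, so table() no longer sees the factor levels, whereas lapply() passes each factor column through intact. The vector x below is made up for illustration.

```r
x <- c(1, 1, 3, 2, 1)   # no 4s present

# Only the values that actually occur are counted.
table(x)

# Declaring the levels keeps a zero count for the missing value 4.
table(factor(x, levels = 1:4))

# tabulate() gives the same counts on plain numbers, no factor needed.
tabulate(x, nbins = 4)
```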