Summarizing factor data in table?

4 messages · Andy Bunn, Tony Plate, Gabor Grothendieck

#
I have a very simple question about counting the number of factor levels
present in a subset of a data frame.
Given the following data frame:

    foo <- data.frame(yr  = c(rep(1998, 4), rep(1999, 4), rep(2000, 2)),
                      div = factor(c(rep(NA, 4), "A", "B", "C", "D", "A", "C")),
                      org = factor(c(1:4, 1:4, 1, 2)))

I want to get two new variables. Object ndiv would give the number of
divisions by year:
     1998 0
     1999 3
     2000 2
Object norgs would give the number of organizations
     1998 4
     1999 4
     2000 2
I figure xtabs should be able to do it, but I'm stuck without a for loop.
Any suggestions? -Andy
#
Do you want to count the number of non-NA divisions and organizations in 
the data for each year (where duplicates are counted as many times as 
they appear)?

 > tapply(!is.na(foo$div), foo$yr, sum)
1998 1999 2000
    0    4    2
 > tapply(!is.na(foo$org), foo$yr, sum)
1998 1999 2000
    4    4    2

Or perhaps the number of unique non-NA divisions and organizations in 
the data for each year?

 > tapply(foo$div, foo$yr, function(x) length(na.omit(unique(x))))
1998 1999 2000
    0    4    2
 > tapply(foo$org, foo$yr, function(x) length(na.omit(unique(x))))
1998 1999 2000
    4    4    2

(I don't understand where the "3" in your desired output comes from 
though, which maybe indicates I completely misunderstand your request.)
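
On foo the two interpretations above return the same numbers, because no division or organization repeats within a year. A minimal sketch of where they diverge, using a made-up data frame (bar and its values are hypothetical, not from the original post):

```r
# Hypothetical data: division "A" appears twice in 1999, and 2000 has one NA.
bar <- data.frame(yr  = c(1999, 1999, 1999, 2000, 2000),
                  div = factor(c("A", "A", "B", NA, "C")))

# Count of non-NA rows per year (duplicates counted every time they appear).
n_rows <- tapply(!is.na(bar$div), bar$yr, sum)
n_rows
# 1999 2000
#    3    1

# Count of distinct non-NA divisions per year.
n_unique <- tapply(bar$div, bar$yr, function(x) length(unique(na.omit(x))))
n_unique
# 1999 2000
#    2    1
```

Which of the two is wanted depends on whether repeated divisions within a year should be counted once or every time.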
#
The three was a typo, which I regret very much. I don't know why I didn't
think of tapply. I was obsessed with doing it as a table.
Thanks for your response,
-Andy
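
For completeness, the count of distinct levels per year can also be reached through a contingency table, which is closer to what the original question had in mind. This is only a sketch of one possible approach, not something proposed in the thread; it relies on table() dropping NA entries by default while still keeping every year as a row.

```r
# The data frame from the original question.
foo <- data.frame(yr  = c(rep(1998, 4), rep(1999, 4), rep(2000, 2)),
                  div = factor(c(rep(NA, 4), "A", "B", "C", "D", "A", "C")),
                  org = factor(c(1:4, 1:4, 1, 2)))

# Cross-tabulate year by level, then count the levels that occur at least
# once in each year. Rows whose div is NA contribute nothing to the counts.
ndiv  <- rowSums(table(foo$yr, foo$div) > 0)
norgs <- rowSums(table(foo$yr, foo$org) > 0)

ndiv
# 1998 1999 2000
#    0    4    2
norgs
# 1998 1999 2000
#    4    4    2
```

Dropping the "> 0" comparison gives the number of non-NA rows per year instead, which matches the first pair of tapply calls above.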