Summarize by two or more attributes

On May 17, 2011, at 11:48 AM, LCOG1 wrote:

            
See ?ave and consider:

# Sum 'Rate' within each combination of 'Source' and 'Bin'
Df$Sum <- ave(Df$Rate, list(Df$Source, Df$Bin), FUN = sum)

# Listing the factors in the other order defines the same groups,
# so this gives the same per-row sums
Df$Sum <- ave(Df$Rate, list(Df$Bin, Df$Source), FUN = sum)
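To see what ave() returns, here is a minimal sketch on a small made-up data frame. The 'Source', 'Bin' and 'Rate' values are hypothetical stand-ins for the original data. Each row receives the sum of 'Rate' over all rows sharing its Source/Bin combination, so the result is as long as Df$Rate. The sketch also checks that swapping the order of the grouping factors does not change which rows are grouped together.

```r
# Hypothetical toy data standing in for the original poster's 'Df'
Df <- data.frame(
  Source = c("A", "A", "A", "B", "B"),
  Bin    = c(1, 1, 2, 1, 1),
  Rate   = c(10, 5, 3, 2, 4)
)

# Per-group sums, repeated on every row of the group:
# A/1 -> 10 + 5 = 15, A/2 -> 3, B/1 -> 2 + 4 = 6
Df$Sum <- ave(Df$Rate, list(Df$Source, Df$Bin), FUN = sum)
Df$Sum   # 15 15 3 6 6

# Same groups when the factors are listed in the other order
Sum2 <- ave(Df$Rate, list(Df$Bin, Df$Source), FUN = sum)

print(Df)
```

If you would rather have one row per combination than the sum repeated on every row, aggregate(Rate ~ Source + Bin, data = Df, FUN = sum) is a common alternative.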


As for your follow-up question: a data frame is a list with a 'data.frame' class attribute, a 'row.names' attribute and a 'names' attribute holding the column names, much as a matrix is a vector with a 'dim' attribute.
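A minimal sketch of that comparison, using a small hypothetical data frame and matrix: typeof() and attributes() expose the underlying list and its attributes, and a matrix turns back into a plain vector once its 'dim' attribute is dropped.

```r
# Small hypothetical objects for illustration
Df <- data.frame(x = 1:3, y = c("a", "b", "c"))
m  <- matrix(1:6, nrow = 2)

# The data frame's underlying type is a list...
typeof(Df)       # "list"
is.list(Df)      # TRUE
# ...carrying 'names', 'class' and 'row.names' attributes
attributes(Df)

# A matrix is an atomic vector with a 'dim' attribute
typeof(m)        # "integer"
attributes(m)    # $dim: 2 3

# Dropping the 'dim' attribute leaves a plain vector
v <- m
dim(v) <- NULL
is.vector(v)     # TRUE
```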

Try this:

  unclass(Df)

and see the output. It looks just like a list, because it is...
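Because a data frame is a list underneath, ordinary list operations work on it directly. A minimal sketch with a hypothetical two-column data frame: length() counts columns, [[ ]] extracts a column, and lapply() iterates over columns. unclass() only removes the class attribute, so 'names' and 'row.names' remain.

```r
# Hypothetical data frame for illustration
Df <- data.frame(x = 1:3, y = c("a", "b", "c"))

length(Df)              # 2: one list element per column
Df[["y"]]               # list-style extraction of a column
lapply(Df, class)       # iterate over columns as list elements

u <- unclass(Df)
class(u)                # "list": the class attribute is gone
attr(u, "row.names")    # row names are still attached
```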

If you are dealing with 'rectangular' data sets (e.g. a database table), where each column may need to be of a different data type, a data frame in R is specifically designed to handle them. It can do this because it is a list, and each element of a list can be of a different type; what keeps it rectangular is that every column has the same length.
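A minimal sketch with hypothetical columns of four different types. sapply() reports each column's class separately, whereas forcing the same data into a matrix (a single atomic vector) coerces everything to one common type.

```r
# Hypothetical data: each column (list element) has its own type
Df <- data.frame(
  id    = 1:3,                    # integer
  name  = c("ann", "bob", "cy"),  # character
  score = c(2.5, 3.0, 4.1),       # double
  ok    = c(TRUE, FALSE, TRUE)    # logical
)
sapply(Df, class)

# A matrix holds a single type, so mixed columns are coerced
m <- as.matrix(Df)
typeof(m)   # "character"
```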

If your data structure is not entirely rectangular and may need to hold a varying number of items per element, then a list is the way to go. Lists are commonly used by R functions to return complex objects that may contain vectors of various types, matrices, data frames and even lists of lists.
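A minimal sketch of such a non-rectangular object. The record 'rec' and its element names are hypothetical. The elements differ in both type and length, and one of them is itself a nested list.

```r
# A hypothetical record: elements differ in type and length
rec <- list(
  label   = "run 1",
  values  = c(2.3, 4.1, 5.0, 1.2),
  flags   = c(TRUE, FALSE),
  summary = data.frame(stat = c("min", "max"), value = c(1.2, 5.0)),
  nested  = list(a = 1, b = list(c = "deep"))
)

lengths(rec)       # a different number of items per element
rec$nested$b$c     # "deep": reaching into a list of lists
str(rec)
```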

A good example is the objects returned by R's model fitting functions. Run example(lm) and, after the plots finish, run str(lm.D9) to see the structure of a moderately complex list object.
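The same point can be checked without running the example. This is a sketch assuming a hypothetical model fitted to the built-in 'mtcars' data rather than the lm.D9 model from example(lm): the fitted object is a classed list whose components include a data frame ('model') and another classed list ('qr').

```r
# Hypothetical model on built-in data (not lm.D9 from example(lm))
fit <- lm(mpg ~ wt, data = mtcars)

is.list(fit)       # TRUE: the fitted model is a list...
class(fit)         # "lm": ...with a class attribute
names(fit)         # components such as "coefficients", "qr", "model"
class(fit$model)   # "data.frame": a data frame stored inside the list
class(fit$qr)      # "qr": itself a classed list
str(fit, max.level = 1)
```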

HTH,

Marc Schwartz