
Summarize by two or more attributes

9 messages · LCOG1, Abhijit Dasgupta, Felipe Carrillo +2 more

#
Okay everyone, here's a likely softball for someone.

Consider the following data frame:

#Create data
x<-rep(c(1,15),10)
y<-rnorm(20)
z<-c(rep("auto",10),rep("bus",10))
a<-rep(c(1,1,2,2,3,3,4,4,5,5),2)
#Create Data frame
Df<-data.frame(Source=x,Rate=y,Bin=a,Type=z)


I want to create a new column that equals the sum of the Rates for each type
(1,15) by Bin.

A related question: I have been using R for a while now and usually
manipulate my data in data frames, but I know lists are better for R, so
perhaps the above should be done using lists. Feel free to offer
suggestions coming from that angle.

Thanks guys

JR-



--
View this message in context: http://r.789695.n4.nabble.com/Summarize-by-two-or-more-attributes-tp3529825p3529825.html
Sent from the R help mailing list archive at Nabble.com.
#
Like this?

library(plyr)  # provides ddply() and dlply()

x<-rep(c(1,15),10)
y<-rnorm(20)
z<-c(rep("auto",10),rep("bus",10))
a<-rep(c(1,1,2,2,3,3,4,4,5,5),2)
#Create Data frame
Df<-data.frame(Source=x,Rate=y,Bin=a,Type=z)
Df

# One row per Type/Bin combination, holding the summed Rate
ddply(Df,c('Type','Bin'),summarise,Summed=sum(Rate))

# Adding a column to Df
ddply(Df,c('Type','Bin'),mutate,Summed=sum(Rate))

# Convert the result to a list
dlply(Df,c('Type','Bin'),summarise,Summed=sum(Rate))

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx




#
On May 17, 2011, at 11:48 AM, LCOG1 wrote:

            
See ?ave and consider:

# Presuming you want 'Bin' nested within 'Source'
Df$Sum <- ave(Df$Rate, list(Df$Source, Df$Bin), FUN = sum)

# Or 'Source' nested within 'Bin'
Df$Sum <- ave(Df$Rate, list(Df$Bin, Df$Source), FUN = sum)
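
A small sketch of the ave() approach, rebuilding the example data with a fixed seed (the seed and the names sum_sb and sum_bs are my additions, only for illustration). As far as I can tell, ave() groups on the combination of the factors, so the order in which they are listed should not change the values:

```r
set.seed(42)  # assumption: seed added only so the example is reproducible
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

# Group sums, repeated on every row that belongs to the group
sum_sb <- ave(Df$Rate, list(Df$Source, Df$Bin), FUN = sum)
sum_bs <- ave(Df$Rate, list(Df$Bin, Df$Source), FUN = sum)

length(sum_sb)             # one value per row of Df
all.equal(sum_sb, sum_bs)  # same values with either ordering
```

Either vector can be assigned to Df$Sum directly, since it lines up with the original rows.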


On your follow up, a data frame is a type of list with a 'data.frame' class attribute, a 'row.names' attribute and a 'names' attribute for the column names. Much like a matrix is a vector with a 'dim' attribute. 

Try this:

  unclass(Df)

and see the output. It looks just like a list, because it is...
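
A minimal illustration of that point, using a tiny made-up data frame (the values are only for demonstration):

```r
Df <- data.frame(Source = c(1, 15), Type = c("auto", "bus"))

is.list(Df)              # TRUE: a data frame is a list underneath
names(attributes(Df))    # the names, class and row.names attributes
plain <- unclass(Df)     # drop the 'data.frame' class
is.data.frame(plain)     # FALSE: what is left is an ordinary list
str(plain)

# Same idea for a matrix: a vector plus a 'dim' attribute
m <- matrix(1:6, nrow = 2)
attributes(m)
```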

If dealing with 'rectangular' datasets (e.g. a database table), where each column may need to be of a different data type, a data frame in R is specifically designed to handle it. A data frame can do this precisely because it is a list: each element of a list can be a different type.

If you need to deal with a data structure that may not be entirely based upon a rectangular data set and may need to contain various numbers of items per element, then a list is the way to go. Lists are commonly used in R functions to return complex objects that may contain vectors of various types, matrices, data frames and even lists of lists. 

A quick example would be objects returned by R's model functions. Run example(lm) and after the graphs finish, use str(lm.D9) to give an example of the structure of a somewhat complex list object.
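
For a version that skips the graphs, here is a rough sketch using the built-in 'cars' data set (my choice of data, not the one from example(lm)):

```r
# Assumption: 'cars' is used purely as a convenient built-in data set
fit <- lm(dist ~ speed, data = cars)

is.list(fit)               # TRUE: the fitted model is a list with class "lm"
names(fit)                 # its components
class(fit$coefficients)    # a numeric vector
class(fit$model)           # a data frame stored inside the list
str(fit, max.level = 1)    # top-level structure only
```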

HTH,

Marc Schwartz
#
On May 17, 2011, at 12:53 PM, LCOG1 wrote:

            
Aha....you had mentioned creating a new column in your initial post, presumably added to 'Df', as opposed to creating a new independent matrix of the results.

Your output above creates a 5 x 2 matrix of the resultant sums, one column per 'Type' and one row for each 'Bin'.

The use of ave(), now based upon your above:

  ave(Df$Rate, list(Df$Bin, Df$Type), FUN = sum)

would yield a vector of length 20, which could then be added to the original 'Df' as a new column. The vector would be ordered in such a fashion as to match up with the original rows, based upon Bin and Type.
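
To make the difference concrete, a short sketch comparing the two results on the example data (the seed and the names sums_mat and lookup are my additions):

```r
set.seed(1)  # assumption: seed added only for reproducibility
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

# tapply(): one cell per Bin/Type combination
sums_mat <- tapply(Df$Rate, list(Df$Bin, Df$Type), sum)
dim(sums_mat)

# ave(): one value per original row, so it can be stored as a column
Df$Sum <- ave(Df$Rate, list(Df$Bin, Df$Type), FUN = sum)

# Look up each row's group total in the matrix and compare
lookup <- sums_mat[cbind(as.character(Df$Bin), as.character(Df$Type))]
all.equal(Df$Sum, lookup)
```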

I am tempted to quote a famous line from Cool Hand Luke, but I'll leave that for now...  :-)

Regards,

Marc Schwartz
#
Marc, 
  How could I also apply the spline function to each of the 'columns' found in the result from

tapply(Df$Rate,list(Df$Bin,Df$Type),sum)

??




#
On May 17, 2011, at 2:55 PM, ROLL Josh F wrote:

            
Something along the lines of the following:

  apply(tapply(Df$Rate,list(Df$Bin,Df$Type),sum), 2, spline)


If I am understanding what you want to do. 

Depending upon what you are trying to do, you may want to look at the other functions listed in the See Also in ?spline.
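
A quick sketch of what that call returns on the example data, plus splinefun() as one of the related functions (the seed and the names sums_mat, res and f are my additions):

```r
set.seed(1)  # assumption: seed added only for reproducibility
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))
sums_mat <- tapply(Df$Rate, list(Df$Bin, Df$Type), sum)

# One spline() result per column; each holds interpolated x and y vectors
res <- apply(sums_mat, 2, spline)
names(res)
str(res$auto)

# splinefun() returns a function that can be evaluated at arbitrary points
f <- splinefun(1:5, sums_mat[, "auto"])
f(2.5)
```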

HTH,

Marc
#
I will take a look.  In my real data I need to interpolate the 16 points into 64 points for each of the categories.  

Thanks Marc

JR 
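
For going from 16 points to 64 per category, one possible sketch uses the n argument of spline(). The matrix below is a hypothetical stand-in for the real data, and the names sums_mat and interp are made up for illustration:

```r
set.seed(1)
# Hypothetical stand-in: 16 summed values for each of two categories
sums_mat <- matrix(rnorm(32), nrow = 16,
                   dimnames = list(as.character(1:16), c("auto", "bus")))

# n sets the number of equally spaced output points; keep only the y values
interp <- apply(sums_mat, 2,
                function(v) spline(seq_along(v), v, n = 64)$y)

dim(interp)   # 64 rows, one column per category
```

This assumes the 16 points are equally spaced; if they are not, the actual x positions would go in place of seq_along(v).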
