Okay everyone, here's a likely softball for someone.
Consider the following data frame:
#Create data
x<-rep(c(1,15),10)
y<-rnorm(20)
z<-c(rep("auto",10),rep("bus",10))
a<-rep(c(1,1,2,2,3,3,4,4,5,5),2)
#Create Data frame
Df<-data.frame(Source=x,Rate=y,Bin=a,Type=z)
I want to create a new column that equals the sum of the Rates for each type
(1,15) by Bin.
A related question: I have been using R for a while now and usually
manipulate my data in data frames, but I know lists are better for R, so
perhaps the above should be done using lists. Feel free to offer
suggestions coming from that angle.
Thanks guys
JR-
--
View this message in context: http://r.789695.n4.nabble.com/Summarize-by-two-or-more-attributes-tp3529825p3529825.html
Sent from the R help mailing list archive at Nabble.com.
Summarize by two or more attributes
9 messages · LCOG1, Abhijit Dasgupta, Felipe Carrillo +2 more
I will hit my own ball on this one:
tapply(Df$Rate,list(Df$Bin,Df$Type),sum)
Like this?
library(plyr)  # provides ddply, dlply, summarise and mutate
x<-rep(c(1,15),10)
y<-rnorm(20)
z<-c(rep("auto",10),rep("bus",10))
a<-rep(c(1,1,2,2,3,3,4,4,5,5),2)
#Create Data frame
Df<-data.frame(Source=x,Rate=y,Bin=a,Type=z)
Df
# One row per Type/Bin combination
ddply(Df,c('Type','Bin'),summarise,Summed=sum(Rate))
# Adding a column (returns a copy of Df with the group sum on every row)
ddply(Df,c('Type','Bin'),mutate,Summed=sum(Rate))
# Convert the result to a list
dlply(Df,c('Type','Bin'),summarise,Summed=sum(Rate))
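For anyone who would rather not load plyr, the same two results can probably be had in base R. This is only a sketch: the seed, the object names sums and Df2, and the choice of aggregate() plus merge() are my assumptions, not something posted in the thread.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

# One row per Type/Bin combination (comparable to the summarise call)
sums <- aggregate(Rate ~ Type + Bin, data = Df, FUN = sum)
names(sums)[names(sums) == "Rate"] <- "Summed"

# Attach the group sum to every original row (comparable to the mutate call)
Df2 <- merge(Df, sums, by = c("Type", "Bin"))
```

Note that merge() may reorder the rows, so Df2 is not guaranteed to be in the same row order as Df.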
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx
----- Original Message ----
From: LCOG1 <jroll at lcog.org>
To: r-help at r-project.org
Sent: Tue, May 17, 2011 9:48:36 AM
Subject: [R] Summarize by two or more attributes
On May 17, 2011, at 11:48 AM, LCOG1 wrote:
I want to create a new column that equals the sum of the Rates for each type
(1,15) by Bin.
A related question: I have been using R for a while now and usually
manipulate my data in data frames, but I know lists are better for R, so
perhaps the above should be done using lists.
See ?ave and consider:
# Presuming you want 'Bin' nested within 'Source'
Df$Sum <- ave(Df$Rate, list(Df$Source, Df$Bin), FUN = sum)
# Or 'Source' nested within 'Bin'
Df$Sum <- ave(Df$Rate, list(Df$Bin, Df$Source), FUN = sum)
On your follow up, a data frame is a type of list with a 'data.frame' class attribute, a 'row.names' attribute and a 'names' attribute for the column names. Much like a matrix is a vector with a 'dim' attribute. Try this:
unclass(Df)
and see the output. It looks just like a list, because it is...
If dealing with 'rectangular' datasets (e.g. a database table), where each column may need to be of differing data types, a data frame in R is specifically designed to handle it. It is because a data frame is a list that it can do this, since each element in a list can be a different type.
If you need to deal with a data structure that may not be entirely based upon a rectangular data set and may need to contain various numbers of items per element, then a list is the way to go.
Lists are commonly used in R functions to return complex objects that may contain vectors of various types, matrices, data frames and even lists of lists. A quick example would be objects returned by R's model functions. Run
example(lm)
and after the graphs finish, use
str(lm.D9)
to give an example of the structure of a somewhat complex list object.
HTH,
Marc Schwartz
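The point about a data frame being a list can be checked directly. The following is a small sketch using the data from the original post; the seed and the ragged list L are made up for illustration.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

is.list(Df)                 # a data frame passes the list test
names(attributes(Df))       # the attributes that make it a data frame
identical(Df[["Rate"]], Df$Rate)  # columns are list elements

# A plain list (hypothetical example) can hold elements of unequal
# length, which a data frame cannot
L <- list(rates = Df$Rate, bins = unique(Df$Bin), note = "ragged")
lengths(L)
```

So switching from data frames to lists buys nothing for rectangular data like this; a list would only help if the pieces had different lengths or shapes.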
On May 17, 2011, at 12:53 PM, LCOG1 wrote:
I will hit my own ball on this one:
tapply(Df$Rate,list(Df$Bin,Df$Type),sum)
Aha....you had mentioned creating a new column in your initial post, presumably added to 'Df', as opposed to creating a new independent matrix of the results.
Your output above creates a 5 x 2 matrix of the resultant sums, one column per 'Type' and one row for each 'Bin'.
The use of ave(), now based upon your above:
ave(Df$Rate, list(Df$Bin, Df$Type), FUN = sum)
would yield a vector of length 20, which could then be added to the original 'Df' as a new column. The vector would be ordered in such a fashion as to match up with the original rows, based upon Bin and Type.
I am tempted to quote a famous line from Cool Hand Luke, but I'll leave that for now... :-)
Regards,
Marc Schwartz
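To make the difference between the two results concrete, here is a sketch that builds both and checks that they agree. The seed and the names m and lookup are assumptions for illustration only.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

# tapply(): one cell per Bin/Type combination
m <- tapply(Df$Rate, list(Df$Bin, Df$Type), sum)
dim(m)

# ave(): one value per original row, so it can be a new column
Df$Sum <- ave(Df$Rate, list(Df$Bin, Df$Type), FUN = sum)
length(Df$Sum)

# Look up each row's cell in the matrix by its dimnames and compare
lookup <- m[cbind(as.character(Df$Bin), as.character(Df$Type))]
all.equal(Df$Sum, unname(lookup))
```

The matrix is handy for a compact summary table, while the ave() vector is what the original post asked for, since it lines up with the rows of Df.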
Marc,
How could I also apply the spline function to each of the 'columns' found in the result from
tapply(Df$Rate,list(Df$Bin,Df$Type),sum)
?
-----Original Message-----
From: Marc Schwartz [mailto:marc_schwartz at me.com]
Sent: Tuesday, May 17, 2011 12:42 PM
To: ROLL Josh F
Cc: r-help at r-project.org
Subject: Re: [R] Summarize by two or more attributes
On May 17, 2011, at 2:55 PM, ROLL Josh F wrote:
Marc, How could I also apply the spline function to each of the 'columns' found in the result from tapply(Df$Rate,list(Df$Bin,Df$Type),sum) ??
Something along the lines of the following:
apply(tapply(Df$Rate,list(Df$Bin,Df$Type),sum), 2, spline)
If I am understanding what you want to do.
Depending upon what you are trying to do, you may want to look at the other functions listed in the See Also in ?spline.
HTH,
Marc
I will take a look. In my real data I need to interpolate the 16 points into 64 points for each of the categories.
Thanks Marc
JR
-----Original Message-----
From: Marc Schwartz [mailto:marc_schwartz at me.com]
Sent: Tuesday, May 17, 2011 1:09 PM
To: ROLL Josh F
Cc: r-help at r-project.org
Subject: Re: [R] Summarize by two or more attributes
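For the interpolation to a fixed number of points mentioned above, spline() takes an n argument giving the number of equally spaced output points. The sketch below uses the 5-bin toy data from this thread rather than the real 16-point data, and the seed, the names m and interp, and the assumption that the bins are equally spaced on the x axis are mine.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

m <- tapply(Df$Rate, list(Df$Bin, Df$Type), sum)

# Interpolate each column to 64 points; x is taken as the row index,
# i.e. the bins are assumed to be equally spaced
interp <- apply(m, 2, function(col) spline(seq_along(col), col, n = 64)$y)
dim(interp)
```

Keeping only the $y component gives a plain numeric matrix with one column per Type, which is usually easier to work with than the list of x/y pairs that spline() returns.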