I am having trouble doing the following. I have a data.frame like this, where x and y are variables that I want to do calculations on:

Name Year  x  y
ab   2001 15  3
ab   2001 10  2
ab   2002 12  8
ab   2003  7 10
dv   2002 10 15
dv   2002  3  2
dv   2003  1 15

Before I do all the other things I need to do with this data, I need to summarize or collapse the data by name and year. I've found that I can do things like

nameyear <- interaction(dataframe$Name, dataframe$Year)
dataframe$nameyear <- nameyear
tapply(dataframe$x, dataframe$nameyear, sum)
tapply(dataframe$y, dataframe$nameyear, sum)

and then bind those together. But my problem is that I need to somehow retain the original Names in my collapsed dataset, so that later I can do analyses with the Name factors. All I can think of is something like

tapply(dataframe$Name, dataframe$nameyear, somefunction?)

but nothing seems to work. I'm actually trying to convert a SAS program, and I can't get out of that mindset. There, it's a simple PROC MEANS with BY Name Year.

Thanks for any help or suggestions on the right way to go about this.

Matt Crawford
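Staying closer to the tapply() approach from the question: giving tapply() a named list of both grouping variables, instead of a single interaction() factor, keeps Name and Year as separate dimensions of the result, and as.table() plus as.data.frame() turns that back into rows with Name and Year columns. This is a minimal sketch, assuming the data frame is called dataframe with columns Name, Year, x and y as in the message; xsum and xlong are illustrative names only. It handles one variable at a time, which is why the replies in this thread point to aggregate() instead.

```r
# Sample data from the question (column names Name, Year, x, y assumed)
dataframe <- data.frame(
  Name = c("ab", "ab", "ab", "ab", "dv", "dv", "dv"),
  Year = c(2001, 2001, 2002, 2003, 2002, 2002, 2003),
  x    = c(15, 10, 12, 7, 10, 3, 1),
  y    = c(3, 2, 8, 10, 15, 2, 15)
)

# A named list of grouping variables gives a Name x Year matrix whose
# dimnames keep the original Name and Year values.
xsum <- tapply(dataframe$x,
               list(Name = dataframe$Name, Year = dataframe$Year), sum)

# Back to long format: the dimnames become the Name and Year columns.
# Combinations absent from the data (here dv 2001) come back as NA.
xlong <- as.data.frame(as.table(xsum), responseName = "x")
xlong <- xlong[!is.na(xlong$x), ]
xlong
```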
Summarize by two-column factor, retaining original factors
5 messages · Matt Crawford, Marc Schwartz (via MN), Gabor Grothendieck +2 more
On Fri, 2006-02-24 at 08:18 -0800, Matt Crawford wrote:
Matt, just use aggregate():
aggregate(MyDF[, 3:4], list(Name = MyDF$Name, Year = MyDF$Year), sum)
  Name Year  x  y
1   ab 2001 25  5
2   ab 2002 12  8
3   dv 2002 13 17
4   ab 2003  7 10
5   dv 2003  1 15

See ?aggregate for more information.

HTH,

Marc Schwartz
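The point of the question was keeping Name usable after collapsing. A short sketch checking that with the aggregate() call above: the grouping variables come back as ordinary columns of the result, so a factor Name stays a factor and can drive later by-Name analyses. It assumes the data are in MyDF with the column order Name, Year, x, y; Name is wrapped in factor() explicitly because data.frame() no longer converts strings to factors by default in R 4.0.0 and later; collapsed is a hypothetical name.

```r
# Example data; Name made an explicit factor (assumption, see lead-in)
MyDF <- data.frame(
  Name = factor(c("ab", "ab", "ab", "ab", "dv", "dv", "dv")),
  Year = c(2001, 2001, 2002, 2003, 2002, 2002, 2003),
  x    = c(15, 10, 12, 7, 10, 3, 1),
  y    = c(3, 2, 8, 10, 15, 2, 15)
)

collapsed <- aggregate(MyDF[, 3:4],
                       list(Name = MyDF$Name, Year = MyDF$Year), sum)

# Name survives the collapse as a factor column ...
str(collapsed)

# ... so it can be used downstream, e.g. the mean of the yearly
# totals of x for each Name.
tapply(collapsed$x, collapsed$Name, mean)
```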
Or even:

aggregate(DF[3:4], DF[1:2], sum)
On 2/24/06, Marc Schwartz (via MN) <mschwartz at mn.rr.com> wrote:
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
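Gabor's shorter form works because a data frame is a list of columns: DF[1:2] can be passed directly as the by argument of aggregate(), and its column names (Name, Year) label the grouping columns of the result. A sketch, assuming the columns appear in the order Name, Year, x, y, since DF[3:4] and DF[1:2] select by position; short is an illustrative name.

```r
DF <- data.frame(
  Name = c("ab", "ab", "ab", "ab", "dv", "dv", "dv"),
  Year = c(2001, 2001, 2002, 2003, 2002, 2002, 2003),
  x    = c(15, 10, 12, 7, 10, 3, 1),
  y    = c(3, 2, 8, 10, 15, 2, 15)
)

# DF[1:2] is a named list of grouping vectors, so no list(Name = ...,
# Year = ...) wrapper is needed; positions assume the column order above.
short <- aggregate(DF[3:4], DF[1:2], sum)
names(short)
short
```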
Matt Crawford wrote:
mydata <- data.frame(
Name = c("ab","ab","ab","ab","dv","dv","dv"),
Year = c(2001,2001,2002,2003,2002,2002,2003),
x = c(15,10,12,7,10,3,1),
y = c(3,2,8,10,15,2,15))
aggregate(mydata[,c("x", "y")],
list(Name = mydata$Name, Year = mydata$Year), sum)
Name Year x y
1 ab 2001 25 5
2 ab 2002 12 8
3 dv 2002 13 17
4 ab 2003 7 10
5 dv 2003 1 15
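Later R releases added a formula method for aggregate(), which reads closer to SAS's PROC MEANS with BY Name Year: the variables to summarize go inside cbind() on the left, the grouping variables on the right. A sketch using the mydata frame from this message (rebuilt so the block runs on its own); res is an illustrative name, and the formula method is not available in the R version current when this thread was written.

```r
mydata <- data.frame(
  Name = c("ab", "ab", "ab", "ab", "dv", "dv", "dv"),
  Year = c(2001, 2001, 2002, 2003, 2002, 2002, 2003),
  x    = c(15, 10, 12, 7, 10, 3, 1),
  y    = c(3, 2, 8, 10, 15, 2, 15)
)

# Left of ~: variables to sum; right of ~: the BY variables.
res <- aggregate(cbind(x, y) ~ Name + Year, data = mydata, FUN = sum)
res
```

The rows come out in the same order as the list-based call above, with Name varying fastest within Year.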
?aggregate
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894
The summaryBy function in the doBy package might help you:

summaryBy(x + y ~ Name + Year, data = ..., FUN = c(mean, var))

Best regards,
Søren

From: r-help-bounces at stat.math.ethz.ch on behalf of Matt Crawford
Sent: Fri 24-02-2006 17:18
To: r-help at stat.math.ethz.ch
Subject: [R] Summarize by two-column factor, retaining original factors
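summaryBy() applies several functions per group. For readers without the doBy package, a base-R analogue is sketched here using the aggregate() formula method with a FUN that returns a named vector; this is a swapped-in base-R technique, not doBy's own output format. It assumes grouping by both Name and Year as in the question; by_group is a hypothetical name. Each summarized column of the result is a two-column matrix (mean, var), and var() of a single observation is NA, which is what the one-row groups produce.

```r
mydata <- data.frame(
  Name = c("ab", "ab", "ab", "ab", "dv", "dv", "dv"),
  Year = c(2001, 2001, 2002, 2003, 2002, 2002, 2003),
  x    = c(15, 10, 12, 7, 10, 3, 1),
  y    = c(3, 2, 8, 10, 15, 2, 15)
)

# FUN returns c(mean, var) for each group, so by_group$x and by_group$y
# are matrices with columns "mean" and "var".
by_group <- aggregate(cbind(x, y) ~ Name + Year, data = mydata,
                      FUN = function(v) c(mean = mean(v), var = var(v)))
by_group$x
```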