Summarize by two-column factor, retaining original factors

5 messages · Matt Crawford, Marc Schwartz (via MN), Gabor Grothendieck +2 more

#
I am having trouble doing the following.  I have a data.frame like
this, where x and y are variables that I want to do calculations on:

Name Year  x  y
ab   2001 15  3
ab   2001 10  2
ab   2002 12  8
ab   2003  7 10
dv   2002 10 15
dv   2002  3  2
dv   2003  1 15

Before I do all the other things I need to do with this data, I need
to summarize or collapse the data by name and year.  I've found that I
can do things like
nameyear <- interaction(dataframe$Name, dataframe$Year)
dataframe$nameyear <- nameyear
tapply(dataframe$x, dataframe$nameyear, sum)
tapply(dataframe$y, dataframe$nameyear, sum)
and then bind those together.

But my problem is that I need to somehow retain the original Names in
my collapsed dataset, so that later I can do analyses with the Name
factors.  All I can think of is something like
tapply(dataframe$Name,dataframe$nameyear, somefunction?)
but nothing seems to work.

I'm actually trying to convert a SAS program, and I can't get out of
that mindset.  There, it's a simple Proc Means, By Name Year.

Thanks for any help or suggestions on the right way to go about this.

Matt Crawford
#
On Fri, 2006-02-24 at 08:18 -0800, Matt Crawford wrote:
Matt,

Just use aggregate():
  Name Year  x  y
1   ab 2001 25  5
2   ab 2002 12  8
3   dv 2002 13 17
4   ab 2003  7 10
5   dv 2003  1 15


See ?aggregate for more information.
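
The call that produced the table above is not shown in this message. A minimal sketch that should reproduce it, assuming the data frame is named DF (a name taken from the follow-up reply, not confirmed here) and that sums are what is wanted:

```r
# Assumed sample data, copied from the original question
DF <- data.frame(
  Name = c("ab", "ab", "ab", "ab", "dv", "dv", "dv"),
  Year = c(2001, 2001, 2002, 2003, 2002, 2002, 2003),
  x    = c(15, 10, 12, 7, 10, 3, 1),
  y    = c(3, 2, 8, 10, 15, 2, 15)
)

# Sum x and y within each Name/Year combination.
# Naming the elements of `by` gives the grouping columns their names.
res <- aggregate(DF[, c("x", "y")],
                 by  = list(Name = DF$Name, Year = DF$Year),
                 FUN = sum)
print(res)
```

Name and Year come back as ordinary columns of the result, which is what the question asked for.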

HTH,

Marc Schwartz
#
Or even

aggregate(DF[3:4], DF[1:2], sum)
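
This short form works because a data frame is itself a list, so DF[1:2] can serve directly as the `by` argument. As a sketch (assuming DF holds the sample data from the question), the formula interface of aggregate() should give the same sums:

```r
# Assumed sample data, copied from the original question
DF <- data.frame(
  Name = c("ab", "ab", "ab", "ab", "dv", "dv", "dv"),
  Year = c(2001, 2001, 2002, 2003, 2002, 2002, 2003),
  x    = c(15, 10, 12, 7, 10, 3, 1),
  y    = c(3, 2, 8, 10, 15, 2, 15)
)

# Positional form: columns 3:4 are summed, columns 1:2 define the groups
by_position <- aggregate(DF[3:4], DF[1:2], sum)

# Formula form: cbind() on the left lists the columns to summarize
by_formula <- aggregate(cbind(x, y) ~ Name + Year, data = DF, FUN = sum)
```

The formula form does not depend on column positions, which may be safer if the layout of the data frame changes.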
On 2/24/06, Marc Schwartz (via MN) <mschwartz at mn.rr.com> wrote:
#
Matt Crawford wrote:
mydata <- data.frame(
            Name = c("ab","ab","ab","ab","dv","dv","dv"),
            Year = c(2001,2001,2002,2003,2002,2002,2003),
               x = c(15,10,12,7,10,3,1),
               y = c(3,2,8,10,15,2,15))

aggregate(mydata[,c("x", "y")],
           list(Name = mydata$Name, Year = mydata$Year), sum)

  Name Year  x  y
1   ab 2001 25  5
2   ab 2002 12  8
3   dv 2002 13 17
4   ab 2003  7 10
5   dv 2003  1 15

?aggregate
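
Since the concern was keeping Name available for later analyses, here is a small sketch of one such follow-up step. The names `collapsed` and `by_name` are made up for illustration, and the per-Name mean is only an example of a later analysis:

```r
mydata <- data.frame(
  Name = c("ab", "ab", "ab", "ab", "dv", "dv", "dv"),
  Year = c(2001, 2001, 2002, 2003, 2002, 2002, 2003),
  x    = c(15, 10, 12, 7, 10, 3, 1),
  y    = c(3, 2, 8, 10, 15, 2, 15)
)

# Collapse to one row per Name/Year, as above
collapsed <- aggregate(mydata[, c("x", "y")],
                       list(Name = mydata$Name, Year = mydata$Year), sum)

# Name is a regular column of the result; make sure it is a factor
collapsed$Name <- factor(collapsed$Name)

# Example follow-up: mean of the yearly x totals within each Name
by_name <- tapply(collapsed$x, collapsed$Name, mean)
```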
#
The summaryBy function in the doBy package might help you:
summaryBy(x+y~Year, data=..., FUN=c(mean,var))
Best regards
Søren
