Message-ID: <1165968351.4912.1.camel@localhost.localdomain>
Date: 2006-12-13T00:05:51Z
From: Marc Schwartz
Subject: How to sum one column in a data frame keyed on other columns
In-Reply-To: <e57d5aec0612121534p4fe5699bv38f82fa9f8dadf1b@mail.gmail.com>
On Tue, 2006-12-12 at 15:34 -0800, George Nachman wrote:
> I have a data frame that looks like this:
>
> url          time  somethingirrelevant  visits
> www.foo.com  1:00  xxx                  100
> www.foo.com  1:00  yyy                  50
> www.foo.com  2:00  xyz                  25
> www.bar.com  1:00  xxx                  200
> www.bar.com  1:00  zzz                  200
> www.foo.com  2:00  xxx                  500
>
> I'd like to write some code that takes this as input and outputs
> something like this:
>
> url          time  total_visits
> www.foo.com  1:00  150
> www.foo.com  2:00  525
> www.bar.com  1:00  400
>
> In other words, I need to calculate the sum of visits for each unique
> tuple of (url,time).
>
> I can do it with this code, but it's very slow, and doesn't seem like
> the right approach:
>
> keys = list()
> getkey = function(m,cols,index) { paste(m[index,cols],collapse=",") }
> for (i in 1:nrow(data)) { keys[[getkey(data,1:2,i)]] = 0 }
> for (i in 1:nrow(data)) { keys[[getkey(data,1:2,i)]] =
> keys[[getkey(data,1:2,i)]] + data[i,4] }
>
> I'm sure there's a more functional-programming approach to this
> problem! Any ideas?

See ?aggregate

If your data frame is called 'DF':

  > aggregate(DF$visits, list(DF$url, DF$time), sum)
        Group.1 Group.2   x
  1 www.bar.com    1:00 400
  2 www.foo.com    1:00 150
  3 www.foo.com    2:00 525
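
If you would rather have meaningful column names than Group.1, Group.2 and x, something along these lines should work. This is only a sketch: it rebuilds 'DF' by hand from the rows in your post, and 'totals' and the 'total_visits' column name are just names I picked for illustration.

```r
# Rebuild the example data from the post (assumed to be plain character/numeric columns).
DF <- data.frame(
  url = c("www.foo.com", "www.foo.com", "www.foo.com",
          "www.bar.com", "www.bar.com", "www.foo.com"),
  time = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  somethingirrelevant = c("xxx", "yyy", "xyz", "xxx", "zzz", "xxx"),
  visits = c(100, 50, 25, 200, 200, 500),
  stringsAsFactors = FALSE
)

# Passing named columns as 'by' carries the names url and time into the result.
totals <- aggregate(DF["visits"], by = DF[c("url", "time")], FUN = sum)

# Rename the summed column (hypothetical name, chosen to match the desired output).
names(totals)[names(totals) == "visits"] <- "total_visits"

print(totals)
```

The somethingirrelevant column is simply left out of the call, so it plays no part in the grouping.
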
HTH,
Marc Schwartz