
How to sum one column in a data frame keyed on other columns

4 messages · George Nachman, Simon Blomberg, Marc Schwartz +1 more

#
I have a data frame that looks like this:

url         time somethingirrelevant visits
www.foo.com 1:00 xxx                 100
www.foo.com 1:00 yyy                 50
www.foo.com 2:00 xyz                 25
www.bar.com 1:00 xxx                 200
www.bar.com 1:00 zzz                 200
www.foo.com 2:00 xxx                 500

I'd like to write some code that takes this as input and outputs
something like this:

url         time total_visits
www.foo.com 1:00 150
www.foo.com 2:00 525
www.bar.com 1:00 400

In other words, I need to calculate the sum of visits for each unique
tuple of (url,time).

I can do it with this code, but it's very slow, and doesn't seem like
the right approach:

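# Accumulate visits per "url,time" key in a named list, one row at a time.
keys = list()
getkey = function(m, cols, index) { paste(m[index, cols], collapse=",") }
# First pass: initialise every key to 0.
for (i in 1:nrow(data)) { keys[[getkey(data, 1:2, i)]] = 0 }
# Second pass: add each row's visits to the running total for its key.
for (i in 1:nrow(data)) { keys[[getkey(data, 1:2, i)]] = keys[[getkey(data, 1:2, i)]] + data[i, 4] }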

I'm sure there's a more functional-programming approach to this
problem! Any ideas?
#
You could look at the reshape package, using sum as the aggregate function.
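
For illustration, here is a minimal sketch of what that suggestion might look like. It assumes the reshape package is installed and rebuilds the example data from the question under the name `data`; the intermediate names `molten` and `totals` are made up for this example.

```r
library(reshape)

# Example data copied from the question.
data <- data.frame(
  url = c("www.foo.com", "www.foo.com", "www.foo.com",
          "www.bar.com", "www.bar.com", "www.foo.com"),
  time = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  somethingirrelevant = c("xxx", "yyy", "xyz", "xxx", "zzz", "xxx"),
  visits = c(100, 50, 25, 200, 200, 500),
  stringsAsFactors = FALSE
)

# Melt to long form, keeping url and time as identifiers and
# visits as the only measured variable.
molten <- melt(data, id.vars = c("url", "time"), measure.vars = "visits")

# Cast back to one row per (url, time), using sum as the aggregate function.
totals <- cast(molten, url + time ~ variable, fun.aggregate = sum)
print(totals)
```

The result should have one row per (url, time) pair, with the summed visits in a column named `visits`.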

HTH,

Simon.
#
On Tue, 2006-12-12 at 15:34 -0800, George Nachman wrote:
See ?aggregate

If your dataframe is called 'DF':
      Group.1 Group.2   x
1 www.bar.com    1:00 400
2 www.foo.com    1:00 150
3 www.foo.com    2:00 525
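
The command that produced this output does not appear in the archived message. A call consistent with the `Group.1` / `Group.2` / `x` column names shown would be something like the sketch below; this is a reconstruction, not necessarily the exact command that was posted. The data frame is rebuilt from the question's example, and `named` is a made-up name for a variant with friendlier column names.

```r
# Example data copied from the question, stored as 'DF' as in the reply.
DF <- data.frame(
  url = c("www.foo.com", "www.foo.com", "www.foo.com",
          "www.bar.com", "www.bar.com", "www.foo.com"),
  time = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  somethingirrelevant = c("xxx", "yyy", "xyz", "xxx", "zzz", "xxx"),
  visits = c(100, 50, 25, 200, 200, 500),
  stringsAsFactors = FALSE
)

# Sum visits within each (url, time) group. Unnamed elements in the 'by'
# list get the default names Group.1 and Group.2; the summed column is x.
totals <- aggregate(DF$visits, by = list(DF$url, DF$time), FUN = sum)
print(totals)

# Naming the grouping list and passing a one-column data frame keeps
# the original column names in the result.
named <- aggregate(DF["visits"], by = list(url = DF$url, time = DF$time), FUN = sum)
print(named)
```

Either form replaces the row-by-row loop from the question with a single call, which should be much faster on large data frames.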


HTH,

Marc Schwartz