I have a data frame that looks like this:
url          time  somethingirrelevant  visits
www.foo.com  1:00  xxx                  100
www.foo.com  1:00  yyy                  50
www.foo.com  2:00  xyz                  25
www.bar.com  1:00  xxx                  200
www.bar.com  1:00  zzz                  200
www.foo.com  2:00  xxx                  500
I'd like to write some code that takes this as input and outputs
something like this:
url          time  total_visits
www.foo.com  1:00  150
www.foo.com  2:00  525
www.bar.com  1:00  400
In other words, I need to calculate the sum of visits for each unique
tuple of (url,time).
I can do it with this code, but it's very slow, and doesn't seem like
the right approach:
keys = list()
getkey = function(m, cols, index) { paste(m[index, cols], collapse=",") }
for (i in 1:nrow(data)) { keys[[getkey(data, 1:2, i)]] = 0 }
for (i in 1:nrow(data)) {
  k = getkey(data, 1:2, i)
  keys[[k]] = keys[[k]] + data[i, 4]
}
I'm sure there's a more functional-programming approach to this
problem! Any ideas?
How to sum one column in a data frame keyed on other columns
4 messages · George Nachman, Simon Blomberg, Marc Schwartz +1 more
You could look at the reshape package, using sum as the aggregate function. HTH, Simon.
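A rough sketch of what the reshape suggestion might look like, assuming the reshape package is installed and the data frame is named DF (the name is an assumption for illustration; the original post calls it 'data'). The idea is to melt the frame to long form with url and time as identifiers, then cast it back with sum as the aggregate function:

```r
library(reshape)  # assumes the reshape package is installed

# Example data from the original post; 'DF' is an illustrative name.
DF <- data.frame(
  url = c("www.foo.com", "www.foo.com", "www.foo.com",
          "www.bar.com", "www.bar.com", "www.foo.com"),
  time = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  somethingirrelevant = c("xxx", "yyy", "xyz", "xxx", "zzz", "xxx"),
  visits = c(100, 50, 25, 200, 200, 500)
)

# Long form: one row per (url, time, variable), keeping only 'visits'.
molten <- melt(DF, id.vars = c("url", "time"), measure.vars = "visits")

# Cast back to one row per (url, time), summing the values in each group.
totals <- cast(molten, url + time ~ variable, sum)
print(totals)
```

The irrelevant column is dropped simply because it is named neither as an id variable nor as a measured variable.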
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat. Centre for Resource and Environmental Studies The Australian National University Canberra ACT 0200 Australia T: +61 2 6125 7800 email: Simon.Blomberg_at_anu.edu.au F: +61 2 6125 0757 CRICOS Provider # 00120C The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. - John Tukey.
On Tue, 2006-12-12 at 15:34 -0800, George Nachman wrote:
[snip]
See ?aggregate

If your data frame is called 'DF':

aggregate(DF$visits, list(DF$url, DF$time), sum)

      Group.1 Group.2   x
1 www.bar.com    1:00 400
2 www.foo.com    1:00 150
3 www.foo.com    2:00 525

HTH,

Marc Schwartz
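For readers who want named output columns, here is a minimal self-contained sketch of the same aggregate() idea. It uses the formula interface, which is available in more recent versions of R than the one current when this thread was written. The data frame name DF follows the reply above; the output column name total_visits is an assumption taken from the desired output in the question.

```r
# Example data from the original post.
DF <- data.frame(
  url = c("www.foo.com", "www.foo.com", "www.foo.com",
          "www.bar.com", "www.bar.com", "www.foo.com"),
  time = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  somethingirrelevant = c("xxx", "yyy", "xyz", "xxx", "zzz", "xxx"),
  visits = c(100, 50, 25, 200, 200, 500)
)

# Sum 'visits' within each unique (url, time) pair.
# Columns not named in the formula are ignored.
totals <- aggregate(visits ~ url + time, data = DF, FUN = sum)

# Rename the summed column to match the desired output.
names(totals)[names(totals) == "visits"] <- "total_visits"
print(totals)
```

Compared with the loop in the question, this avoids building string keys row by row and keeps the grouping columns as proper columns of the result.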