Message-ID: <a1a6c223-cb75-4f0c-9d39-3062fcc190bf@t8g2000prh.googlegroups.com>
Date: 2011-01-26T10:39:37Z
From: analyst41 at hotmail.com
Subject: how to speed up "aggregate"
I have a data frame:
> df
   quantity branch client       date  name
1        10      1      1 2010-01-01   one
2        20      2      1 2010-01-01   one
3        30      3      2 2010-01-01   two
4        15      4      1 2010-01-01   one
5        10      5      2 2010-01-01   two
6        20      6      3 2010-01-01 three
7      1000      1      1 2011-01-01   one
8      2000      2      1 2011-01-01   one
9      3000      3      2 2011-01-01   two
10     1500      4      1 2011-01-01   one
11     1000      5      2 2011-01-01   two
12     2000      6      3 2011-01-01 three
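For anyone who wants to reproduce this, the following sketch rebuilds the example data frame from the printout above. The column types are an assumption (date as Date, name as character); the real data may store them differently, for example as factors.

```r
# Rebuild the 12-row example shown above.
# Assumption: date is a Date and name is a character column.
df <- data.frame(
  quantity = c(10, 20, 30, 15, 10, 20, 1000, 2000, 3000, 1500, 1000, 2000),
  branch   = rep(1:6, times = 2),
  client   = rep(c(1, 1, 2, 1, 2, 3), times = 2),
  date     = as.Date(rep(c("2010-01-01", "2011-01-01"), each = 6)),
  name     = rep(c("one", "one", "two", "one", "two", "three"), times = 2),
  stringsAsFactors = FALSE
)

# Same call as in the post: sum quantity over branch, per client and date.
res <- aggregate(list(quantity = df$quantity),
                 list(client = df$client, date = df$date),
                 sum)
print(res)
```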
I want to aggregate away the branch. I followed a suggestion by Gabor
(thanks) and did
> aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date),sum)
  client       date quantity
1      1 2010-01-01       45
2      2 2010-01-01       40
3      3 2010-01-01       20
4      1 2011-01-01     4500
5      2 2011-01-01     4000
6      3 2011-01-01     2000
I also want df$name in the output, so I did what looked obvious:
> aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date,name=df$name),sum)
  client       date  name quantity
1      1 2010-01-01   one       45
2      1 2011-01-01   one     4500
3      3 2010-01-01 three       20
4      3 2011-01-01 three     2000
5      2 2010-01-01   two       40
6      2 2011-01-01   two     4000
This seems to work, but it slows down tremendously for a data frame with
around 1,000 rows.
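To make the comparison concrete, here is a rough timing sketch on synthetic data. Everything about the data is an assumption made for illustration (the seed, the number of clients and dates, the name `big`); it is not the real data set. It times the three-key call against one possible alternative: grouping by client and date only, then attaching name from a one-row-per-client lookup with merge(). That alternative is only valid if name is fully determined by client, as it is in the sample above. Whether it is actually faster will depend on the R version and the data, so the timings are printed rather than assumed.

```r
set.seed(1)   # arbitrary seed, only for repeatability
n <- 1000     # roughly the size mentioned above

# Synthetic stand-in for the real data (all values are made up).
big <- data.frame(
  quantity = sample(1:100, n, replace = TRUE),
  branch   = sample(1:50,  n, replace = TRUE),
  client   = sample(1:200, n, replace = TRUE),
  date     = as.Date("2010-01-01") + sample(0:364, n, replace = TRUE),
  stringsAsFactors = FALSE
)
big$name <- paste("client", big$client)   # name depends only on client

# The call from the post, with name as a third grouping variable.
t_three <- system.time(
  with_name <- aggregate(list(quantity = big$quantity),
                         list(client = big$client, date = big$date,
                              name = big$name),
                         sum)
)

# Possible alternative: two grouping variables, then merge name back in.
t_two <- system.time({
  no_name <- aggregate(list(quantity = big$quantity),
                       list(client = big$client, date = big$date),
                       sum)
  lookup  <- unique(big[c("client", "name")])   # one row per client
  merged  <- merge(no_name, lookup, by = "client")
})

# Elapsed seconds for each approach.
print(c(three_keys          = t_three[["elapsed"]],
        two_keys_plus_merge = t_two[["elapsed"]]))
```

Both approaches should give the same totals per client and date; only the row order may differ.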
Could anyone explain what is going on and suggest a way out?
Thanks.