Message-ID: <CAJOiR6b=0A8+OBTxwb=B-WOoKJOajQoYC_ARNv=-R5jnbMseMg@mail.gmail.com>
Date: 2016-12-03T15:40:48Z
From: Val
Subject: data
Hi all,
I am trying to read and summarize a large data frame (>10M records).
Here is a sample of my data:
state,city,x
1,12,100
1,12,100
1,12,200
1,13,200
1,13,100
1,13,100
1,14,200
2,21,200
2,21,200
2,21,100
2,23,100
2,23,200
2,34,200
2,34,100
2,35,100
I want to get the total row count by state and the number of distinct
cities by state. The x variable is either 100 or 200, and I also want
a count of each value by state. The result should look as follows:
state,city,count,100's,200's
1,3,7,4,3
2,4,8,4,4
At present I am doing this in several steps, and it is taking too long.
Is there an efficient way of doing this?
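
To pin down the summary described above, here is a minimal single-pass sketch in Python using only the standard library. It is an illustration under assumptions, not a tested solution for the full data set: the function name summarize_by_state and the in-memory SAMPLE string are made up for this example, and it assumes x only ever takes the values 100 or 200, as stated above.

```python
import csv
import io
from collections import defaultdict


def summarize_by_state(rows):
    """Aggregate (state, city, x) rows in a single pass.

    Hypothetical helper for illustration. Returns a dict mapping
    state -> (n_cities, n_rows, n_100, n_200).
    """
    cities = defaultdict(set)                # distinct cities seen per state
    counts = defaultdict(lambda: [0, 0, 0])  # [rows, x == 100, x == 200]
    for row in rows:
        state = row["state"]
        cities[state].add(row["city"])
        tally = counts[state]
        tally[0] += 1
        # csv yields strings, so compare against string values
        if row["x"] == "100":
            tally[1] += 1
        elif row["x"] == "200":
            tally[2] += 1
    return {
        state: (len(cities[state]), t[0], t[1], t[2])
        for state, t in counts.items()
    }


# The sample data from this message, held in memory for the example.
SAMPLE = """state,city,x
1,12,100
1,12,100
1,12,200
1,13,200
1,13,100
1,13,100
1,14,200
2,21,200
2,21,200
2,21,100
2,23,100
2,23,200
2,34,200
2,34,100
2,35,100
"""

summary = summarize_by_state(csv.DictReader(io.StringIO(SAMPLE)))

print("state,city,count,100's,200's")
for state, (n_cities, n_rows, n_100, n_200) in sorted(summary.items()):
    print(state, n_cities, n_rows, n_100, n_200, sep=",")
```

On the sample this prints the two result rows shown above (1,3,7,4,3 and 2,4,8,4,4). For a real file, the same function could be fed a csv.DictReader over an open file handle instead of the in-memory string. Because each row is read once and only the per-state tallies and city sets are kept, memory use should depend on the number of distinct states and cities rather than on the number of records.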