Message-ID: <CAJOiR6b=0A8+OBTxwb=B-WOoKJOajQoYC_ARNv=-R5jnbMseMg@mail.gmail.com>
Date: 2016-12-03T15:40:48Z
From: Val
Subject: data

Hi all,

I am trying to read and summarize a big data frame (>10M records).

Here is a sample of my data:
state,city,x
1,12,100
1,12,100
1,12,200
1,13,200
1,13,100
1,13,100
1,14,200
2,21,200
2,21,200
2,21,100
2,23,100
2,23,200
2,34,200
2,34,100
2,35,100

I want to get the total count by state and the number of distinct cities
by state. The x variable is either 100 or 200, and I want to count each
value separately.

The result should look as follows:

state,city,count,100's,200's
1,3,7,4,3
2,4,8,4,4
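A minimal single-pass sketch of the summary described above, written in Python with only the standard library (the message does not say which tool is in use, so the language is an assumption). It reads the records once and, per state, keeps a row counter, a set of the distinct cities seen, and a Counter of x values. SAMPLE and summarize are hypothetical names for illustration; in practice the input would be an open file handle rather than the inline sample.

```python
import csv
import io
from collections import Counter, defaultdict

# Assumption: an inline copy of the sample; a real run would pass an open file.
SAMPLE = """\
state,city,x
1,12,100
1,12,100
1,12,200
1,13,200
1,13,100
1,13,100
1,14,200
2,21,200
2,21,200
2,21,100
2,23,100
2,23,200
2,34,200
2,34,100
2,35,100
"""


def summarize(lines):
    """One pass: per state, count rows, distinct cities, and x == 100 / x == 200."""
    rows = defaultdict(int)           # state -> number of records
    cities = defaultdict(set)         # state -> distinct city codes
    x_counts = defaultdict(Counter)   # state -> Counter of x values
    for rec in csv.DictReader(lines):
        state = rec["state"]
        rows[state] += 1
        cities[state].add(rec["city"])
        x_counts[state][rec["x"]] += 1
    # Missing keys in a Counter read as 0, so absent x values are safe.
    return [
        (state, len(cities[state]), rows[state],
         x_counts[state]["100"], x_counts[state]["200"])
        for state in sorted(rows, key=int)
    ]


result = summarize(io.StringIO(SAMPLE))
print("state,city,count,100's,200's")
for row in result:
    print(",".join(map(str, row)))
```

Run on the sample, this prints the table above. Memory grows with the number of states and distinct cities, not with the number of records.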

At present I am doing it in several steps, and it takes too long.

Is there an efficient way of doing this?
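If the file is already sorted (or at least grouped) by state, as the sample is, a streaming variant only needs to hold one state's cities in memory at a time. The sketch below assumes that ordering and uses itertools.groupby over a csv.reader; the helper name summarize_sorted is hypothetical. If a state reappears later in an unsorted file, groupby starts a new group for it, so such a file would need to be sorted by state first.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Assumption: inline sample standing in for a large CSV sorted by state.
SAMPLE = """\
state,city,x
1,12,100
1,12,100
1,12,200
1,13,200
1,13,100
1,13,100
1,14,200
2,21,200
2,21,200
2,21,100
2,23,100
2,23,200
2,34,200
2,34,100
2,35,100
"""


def summarize_sorted(lines):
    """Yield one summary row per state; assumes rows are grouped by state."""
    reader = csv.reader(lines)
    next(reader)  # skip the header line "state,city,x"
    for state, group in groupby(reader, key=itemgetter(0)):
        cities = set()            # only this state's cities are held in memory
        total = n100 = n200 = 0
        for _, city, x in group:
            cities.add(city)
            total += 1
            if x == "100":
                n100 += 1
            elif x == "200":
                n200 += 1
        yield (state, len(cities), total, n100, n200)


out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["state", "city", "count", "100's", "200's"])
writer.writerows(summarize_sorted(io.StringIO(SAMPLE)))
print(out.getvalue(), end="")
```

Writing the result with csv.writer produces the same state,city,count,100's,200's layout the message asks for, and the whole input is read exactly once.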