
uniq -c

Note that the relative speeds of these methods, which all use basically the same
run-length-encoding algorithm, depend on the nature of the dataset.  I made a
million-row data.frame with 10,000 unique users, 26 unique countries, and 6 unique
languages, giving about 3/4 million unique rows.  The times for methods 1, 2, and 3
were then 0.7, 6.2, and 10.5 seconds, respectively.  With a million-row data.frame
with 100 unique users, 10 unique countries, and 4 unique languages, giving 4000
unique rows, the times were 0.3, 1.4, and 0.7 seconds.
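
For anyone wanting to try comparable inputs, here is a minimal sketch of how two
such data.frames might be built.  It is not the code behind the timings above: the
column names, the uniform sampling, the seed, and the helper names make_data() and
count_unique_rows() are all assumptions made for illustration, and methods 1, 2,
and 3 themselves are not reproduced here.

```r
# Hypothetical reconstruction of the two test datasets. Column names, uniform
# sampling, and the seed are assumptions, not the setup used for the timings.
make_data <- function(n, n_users, n_countries, n_languages, seed = 1) {
  set.seed(seed)
  data.frame(
    user     = sample(n_users, n, replace = TRUE),
    country  = sample(n_countries, n, replace = TRUE),
    language = sample(n_languages, n, replace = TRUE)
  )
}

# Count distinct rows by pasting the columns of each row into one key string.
# This is only a sanity check on the data, not one of the three timed methods.
count_unique_rows <- function(d) {
  key <- do.call(paste, c(d, sep = "\r"))
  length(unique(key))
}

many_unique <- make_data(1e6, 10000, 26, 6)  # up to 10000 * 26 * 6 combinations
few_unique  <- make_data(1e6, 100, 10, 4)    # at most 100 * 10 * 4 = 4000

count_unique_rows(many_unique)
count_unique_rows(few_unique)

# Wrap whichever counting method is being compared in system.time(), e.g.:
system.time(count_unique_rows(many_unique))
```

Under uniform sampling the first data.frame should come out with roughly 3/4
million distinct rows, in line with the figure quoted above, while the second
cannot exceed 4000.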

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com