uniq -c
On 16/10/2012 12:29 PM, Sam Steingold wrote:
> * R. Michael Weylandt <zvpunry.jrlynaqg at tznvy.pbz> [2012-10-16 16:19:27 +0100]:
>> Have you looked at using table() directly? If I understand what you want correctly something like:
>>
>> table(do.call(paste, x))
>
> I wished to avoid paste (I will have to re-split later, so it will be a performance nightmare).
>
>> Also, if you take a look at the development version of R, changes are being put in place to allow much larger data sets.
>
> xtabs(), although dog slow, would have footed the bill nicely:
> --8<---------------cut here---------------start------------->8---
> x <- data.frame(a=1:32,b=1:32,c=1:32,d=1:32,e=1:32)
> system.time(subset(as.data.frame(xtabs( ~. , x )), Freq != 0 ))
>    user  system elapsed
>  12.788   4.288  17.224
> --8<---------------cut here---------------end--------------->8---
> you should not need "much larger data sets" for this.
> x is sorted.
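Because x is sorted, identical rows are adjacent, so the counts can be computed the way `uniq -c` does it: mark where each run of identical rows starts and take the differences between the start positions. The base-R sketch below does that without paste(); the function name `count_sorted_rows` is hypothetical, and it assumes x is sorted by all of its columns and contains no NA values. It returns the distinct rows as ordinary columns plus a count, so nothing has to be re-split.

```r
# Hypothetical helper: `uniq -c` for a data frame that is already sorted by
# all of its columns (so identical rows are adjacent). Assumes no NA values:
# a comparison involving NA would make the run boundaries unreliable.
count_sorted_rows <- function(x) {
  n <- nrow(x)
  if (n == 0L) {
    x$count <- integer(0)
    return(x)
  }
  # For each column, TRUE where row i differs from row i - 1; OR-ing the
  # columns together marks every row that starts a new run. Row 1 always does.
  changed <- lapply(x, function(col) col[-1L] != col[-n])
  starts <- c(TRUE, Reduce(`|`, changed))
  first <- which(starts)                  # index of the first row of each run
  out <- x[first, , drop = FALSE]         # one representative row per run
  out$count <- diff(c(first, n + 1L))     # run lengths, like uniq -c
  rownames(out) <- NULL
  out
}

x <- data.frame(a = c(1, 1, 2, 2, 2), b = c("u", "u", "u", "v", "v"))
count_sorted_rows(x)  # three rows: (1, u) twice, (2, u) once, (2, v) twice
```

Each column is compared once against itself shifted by one row, so the work grows with the number of rows rather than with the number of possible value combinations, unlike a full cross-tabulation.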

The problem is that xtabs() and by() and related functions are designed for the case where all combinations of all factors exist. If you have a dataset where only a few exist, you could use sparseby() from the reshape package. Syntax would be

sparseby(data=x, INDICES=x, FUN=nrow)

if you wanted a dataframe giving counts.

I just tried it, and on your two examples it gives a warning about coercing a list to a logical vector; I guess all(list(TRUE, TRUE)) was allowed when I wrote it, but isn't any more. I'll send a patch to the maintainer.

Duncan Murdoch
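For the counting case alone, base R's aggregate() also returns only the combinations that actually occur, with the grouping columns kept as ordinary columns. The sketch below is a possible stand-in for sparseby() here, not a reimplementation of it; the function name `count_rows` is hypothetical, it assumes x has no column already named `n`, and rows containing NA are dropped by aggregate()'s default na.action. It also computes how many cells xtabs() has to allocate for the five-column example above, one per possible combination of values.

```r
# Hypothetical helper: count each distinct row of x, returning only the
# combinations that actually occur. `.` on the right-hand side groups by
# every column of x; assumes x has no column already named `n`.
count_rows <- function(x) {
  aggregate(n ~ ., data = cbind(x, n = 1L), FUN = sum)
}

x <- data.frame(a = 1:32, b = 1:32, c = 1:32, d = 1:32, e = 1:32)

# xtabs(~ ., x) allocates one cell per possible combination of values:
# 32^5 = 33,554,432 cells, of which only 32 are non-zero here.
n_cells <- prod(vapply(x, function(col) length(unique(col)), integer(1)))

counts <- count_rows(x)   # 32 rows, each with n == 1
```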