
SLOW split() function

I tried this:

library(data.table)
N <- 1000
T <- N*10
d <- data.table(gp = rep(1:T, rep(N, T)), val = rnorm(N*T), key = 'gp')
dim(d)
[1] 10000000        2

# On my humble 8 GB system,
   user  system elapsed
   4.15    0.09    4.27

I wouldn't be surprised if there were a much faster way to do this
operation in data.table since split() is a data frame operation. This
is about as fast as Jim Holtman's suggestion:

system.time(s <- split(seq_len(nrow(d)), d$gp))
   user  system elapsed
   4.15    0.09    4.29
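
For what it's worth, here is a rough sketch of what "doing it inside data.table" might look like, next to the index split above. The sizes are shrunk so it runs in a moment, and the names n_per_group, n_groups, m_split and m_by are made up for illustration; I have not timed this, so take it as an idea rather than a benchmark.

```r
library(data.table)

# Smaller sizes than in the post so the sketch runs quickly (assumed values).
n_per_group <- 100
n_groups    <- 1000
d <- data.table(gp  = rep(seq_len(n_groups), each = n_per_group),
                val = rnorm(n_per_group * n_groups),
                key = "gp")

# Index split, as above: one integer vector of row numbers per group.
s <- split(seq_len(nrow(d)), d$gp)

# Per-group mean computed from the index list.
m_split <- vapply(s, function(i) mean(d$val[i]), numeric(1))

# The same per-group mean computed by grouping inside data.table,
# without building an intermediate list of pieces.
m_by <- d[, list(m = mean(val)), by = gp]
```

If the per-group result is all you need, the grouped form skips the split step entirely; if you really need the pieces themselves, the index list s can be reused for several columns.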

HTH,
Dennis
On Mon, Oct 10, 2011 at 6:01 PM, ivo welch <ivo.welch at gmail.com> wrote: