SLOW split() function

Instead of splitting the entire data frame, split the row indices and then use these to access your data. Try:

system.time(s <- split(seq_len(nrow(d)), d$key))

This should be faster and less memory intensive, since only an integer vector is split rather than every column of the data frame. You can then use the indices to access each subset:

result <- lapply(s, function(.indx) {
    # sum one column over the rows that belong to this key
    sum(d$someCol[.indx])
})
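
For anyone who wants to try this out, here is a minimal sketch on made-up data. The data frame below is hypothetical; only the names d, key and someCol are taken from the snippets above. It checks that the index-based approach gives the same per-key sums as splitting the whole data frame.

```r
# Hypothetical toy data; only the names d, key and someCol come from the post.
d <- data.frame(key = rep(c("a", "b", "c"), times = 4),
                someCol = 1:12)

# Split the row indices by key instead of the data frame itself.
s <- split(seq_len(nrow(d)), d$key)

# Use each index vector to pick the matching rows of a single column.
result <- lapply(s, function(.indx) sum(d$someCol[.indx]))

# Cross-check against splitting the full data frame.
check <- lapply(split(d, d$key), function(x) sum(x$someCol))
stopifnot(identical(result, check))
```

On a data frame this small the timing difference will not be visible; the point is only that both routes return the same list, so the index version can be swapped in for larger data.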

