Any interest in "merge" and "by" implementations specifically for so
Hi Tom,
Now, try sorting and using a loop:
idx <- order(i)
xs <- x[idx]
is <- i[idx]
res <- array(NA, 1e6)
idx <- which(diff(is) > 0)
startidx <- c(1, idx+1)
endidx <- c(idx, length(xs))
f1 <- function(x, startidx, endidx, FUN = sum) {
+ for (j in 1:length(res)) {
+ res[j] <- FUN(x[startidx[j]:endidx[j]])
+ }
+ res
+ }
unix.time(res1 <- f1(xs, startidx, endidx))
[1] 6.86 0.00 7.04 NA NA
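For anyone who wants to try the idea on a small scale, here is a minimal sketch of the sort-then-loop approach above. The toy vectors x and i are invented for illustration (the thread does not show how they were created), and the helper name run_apply is hypothetical. Unlike f1, it sizes the result from startidx instead of relying on a global res. The tapply comparison is only a sanity check.

```r
# Toy data: 'x' holds values, 'i' holds group ids (both invented for this sketch).
x <- c(5, 1, 4, 2, 3, 6)
i <- c(2, 1, 2, 3, 1, 3)

# Sort values and group ids together so each group forms one contiguous run.
idx <- order(i)
xs <- x[idx]
is <- i[idx]

# Run boundaries: a new run starts wherever the sorted id changes.
brk <- which(diff(is) > 0)
startidx <- c(1, brk + 1)
endidx <- c(brk, length(xs))

# Hypothetical helper: apply FUN to each contiguous run of the sorted vector.
run_apply <- function(x, startidx, endidx, FUN = sum) {
  res <- numeric(length(startidx))
  for (j in seq_along(startidx)) {
    res[j] <- FUN(x[startidx[j]:endidx[j]])
  }
  res
}

res1 <- run_apply(xs, startidx, endidx)

# Sanity check against tapply, which groups without requiring sorted input.
stopifnot(isTRUE(all.equal(res1, as.vector(tapply(x, i, sum)))))
```

Passing a different FUN (for example mean) reuses the same boundaries, so the sort only has to be done once per grouping.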
I wonder how much time the sorting, reordering and creation of startidx and endidx would add to this time?
Done interactively, sorting and indexing seemed fast. Here are some timings:
unix.time({idx <- order(i)
+ xs <- x[idx]
+ is <- i[idx]
+ res <- array(NA, 1e6)
+ idx <- which(diff(is) > 0)
+ startidx <- c(1, idx+1)
+ endidx <- c(idx, length(xs))
+ })
[1] 1.06 0.00 1.09 NA NA
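To see how the cost splits between preprocessing and the loop, the two phases can be timed separately. The sketch below is only an illustration: the data sizes and the names t_prep, t_loop and brk are made up here, and it uses system.time, which is the spelling current R releases document for this timer. The numbers it prints will differ from machine to machine.

```r
# Invented data: 1e5 values spread over up to 1e4 groups (sizes chosen to run quickly).
set.seed(42)
n <- 1e5
x <- runif(n)
i <- sample.int(1e4, n, replace = TRUE)

# Time the preprocessing: sort, reorder, and locate run boundaries.
t_prep <- system.time({
  idx <- order(i)
  xs <- x[idx]
  is <- i[idx]
  brk <- which(diff(is) > 0)
  startidx <- c(1, brk + 1)
  endidx <- c(brk, length(xs))
})

# Time the loop over runs on its own.
t_loop <- system.time({
  res <- numeric(length(startidx))
  for (j in seq_along(startidx)) {
    res[j] <- sum(xs[startidx[j]:endidx[j]])
  }
})

# Elapsed seconds for each phase; actual values depend on the machine.
print(c(prep = t_prep[["elapsed"]], loop = t_loop[["elapsed"]]))
```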
That looks interesting. Does it only work for specific operating systems and processors? I will give it a try.
No, as far as I know, it works on all operating systems. Also, it gets a little faster if you directly put the sum in the function:
f4 <- function(x, startidx, endidx) {
+ for (j in 1:length(res)) {
+ res[j] <- sum(x[startidx[j]:endidx[j]])
+ }
+ res
+ }
f5 <- cmpfun(f4)
unix.time(res5 <- f5(xs, startidx, endidx))
[1] 2.67 0.03 2.95 NA NA

- Tom
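A small self-contained sketch of the compile step, for readers who want to check that the compiled function returns the same values. It assumes a current R installation, where cmpfun comes from the bundled compiler package. The toy input is invented, and the result vector is allocated inside the function rather than taken from the workspace. As far as I know, recent R versions byte-compile closures automatically, so the speedup from an explicit cmpfun call may be much smaller than in the timings above.

```r
library(compiler)  # bundled with current R; provides cmpfun()

# Invented toy input, already sorted by group, with run boundaries precomputed.
xs <- c(1, 3, 5, 4, 2, 6)
startidx <- c(1, 3, 5)
endidx <- c(2, 4, 6)

# Same loop as f4 in the thread, but 'res' is allocated inside the function.
f4 <- function(x, startidx, endidx) {
  res <- numeric(length(startidx))
  for (j in seq_along(startidx)) {
    res[j] <- sum(x[startidx[j]:endidx[j]])
  }
  res
}

# Byte-compile the closure; the compiled version must return the same values.
f5 <- cmpfun(f4)
res4 <- f4(xs, startidx, endidx)
res5 <- f5(xs, startidx, endidx)
stopifnot(identical(res4, res5))
```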
View this message in context: http://www.nabble.com/Any-interest-in-%22merge%22-and-%22by%22-implementations-specifically-for-sorted-data--tf2009595.html#a5578580
Sent from the R devel forum at Nabble.com.