Hi,

I just want to follow up on this very simple fix/correction/speedup/cleanup of the base::which() function. Here is a diff:

diff src/library/base/R/which.R which.R
21c21
< wh <- seq_along(x)[ll <- x & !is.na(x)]
---
> wh <- seq_along(x)[x & !is.na(x)]
25c25
< names(wh) <- names(x)[ll]
---
> names(wh) <- names(x)[wh]

FYI, the 'll' variable is not used elsewhere. I've been through these modifications several times and I cannot see any side effects. Could someone on R core please commit this?

BTW, when one reports diffs, do you prefer to get them with or without context information, e.g. -C 3?

/Henrik

On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
On Thu, 10 Jul 2008, Henrik Bengtsson wrote:
Hi, by replacing 'll' with 'wh' in the source code for base::which() one gets ~20% speed up for *named logical vectors*.
The amount of speedup depends on how sparse the TRUE values are. When the proportion of TRUEs gets small, the speedup is more than twofold on my MacBook. For high proportions of TRUE, the speedup is more like the 20% you cite.

HTH,
Chuck
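The dependence on sparsity can be checked with a small sweep over the proportion of TRUEs. The following is only a sketch: which_ll() and which_wh() are hypothetical, simplified stand-ins for the two variants (vector case only, without the elided arr.ind handling), and N and the proportions are arbitrary choices, not values from this thread. Timings will vary by machine and R version.

```r
# Hypothetical, simplified stand-ins for the two variants discussed in
# this thread (vector case only; the arr.ind/dim handling is left out).
which_ll <- function(x) {
  wh <- seq_along(x)[ll <- x & !is.na(x)]
  names(wh) <- names(x)[ll]   # index names with the full logical mask
  wh
}
which_wh <- function(x) {
  wh <- seq_along(x)[x & !is.na(x)]
  names(wh) <- names(x)[wh]   # index names with the integer positions
  wh
}

set.seed(1)
N <- 1e5   # assumed size, smaller than in the post to keep this quick
B <- 10
for (p in c(0.01, 0.5, 0.99)) {   # assumed proportions of TRUE
  x <- runif(N) < p
  names(x) <- seq_along(x)
  stopifnot(identical(which_ll(x), which_wh(x)))
  t_ll <- system.time(for (bb in 1:B) which_ll(x))[["elapsed"]]
  t_wh <- system.time(for (bb in 1:B) which_wh(x))[["elapsed"]]
  cat(sprintf("p = %.2f  ll: %.3fs  wh: %.3fs\n", p, t_ll, t_wh))
}
```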
CURRENT CODE:
which <- function(x, arr.ind = FALSE)
{
    if(!is.logical(x))
        stop("argument to 'which' is not logical")
    wh <- seq_along(x)[ll <- x & !is.na(x)]
    m <- length(wh)
    dl <- dim(x)
    if (is.null(dl) || !arr.ind) {
        names(wh) <- names(x)[ll]
    }
    ...
    wh;
}
SUGGESTED CODE: (Remove 'll' and use 'wh')
which2 <- function(x, arr.ind = FALSE)
{
    if(!is.logical(x))
        stop("argument to 'which' is not logical")
    wh <- seq_along(x)[x & !is.na(x)]
    m <- length(wh)
    dl <- dim(x)
    if (is.null(dl) || !arr.ind) {
        names(wh) <- names(x)[wh]
    }
    ...
    wh;
}
That's all.
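
The change relies on the fact that subsetting names(x) by the logical mask and subsetting it by the integer positions derived from that mask select the same elements. A minimal sketch on a made-up vector (the example values are assumptions, not taken from the R sources):

```r
# Made-up input with a TRUE, an NA and a FALSE to cover each case.
x <- c(a = TRUE, b = NA, c = FALSE, d = TRUE)

ll <- x & !is.na(x)       # logical mask; NA entries become FALSE
wh <- seq_along(x)[ll]    # integer positions of the TRUE entries

by_mask <- names(x)[ll]   # what the current code does
by_pos  <- names(x)[wh]   # what the suggested code does

stopifnot(identical(by_mask, by_pos))
print(by_pos)
```

Presumably the integer index is cheaper because it has only as many elements as there are TRUEs, whereas the mask always has length(x) elements.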
BENCHMARKING:
# To measure both in same environment
which1 <- base::which;
environment(which1) <- globalenv(); # Needed?
N <- 1e6;
set.seed(0xbeef);
x <- sample(c(TRUE, FALSE), size=N, replace=TRUE);
names(x) <- seq_along(x);
B <- 10;
t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
stopifnot(identical(idxs1, idxs2));
print(t1/t2);
# Fair benchmarking
t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
print(t1/t2);
##     user   system  elapsed
## 1.283186 1.052632 1.250000
You get similar results if you put the for loop outside the system.time()
call (and sum up the timings).
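
One way to do that is to time each call separately and accumulate the elapsed times. This is only a sketch: sum_elapsed() is a hypothetical helper, the vector size is an arbitrary choice, and base::which is used merely as something to time (its implementation in current R versions may differ from the code quoted in this thread).

```r
# Hypothetical helper: call f(x) B times, timing each call on its own,
# and return the summed elapsed time in seconds.
sum_elapsed <- function(f, x, B = 10) {
  total <- 0
  for (bb in seq_len(B)) {
    total <- total + system.time(f(x))[["elapsed"]]
  }
  total
}

set.seed(1)
x <- sample(c(TRUE, FALSE), size = 1e5, replace = TRUE)  # assumed size
names(x) <- seq_along(x)

t1 <- sum_elapsed(base::which, x)
print(t1)
```

Any of the variants above can be passed in place of base::which, and the ratio of two such totals corresponds to the elapsed column of t1/t2.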
Cheers
Henrik
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901