which() vs. just logical selection in df
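Hi Bert,

Thank you very much! I was unaware that .Internal() referred to C code. I figured out the difference: which() reduces the index to only the relevant records up front, so the subsetting step works from a shorter integer index. Logical indexing carries the full-length index into the subsetting step and drops the unwanted records last.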
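To make the two index types concrete, here is a minimal sketch on a small made-up data frame. The names `dat` and `gender2` follow this thread, but the `id` column and all values are invented for illustration and are not the 2e6-row data timed in this message.

```r
# Hypothetical 9-row stand-in for the large data frame discussed in the thread
dat <- data.frame(id = 1:9,
                  gender2 = rep(c("male", "female", "other"), times = 3),
                  stringsAsFactors = FALSE)

index1 <- dat$gender2 == "other"   # logical vector, one element per row
index2 <- which(index1)            # integer positions of the TRUE elements only

length(index1)   # full length: one entry per row of dat
length(index2)   # shorter: one entry per matching row

# Both index types pick out the same rows (no NAs in this example)
sub1 <- dat[index1, ]
sub2 <- dat[index2, ]
all.equal(sub1, sub2)
```

The logical index stays as long as the data, while the which() index only holds the matching positions, which is the difference in index length the timings are probing.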
length(index1<-dat$gender2=="other")
[1] 2000000
length(index2<-which(index1))
[1] 666667
length(dat[index1,])
[1] 666667
length(dat[index2,])
[1] 666667
microbenchmark(index1<-dat$gender2=="other", times=100L) # 2e6 records, ~ 13ms.
microbenchmark(index2<-which(index1), times=100L) # Extra time for which() ~ 5ms.
microbenchmark(dat[index1,], times=100L) # Time to return just TRUE records using the whole 2e6 index. ~99ms
microbenchmark(dat[index2,], times=100L) # Time to return all records from shorter index ~64ms.

Cheers,
Keith
On Wed, Oct 14, 2020 at 4:42 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
Inline.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Oct 14, 2020 at 3:23 PM 1/k^c <kchamberln at gmail.com> wrote:
Is which() invoking c-level code by chance, making it slightly faster on average?
You do not need to ask such questions. R is open source, so just look!
which
function (x, arr.ind = FALSE, useNames = TRUE)
{
    wh <- .Internal(which(x)) ## C code
    if (arr.ind && !is.null(d <- dim(x)))
        arrayInd(wh, d, dimnames(x), useNames = useNames)
    else wh
}
<bytecode: 0x7fcdba0b8e80>
<environment: namespace:base>
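
As a small illustration of the two branches in the source above, the sketch below calls which() on a matrix with and without arr.ind. The matrix `m` and its values are made up for this example.

```r
# Hypothetical 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    5    7    9
m <- matrix(c(1, 5, 2, 7, 3, 9), nrow = 2)

# Default branch: plain positions in column-major order
lin <- which(m > 4)

# arr.ind = TRUE branch: positions converted to one (row, col) pair per match
rc <- which(m > 4, arr.ind = TRUE)

lin
rc
```

With arr.ind = FALSE the function returns the result of the internal call directly; with arr.ind = TRUE and an input that has dimensions, the same positions are passed through arrayInd() to get row and column indices.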