Message-ID: <CAJjKGBTunk+B-n5tpGvs6HGjFfOJG4KjfdchrJM77bdPZPQV1w@mail.gmail.com>
Date: 2020-10-15T02:23:37Z
From: kMan
Subject: which() vs. just logical selection in df
In-Reply-To: <CAGxFJbQ=m+XtPFwzV+8uhn99amunNqmr48YHOr0hU=w0zjFKig@mail.gmail.com>

Hi Bert,

Thank you very much! I was unaware that .Internal() referred to C code.

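I figured out the difference. which() first shrinks the index down to just
the positions of the matching records, so the subset works from a short
integer vector. Logical indexing passes the full-length TRUE/FALSE vector
into the subset and only drops the non-matching records at the end.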

> length(index1<-dat$gender2=="other")
[1] 2000000
> length(index2<-which(index1))
[1] 666667
> nrow(dat[index1,])
[1] 666667
> nrow(dat[index2,])
[1] 666667
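
A tiny self-contained sketch of the two routes, using a hypothetical six-row data frame in place of dat (the id column and the values are made up). With no missing values both indexes select the same rows. One difference worth knowing is NA handling: which() drops NA positions, while a logical index that contains NA produces an all-NA row in a data frame subset.

```r
# Hypothetical stand-in for `dat`; one gender2 value is missing on purpose
dat <- data.frame(id = 1:6,
                  gender2 = c("other", "f", "m", "other", NA, "other"))

index1 <- dat$gender2 == "other"   # full-length logical: T F F T NA T
index2 <- which(index1)            # integer positions of TRUE only: 1 4 6

length(index1)       # 6
length(index2)       # 3
nrow(dat[index2, ])  # 3 rows
nrow(dat[index1, ])  # 4 rows: the NA in index1 adds a row of NAs
```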

library(microbenchmark)
microbenchmark(index1<-dat$gender2=="other", times=100L) # 2e6 records: ~13ms
microbenchmark(index2<-which(index1), times=100L)  # extra time for which(): ~5ms
microbenchmark(dat[index1,], times=100L)  # TRUE records via the full 2e6 logical index: ~99ms
microbenchmark(dat[index2,], times=100L)  # same records via the shorter integer index: ~64ms

So even counting the ~5ms spent in which(), the integer route (~69ms)
beats subsetting with the logical vector directly (~99ms).
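
To reproduce the comparison without the original data, here is a rough sketch on synthetic data. The data frame is an assumption: 2e6 rows with every third gender2 value set to "other" (which happens to give the same 666667 matches); only the gender2 column name comes from the message, and the id column is hypothetical. It uses base R's system.time() so it runs without extra packages, and the timings will vary by machine.

```r
# Hypothetical stand-in for `dat`: 2e6 rows, every third gender2 is "other"
n <- 2e6
dat <- data.frame(id = seq_len(n),
                  gender2 = rep(c("other", "female", "male"), length.out = n))

index1 <- dat$gender2 == "other"   # full-length logical index

# Route 1: subset with the logical index directly
t_logical <- system.time(res1 <- dat[index1, ])

# Route 2: convert to integer positions first (which() time included), then subset
t_which <- system.time({
  pos  <- which(index1)
  res2 <- dat[pos, ]
})

cat("logical index:", t_logical[["elapsed"]], "s\n")
cat("which() index:", t_which[["elapsed"]], "s\n")

# Both routes return the same rows
identical(res1, res2)   # TRUE
nrow(res1)              # 666667
```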

Cheers,
Keith


On Wed, Oct 14, 2020 at 4:42 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
> Inline.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
> On Wed, Oct 14, 2020 at 3:23 PM 1/k^c <kchamberln at gmail.com> wrote:
>
>> Is which() invoking C-level code by chance, making it slightly faster
>> on average?
>
>
> You do not need to ask such questions. R is open source, so just look!
>
> > which
> function (x, arr.ind = FALSE, useNames = TRUE)
> {
>     wh <- .Internal(which(x))   ## C code
>     if (arr.ind && !is.null(d <- dim(x)))
>         arrayInd(wh, d, dimnames(x), useNames = useNames)
>     else wh
> }
> <bytecode: 0x7fcdba0b8e80>
> <environment: namespace:base>
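
The arr.ind branch in the printed source is easy to see on a small matrix. Without it, which() returns positions into the matrix treated as a plain vector (column-major order). With arr.ind = TRUE and a non-NULL dim(x), those positions go through arrayInd() and come back as (row, column) pairs. A minimal illustration on a made-up 2 x 2 matrix:

```r
# Hypothetical 2 x 2 logical matrix, filled column by column
m <- matrix(c(TRUE, FALSE, TRUE, TRUE), nrow = 2)

which(m)                   # 1 3 4: positions in column-major order
which(m, arr.ind = TRUE)   # one row per TRUE cell, columns "row" and "col"
```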