Message-ID: <CABdHhvHDFyEBESAFaE99rm5_xgdG08ooCAFFkshCvY86y9RmCg@mail.gmail.com>
Date: 2018-06-08T18:52:01Z
From: Hadley Wickham
Subject: Subsetting the "ROW"s of an object
In-Reply-To: <C43ED250-AD89-4ECC-97B8-038231DB475D@ucsd.edu>

On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles <ccberry at ucsd.edu> wrote:
>
>
>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès <hpages at fredhutch.org> wrote:
>>
>> Also the TRUEs cause problems if some dimensions are 0:
>>
>>  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>>  Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>>    (subscript) logical subscript too long
>
> OK. But this is easy enough to handle.
>
>>
>> H.
>>
>> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>>> I suspect this will have suboptimal performance since the TRUEs will
>>> get recycled. (Maybe there is, or could be, ALTREP support for
>>> recycling.)
>>> Hadley
>
>
> AFAICS, it is not an issue. Taking
>
> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>
> as a test case
>
> and using a function that will either use the literal code `x[i,,,,drop=FALSE]' or `eval(mc)':
>
> subset_ROW4 <-
>      function(x, i, useLiteral=FALSE)
> {
>     literal <- quote(x[i,,,,drop=FALSE])
>     mc <- quote(x[i])
>     nd <- max(1L, length(dim(x)))
>     mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
>     mc[["drop"]] <- FALSE
>     if (useLiteral)
>         eval(literal)
>     else
>         eval(mc)
>  }
>
> I get identical times with
>
> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>
> and with
>
> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))

I think that's because you used a relatively low-precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression        min    mean   median      max  n_gc
#>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
#> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.
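
For anyone without bench installed, here is a minimal base-R sketch of the
same comparison. It assumes nothing beyond base R, builds the index once
outside the timed loops, and checks that both forms return the same array.
The variable names and the repetition count (n_reps) are illustrative
choices, not part of the benchmark above, and system.time() is much coarser
than bench::mark(), so a gap this small may only show up with many
repetitions, if at all.

```r
# Same test array and index as above; the index is built once, outside the timing.
arr <- array(rnorm(2^22), c(2^10, 4, 4, 4))
i <- seq(1, length.out = 10, by = 100)

# Both forms should select the same 10 x 4 x 4 x 4 sub-array.
with_true  <- arr[i, TRUE, TRUE, TRUE]  # each TRUE is recycled along its dimension
with_empty <- arr[i, , , ]              # empty subscripts select everything
stopifnot(identical(with_true, with_empty))

# Illustrative repetition count (an assumption); raise it for a steadier estimate.
n_reps <- 1e4
t_true  <- system.time(for (k in seq_len(n_reps)) arr[i, TRUE, TRUE, TRUE])
t_empty <- system.time(for (k in seq_len(n_reps)) arr[i, , , ])

# Elapsed seconds for each form; exact numbers depend on the machine.
print(c(with_true = t_true[["elapsed"]], with_empty = t_empty[["elapsed"]]))
```

The identical() check only confirms that the two forms agree on the result;
it says nothing about speed, which is what the timings are for.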

Hadley


-- 
http://hadley.nz