
Subsetting the "ROW"s of an object

Michael Lawrence

Fri, Jun 8, 2018 1:56 PM

Actually, it's sort of the opposite. Everything becomes a sequence of
integers internally, even when the argument is missing. So the same
amount of work is done, basically. ALTREP will let us improve this
sort of thing.

Michael
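
At the R level, the equivalence described above can be checked on a small array: a missing index, a length-one TRUE, and an explicit integer sequence all select the same elements. This is only a sketch of the observable result (the array `a` and index `i` are made up for illustration); it does not show the C-level expansion itself.

```r
# Small illustrative array; the dimensions are arbitrary.
a <- array(seq_len(2 * 3 * 4), dim = c(2, 3, 4))
i <- 2L

by_missing <- a[i, , , drop = FALSE]
by_true    <- a[i, TRUE, TRUE, drop = FALSE]
by_integer <- a[i, seq_len(dim(a)[2]), seq_len(dim(a)[3]), drop = FALSE]

# All three forms return the same 1 x 3 x 4 array.
identical(by_missing, by_true)
identical(by_missing, by_integer)
```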

On Fri, Jun 8, 2018 at 1:49 PM, Hadley Wickham <h.wickham at gmail.com> wrote:

Hmmm, yes, there must be some special case in the C code to avoid
recycling a length-1 logical vector:

dims <- c(4, 4, 4, 1e5)

arr <- array(rnorm(prod(dims)), dims)
dim(arr)
#> [1]      4      4      4 100000
i <- c(1, 3)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)[c("expression", "min", "mean", "max")]
#> # A tibble: 2 x 4
#>   expression                    min     mean      max
#>   <chr>                    <bch:tm> <bch:tm> <bch:tm>
#> 1 arr[i, TRUE, TRUE, TRUE]   41.8ms   43.6ms   46.5ms
#> 2 arr[i, , , ]               41.7ms   43.1ms   46.3ms


On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles <ccberry at ucsd.edu> wrote:

On Jun 8, 2018, at 11:52 AM, Hadley Wickham <h.wickham at gmail.com> wrote:

On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles <ccberry at ucsd.edu> wrote:

On Jun 8, 2018, at 10:37 AM, Hervé Pagès <hpages at fredhutch.org> wrote:

Also the TRUEs cause problems if some dimensions are 0:

matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]

Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
  (subscript) logical subscript too long

OK. But this is easy enough to handle.

H.
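
One possible way to handle the zero-extent case, sketched under the assumption that an explicit index is acceptable: `seq_len()` of the dimension's extent gives a zero-length index for an empty dimension, so no length-one logical has to fit into an extent of 0. A missing index behaves the same way.

```r
m <- matrix(raw(0), nrow = 5, ncol = 0)

# A missing column index keeps the zero-extent dimension.
r1 <- m[1:3, , drop = FALSE]

# seq_len(0) is integer(0), so an explicit "all columns" index also works.
r2 <- m[1:3, seq_len(ncol(m)), drop = FALSE]

dim(r1)
dim(r2)
```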

On 06/08/2018 10:29 AM, Hadley Wickham wrote:

I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP support for
recycling.)

Hadley


AFAICS, it is not an issue. Taking

arr <- array(rnorm(2^22),c(2^10,4,4,4))

as a test case

and using a function that will either use the literal code `x[i,,,,drop=FALSE]' or `eval(mc)':

subset_ROW4 <-
    function(x, i, useLiteral=FALSE)
{
   literal <- quote(x[i,,,,drop=FALSE])
   mc <- quote(x[i])
   nd <- max(1L, length(dim(x)))
   mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
   mc[["drop"]] <- FALSE
   if (useLiteral)
       eval(literal)
   else
       eval(mc)
}

I get identical times with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))

and with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))
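
For comparison, the call can also be built with empty (missing) arguments rather than TRUEs, which reproduces the literal `x[i,,,,drop=FALSE]` form for any number of dimensions. This is a hedged sketch, not part of the function above: the name `subset_ROW_missing` is hypothetical, and `quote(expr = )` is used to obtain the empty argument.

```r
# Hypothetical variant: fill the trailing index positions with the empty
# symbol so the constructed call is x[i, , ..., drop = FALSE].
subset_ROW_missing <- function(x, i) {
  nd <- max(1L, length(dim(x)))
  empty <- rep(list(quote(expr = )), nd - 1L)
  mc <- as.call(c(list(as.name("["), quote(x), quote(i)),
                  empty,
                  list(drop = FALSE)))
  eval(mc)  # evaluated in this function's frame, where x and i are defined
}

arr3 <- array(seq_len(24), dim = c(2, 3, 4))
identical(subset_ROW_missing(arr3, 2L), arr3[2L, , , drop = FALSE])

# Zero-extent dimensions are handled like any other missing index.
dim(subset_ROW_missing(matrix(raw(0), nrow = 5, ncol = 0), 1:3))
```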

I think that's because you used a relatively low precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
 arr[i, TRUE, TRUE, TRUE],
 arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression        min    mean   median      max  n_gc
#>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
#> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.
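
The effect of including index generation in the timed code can be sketched with base R alone. `system.time()` is much coarser than `bench::mark()`, so this only illustrates the structure of the comparison; the repetition count `n` is arbitrary and no particular timings are implied.

```r
arr <- array(rnorm(2^22), c(2^10, 4, 4, 4))
n <- 2000L  # arbitrary repetition count

# Index built inside the timed loop: its cost is part of the measurement.
t_inside <- system.time(
  for (k in seq_len(n)) r1 <- arr[seq(1, length.out = 10, by = 100), TRUE, TRUE, TRUE]
)

# Index built once, outside the timed loop.
i <- seq(1, length.out = 10, by = 100)
t_outside <- system.time(
  for (k in seq_len(n)) r2 <- arr[i, TRUE, TRUE, TRUE]
)

identical(r1, r2)        # both loops compute the same subset
t_inside[["elapsed"]]
t_outside[["elapsed"]]
```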


Funny. I get similar results to yours above albeit with smaller differences. Usually < 5 percent.

But with subset_ROW4 I see no consistent difference.

In this example, it runs faster on average using `eval(mc)' to return the result:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length=10,by=100)
bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]

# A tibble: 2 x 8
  expression                      min     mean   median      max `itr/sec` mem_alloc  n_gc
  <chr>                      <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl>
1 subset_ROW4(arr, i, FALSE)   28.9µs   34.9µs   32.1µs   1.36ms    28686.    5.05KB     5
2 subset_ROW4(arr, i, TRUE)    28.9µs     35µs   32.4µs 875.11µs    28572.    5.05KB     5

And on subsequent reps the lead switches back and forth.


Chuck



--
http://hadley.nz

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
