reason for odd timings

5 messages · Peter Langfelder, Steve Martin, John C Nash +1 more

#
Occasionally I run some rather trivial timings to get an idea of what might
be the best way to compute some quantities.

The program below gave timings for sums of squares of 100-element vectors that were
much greater than those for 1000 elements, which seems surprising. Does anyone know
the cause of this?

This isn't holding up my work. Just causing some head scratching.

JN
n  	  t(forloop) : ratio 	  t(sum) : ratio 	 t(crossprod) 	 all.equal
100 	 38719.15  :  1.766851 	 13421.12  :  0.6124391 	 21914.21 	 TRUE
1000 	 44722.71  :  20.98348 	 3093.94  :  1.451648 	 2131.33 	 TRUE
10000 	 420149.9  :  42.10269 	 27341.6  :  2.739867 	 9979.17 	 TRUE
1e+05 	 4070469  :  39.89473 	 343293.5  :  3.364625 	 102030.2 	 TRUE
1e+06 	 42293696  :  33.27684 	 3605866  :  2.837109 	 1270965 	 TRUE
1e+07 	 408123066  :  29.20882 	 35415106  :  2.534612 	 13972596 	 TRUE
# crossprod timer
library(microbenchmark)
suml <- function(vv) {            # explicit loop
  ss <- 0.0
  for (i in seq_along(vv)) { ss <- ss + vv[i]^2 }
  ss
}
sums <- function(vv) {            # vectorized sum
  sum(vv^2)
}
sumc <- function(vv) {            # crossprod (BLAS)
  as.numeric(crossprod(vv))
}
ll <- c(100, 1000, 10000, 100000, 1000000, 10000000)
cat(" n  \t  t(forloop) : ratio \t  t(sum) : ratio \t t(crossprod) \t all.equal \n")
for (nn in ll) {
  set.seed(1234)
  vv <- runif(nn)
  tsuml <- microbenchmark(sl <- suml(vv), unit = "us")
  tsums <- microbenchmark(ss <- sums(vv), unit = "us")
  tsumc <- microbenchmark(sc <- sumc(vv), unit = "us")
  ml <- mean(tsuml$time)
  ms <- mean(tsums$time)
  mc <- mean(tsumc$time)
  # note: all.equal(sl, ss, sc) would pass sc as the tolerance argument,
  # so the three results are compared pairwise instead
  cat(nn, "\t", ml, " : ", ml/mc, "\t", ms, " : ", ms/mc, "\t", mc, "\t",
      isTRUE(all.equal(sl, ss)) && isTRUE(all.equal(ss, sc)), "\n")
}
#
I think R spends a little time compiling the functions into byte code the
first time they are called. The hint is that the oddity goes away when the
code is run repeatedly. For the first three values of ll, the first run
returns this:

100 65347.45  :  2.491943 24329.94  :  0.9277918 26223.49 TRUE
1000 71387.79  :  11.49891 4619.35  :  0.74407 6208.22 TRUE
10000 711115.8  :  16.82563 61732.79  :  1.460653 42263.83 TRUE

the subsequent runs return this:

100 8349.68  :  3.125699 1821.28  :  0.6817954 2671.3 TRUE
1000 72191.88  :  11.8359 4712.81  :  0.7726678 6099.4 TRUE
10000 706826.9  :  15.95787 61574.9  :  1.390162 44293.32 TRUE

100 8390.46  :  3.159023 1815.71  :  0.683618 2656.03 TRUE
1000 93731.96  :  15.3112 4630.55  :  0.7564046 6121.79 TRUE
10000 809345.8  :  18.75048 61903.45  :  1.434145 43164 TRUE

Peter
On Fri, Jan 21, 2022 at 5:51 PM J C Nash <profjcnash at gmail.com> wrote:

#
Just to add a bit more, stripping out most of your test shows that
there is one iteration (the 2nd one) that takes a lot longer than the
others because the sums() function gets bytecode compiled.

library(microbenchmark)

sums <- function(vv) {
  ss <- sum(vv^2)
  ss
}

sums2 <- compiler::cmpfun(sums)

x <- runif(100)

head(as.data.frame(microbenchmark(sums(x), sums2(x))))
      expr    time
1  sums(x)   29455
2  sums(x) 3683091
3 sums2(x)    7108
4  sums(x)    4305
5  sums(x)    2733
6  sums(x)    2797

The paragraph on JIT in the details of ?compiler::compile explains
that this is the default behavior.
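For reference, the JIT level that ?compiler::compile describes can be queried
and toggled at run time (a minimal sketch; enableJIT() returns the previous
level, and a negative argument reports the current level without changing it):

```r
library(compiler)

enableJIT(-1)        # report the current JIT level (default is 3)
old <- enableJIT(0)  # turn the JIT off: functions stay interpreted
# ... a timing run here would show no first-call compilation spike ...
enableJIT(old)       # restore the previous level
```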

Steve
On Fri, 21 Jan 2022 at 20:51, J C Nash <profjcnash at gmail.com> wrote:
#
Many thanks for the quick and reasonable explanation.

Now the issue is to figure out when it is worthwhile to recommend/use sums2.
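One way to probe that empirically (a sketch, not from the thread): warm the
plain function up first, since the default JIT will byte-compile it after a
few calls, and then compare it with the cmpfun() version. If the two are
indistinguishable after warm-up, pre-compiling mainly pays off for functions
called only once or twice.

```r
library(microbenchmark)

sums  <- function(vv) sum(vv^2)
sums2 <- compiler::cmpfun(sums)

vv <- runif(1000)
invisible(replicate(10, sums(vv)))  # let the JIT compile sums() first

# after warm-up the two versions should be essentially indistinguishable
print(microbenchmark(sums(vv), sums2(vv)))
```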

Best, JN
On 2022-01-21 23:38, Steve Martin wrote:
#
FWIW, crossprod() timings are likely not generalizable: unlike sum(), they depend on the BLAS implementation in use, so it is not uncommon to see significant differences. For simple operations like this a parallel BLAS can actually be slower than the built-in reference BLAS, since the computation is limited by memory throughput.
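To see which BLAS a given session is using, a quick check (assuming R >= 3.4,
where sessionInfo() gained BLAS and LAPACK fields):

```r
# report the BLAS/LAPACK libraries this R session is linked against
si <- sessionInfo()
cat("BLAS:  ", si$BLAS, "\n")
cat("LAPACK:", si$LAPACK, "\n")
```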

Also, if you actually want to measure the difference, don't wrap the work in
functions, since the call overhead is significant for such short evaluations;
benchmark the expressions directly, e.g.:

   # (the opening of the loop over ll was clipped from the original message)
   set.seed(1234)
   vv <- runif(nn)
   rbind(microbenchmark(sum(vv^2)),
         microbenchmark(c(crossprod(vv))))
})
               expr       min        lq        mean     median         uq       max neval
1         sum(vv^2)     0.703     0.731     1.09781     0.7665     0.9050    10.356   100
2  c(crossprod(vv))     2.317     2.334     3.39735     2.4025     2.5390    92.110   100
3         sum(vv^2)     3.200     3.261     3.51941     3.3580     3.5425    13.109   100
4  c(crossprod(vv))     3.919     3.973     4.29774     4.0145     4.1200    25.054   100
5         sum(vv^2)    57.461    60.910    69.12497    67.4455    75.9360   102.943   100
6  c(crossprod(vv))    18.321    18.468    19.32948    18.5735    18.6795    82.353   100
7         sum(vv^2)   286.488   566.928   607.11935   596.7610   631.5625  4955.724   100
8  c(crossprod(vv))   161.757   163.338   167.62892   165.1660   165.8425   327.439   100
9         sum(vv^2)  3221.395  3694.859  4604.01607  3867.5830  4050.5715  8565.486   100
10 c(crossprod(vv))  1898.899  1922.386  1955.75945  1941.1760  1959.3340  2392.512   100
11        sum(vv^2) 43634.025 44486.356 45493.26093 44850.8460 45170.9870 97560.633   100
12 c(crossprod(vv)) 19379.950 19631.935 19817.88547 19770.3475 19922.6035 21512.017   100
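The call overhead Simon mentions is easy to see directly (a sketch along the
same lines; sums() here is the wrapper from earlier in the thread):

```r
library(microbenchmark)

sums <- function(vv) sum(vv^2)
vv <- runif(100)
invisible(replicate(10, sums(vv)))  # warm up the JIT first

# bare expression vs. the same computation behind a function call
print(microbenchmark(sum(vv^2), sums(vv)))
```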

That said, I suspect we have spent more time talking about this than will ever be saved by picking one method over the other ;).

Cheers,
Simon