As promised, below are some timing results using Jeroen's recent 4.9.3-based 64-bit toolchain. I hope to blog about this in more detail eventually, but in brief there are two suites of tests: one that focuses on BLAS-related functionality and one that focuses on non-BLAS-related math functionality. I've hosted the reproducible test code and the full results (quartiles, mean, SD, and CV) at [1] in case anyone wants to do any hypothesis testing.

My immediate takeaway: there is no substitute for a fast BLAS if you are doing any matrix operations. I'll probably post a suggested patch to make it easier to build R on Windows with a pre-compiled OpenBLAS. This also tempts me very much to tinker under the hood and see what would be necessary to allow building R on Windows with an optimized LAPACK as well (both ATLAS and OpenBLAS allow for building an optimized LAPACK).

I found some weird results when testing link-time optimization (LTO). In the non-BLAS section, the LTO build with mtune=native was the fastest and the LTO build with march=native was the slowest, though there really isn't much difference between mtune=native with and without LTO. For the BLAS-related calls, the builds with LTO were slower than those without (although a sample size of 25 may be too small to be sure). Compiling with LTO means that some packages may misbehave when compiled from source and thus require binary installs (stringi and dplyr are two that I have found). So I'm unsure whether changes should be made to the makefiles to make LTO a simple setting in MkRules.local, or whether it is better to post instructions online somewhere (or to this list) for the enterprising adventurer to try on his or her own. Your collective thoughts?

There are seven builds tested against each other. The test platform was an i7-3740QM @ 2.7 GHz with 8GB RAM running Windows 7 64-bit. The version of R tested was R-devel_2015-09-10 (all builds fully passed make check-devel and make check-recommended), and all times are in milliseconds.
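The full results at [1] include the mean and SD for every test, and each test was run 25 times, so one way to check whether a gap such as "LTO vs. no LTO" is more than run-to-run noise is Welch's two-sample t-test computed from those summary statistics. The Python sketch below is an illustration added here, not part of the posted test code: Welch's test is just one reasonable choice, the function names `welch_t` and `coefficient_of_variation` are hypothetical, and the example inputs are made up rather than taken from the results.

```python
# Illustrative sketch only: Welch's two-sample t statistic computed from
# summary statistics (mean, SD, number of runs) for two builds.
# Function names are hypothetical; inputs in the example are made up.
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean1
    v2 = sd2 ** 2 / n2  # squared standard error of mean2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def coefficient_of_variation(mean, sd):
    """CV = SD / mean, a unit-free measure of run-to-run noise."""
    return sd / mean


if __name__ == "__main__":
    # Hypothetical numbers (ms), NOT taken from the posted results;
    # 25 runs per build as in the tests described above.
    t, df = welch_t(100.0, 4.0, 25, 104.0, 3.0, 25)
    print(f"t = {t:.3f}, df = {df:.1f}")
    print(f"CV of first build = {coefficient_of_variation(100.0, 4.0):.3f}")
```

With those made-up inputs the sketch gives t = -4 with roughly 44.5 degrees of freedom; a two-sided p-value would then come from the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.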
The descriptor strings should be self-explanatory: Ref means reference BLAS, OPB is OpenBLAS version 0.2.14, TG is mtune=generic, TN is mtune=native, and AN is march=native. The result columns are in the following order (I apologize if the formatting gets messed up):

(1) 463-SJLJ-Ref-TG
(2) 493-SEH-POSIX-Ref-TG
(3) 493-SEH-POSIX-OPB-TG
(4) 493-SEH-POSIX-OPB-TN
(5) 493-SEH-POSIX-OPB-AN
(6) 493-SEH-POSIX-OPB-TN-LTO
(7) 493-SEH-POSIX-OPB-AN-LTO

[1] <http://www.avrahamadler.com/SpeedTests2015%20v4.txt>

Thank you,

Avi

BLAS-related (times in milliseconds):

Test                                       (1)       (2)       (3)       (4)       (5)       (6)       (7)
sort(c(as.vector(A), as.vector(B)))    410.553   478.423   407.874   405.382   406.139   406.254   407.029
det(A)                                 222.413   230.527    27.056    28.022    26.704    28.900    28.956
A %*% B                                680.316   661.466    40.803    42.848    38.918    40.602    37.693
t(A) %*% B                             692.379   668.152    52.604    52.422    53.991    52.072    49.743
crossprod(A, B)                      1,191.143 1,198.826    39.234    35.717    39.510    35.319    36.704
solve(A)                             1,080.882 1,139.137    82.553    83.621    81.876    89.614    89.788
solve(A, t(B))                       1,501.146 1,566.338    90.099    92.029    90.027   101.827   110.558
solve(B)                             1,074.366 1,091.797    99.424    98.131    99.472   106.990   108.083
chol(A)                                203.694   277.394    15.455    16.834    16.306    19.880    18.918
chol(B, pivot = TRUE)                    4.436     9.121     4.856     5.015     4.887     5.990     5.915
qr(A, LAPACK = TRUE)                   694.455   697.853   132.854   133.123   131.401   133.232   133.786
svd(A)                               3,577.719 3,527.317   623.759   630.878   617.791   626.818   632.760
eigen(A, symmetric = TRUE)           1,557.346 1,593.283   290.351   296.038   290.124   293.611   293.482
eigen(A, symmetric = FALSE)          5,938.559 5,710.295 1,361.771 1,409.270 1,440.704 1,425.261 1,488.135
eigen(B, symmetric = FALSE)          6,703.844 6,460.657 4,820.368 4,726.327 4,799.318 4,686.093 4,924.717
lu(A)                                  240.613   293.879    45.978    47.799    45.017    47.169    47.200
fft(A)                                 161.925   167.645   161.838   162.145   157.566   161.197   158.356
Hilbert(3000)                          258.187   462.391   254.183   255.146   254.994   252.812   257.502
toeplitz(A[1:500, 1])                    6.423    12.544     6.595     6.569     6.701     6.618     7.179
princomp(A)                          2,961.546 2,977.561   471.074   479.347   469.696   467.749   474.495

Non-BLAS-related (times in milliseconds):

Test                                       (1)       (2)       (3)       (4)       (5)       (6)       (7)
A + 2                                    3.340     3.318     3.325     3.355     3.398     3.316     3.586
A - 2                                    3.440     3.410     3.412     3.379     3.501     3.375     3.658
A * 2                                    3.373     3.404     3.409     3.411     3.474     3.325     3.654
A/2                                      5.232     3.763     3.735     3.747     3.893     3.796     4.033
A * 0.5                                  3.403     3.371     3.391     3.394     3.476     3.360     3.632
A^2                                      3.341     3.396     3.381     3.374     3.459     3.327     3.673
sqrt(A[1:10000])                         0.192     0.189     0.188     0.178     0.890     0.174     0.884
sin(A[1:10000])                          0.641     0.638     0.635     0.611     1.251     0.612     1.249
A + B                                    1.807     1.834     1.839     1.843     1.833     1.807     1.843
A - B                                    1.787     1.842     1.835     1.850     1.829     1.801     1.848
A * B                                    1.797     1.824     1.854     1.858     1.828     1.828     1.842
A/B                                      5.452     2.817     2.811     2.820     2.816     2.835     2.906
A[1:100000]%%B[1:100000]                 3.999     4.439     4.446     4.445     4.452     4.517     4.445
A[1:100000]%/%B[1:100000]                3.660     4.056     4.079     4.056     4.075     4.056     4.056
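To read the results as relative rather than absolute numbers, one can divide one build's time by another's for each test. The Python sketch below does that for a handful of rows copied from the BLAS-related results; the helper names `ratios` and `geometric_mean` are hypothetical, and columns are indexed from 0 in the order the builds are listed, so index 1 is 493-SEH-POSIX-Ref-TG and index 2 is 493-SEH-POSIX-OPB-TG (same toolchain and tuning, differing only in the BLAS), while indices 3 and 5 are the mtune=native builds without and with LTO.

```python
# Illustrative sketch: relative speed of two builds from the posted timings.
# Column index i is the i-th build in the listed order, counting from 0
# (0 = 463-SJLJ-Ref-TG, ..., 6 = 493-SEH-POSIX-OPB-AN-LTO).
import math

# A few rows copied from the BLAS-related results (milliseconds).
TIMINGS = {
    "A %*% B":         [680.316, 661.466, 40.803, 42.848, 38.918, 40.602, 37.693],
    "crossprod(A, B)": [1191.143, 1198.826, 39.234, 35.717, 39.510, 35.319, 36.704],
    "solve(A)":        [1080.882, 1139.137, 82.553, 83.621, 81.876, 89.614, 89.788],
    "fft(A)":          [161.925, 167.645, 161.838, 162.145, 157.566, 161.197, 158.356],
}


def ratios(table, base, other):
    """Return time(base) / time(other) per test; > 1 means `other` is faster."""
    return {name: row[base] / row[other] for name, row in table.items()}


def geometric_mean(values):
    """Geometric mean, a common way to average speed ratios."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Reference BLAS vs OpenBLAS, both 4.9.3 SEH/POSIX with mtune=generic.
    blas = ratios(TIMINGS, base=1, other=2)
    # mtune=native without LTO vs mtune=native with LTO.
    lto = ratios(TIMINGS, base=3, other=5)
    for name in TIMINGS:
        print(f"{name:16s}  OpenBLAS speedup {blas[name]:6.2f}x   LTO ratio {lto[name]:5.3f}")
    print(f"geometric mean OpenBLAS speedup: {geometric_mean(blas.values()):.2f}x")
```

A geometric mean keeps a 2x speedup and a 2x slowdown symmetric, which an arithmetic mean of ratios does not. On the rows included, the OpenBLAS speedup ranges from about 30x for crossprod(A, B) to essentially none for fft(A), and the LTO ratio for solve(A) is below 1, consistent with the LTO builds being slower on the BLAS calls.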
Some time tests with new toolchain
Avraham Adler