Message-ID: <140e06f2-202c-4f98-8d65-189fcf061450@gmail.com>
Date: 2024-12-05T13:55:51Z
From: Serguei Sokol
Subject: [R-pkg-devel] Cannot create C code with acceptable performance with respect to internal R command.
In-Reply-To: <AM0PR09MB3843E3FD6F0F9F3847ECEC5898302@AM0PR09MB3843.eurprd09.prod.outlook.com>
Luc,
There can be many reasons explaining the difference in compiled code
performances. Tuning such code to achieve a pick performance is
generally a fine art.
Optimizations techniques can include but are not limited to:
?- SIMD instructions (and memory alignment for their optimal use);
?- instruction level parallelism;
?- unrolling loops;
?- cache level (mis-)hits;
?- multi-thread parallelism;
?- ...
Approaches in optimization are not the same depending on kind of
application: CPU-bound, memory-bound or IO-bound.
Many of this techniques can be directly used (or not) by compiler
depending on chosen options. Are you sure to use the same options and
compiler that were used during R compilation?
And finally, the compared code could be plainly not the same. R can use
BLAS call, e.g. OpenBLAS to multiply two matrices. This latter is
heavily optimized for such operations and can achieve x10 acceleration
compared to plain "naive" BLAS.
The R code you cite can be just the code for a fallback in case no BLAS
was found during R compilation.
Look at what your sessionInfo() says about used BLAS.
Best,
Serguei.
Le 05/12/2024 ? 14:21, Luc De Wilde a ?crit?:
> Dear package developers,
>
> in creating a package lavaanC for use in lavaan, I need to perform some matrix computations involving matrix products and crossproducts. As far as I see I cannot directly call the C code in the R core. So I copied the code in the R core, but the same C/C++ code in a package is 2.5 ? 3 times slower than executed directly in R :
>
> C code in package :
> ? SEXP prod0(SEXP mat1, SEXP mat2) {
> ? ? SEXP u1 = Rf_getAttrib(mat1, R_DimSymbol);
> ? ? int m1 = INTEGER(u1)[0];
> ? ? int n1 = INTEGER(u1)[1];
> ? ? SEXP u2 = Rf_getAttrib(mat2, R_DimSymbol);
> ? ? int m2 = INTEGER(u2)[0];
> ? ? int n2 = INTEGER(u2)[1];
> ? ? if (n1 != m2) Rf_error("matrices not conforming");
> ? ? SEXP retval = PROTECT(Rf_allocMatrix(REALSXP, m1, n2));
> ? ? double* left = REAL(mat1);
> ? ? double* right = REAL(mat2);
> ? ? double* ret = REAL(retval);
> ? ? double werk = 0.0;
> ? ? for (int j = 0; j < n2; j++) {
> ? ? ? for (int i = 0; i < m1; i++) {
> ? ? ? ? ? werk = 0.0;
> ? ? ? ? for (int k = 0; k < n1; k++) werk += (left[i + m1 * k] * right[k + m2 * j]);
> ? ? ? ? ret[j * m1 + i] = ?werk;
> ? ? ? }
> ? ? }
> ? ? UNPROTECT(1);
> ? ? return retval;
> ? }
>
> Test script :
> m1 <- matrix(rnorm(300000), nrow = 60)
> m2 <- matrix(rnorm(300000), ncol = 60)
> print(microbenchmark::microbenchmark(
> m1 %*% m2, .Call("prod0", m1, m2), times = 100
> ))
>
> Result on my pc:
> Unit: milliseconds
> expr min lq mean median uq max neval
> m1 %*% m2 10.5650 10.8967 11.13434 10.9449 11.02965 15.8397 100
> .Call("prod0", m1, m2) 29.3336 30.7868 32.05114 31.0408 33.85935 45.5321 100
>
>
> Can anyone explain why the compiled code in the package is so much slower than in R core?
>
> and
>
> Is there a way to improve the performance in R package?
>
>
> Best regards,
>
> Luc De Wilde
>
>
>
> ______________________________________________
> R-package-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel