Skip to content
Back to formatted view

Raw Message

Message-ID: <AM0PR09MB3843CA93CE9D6D99925D188B98312@AM0PR09MB3843.eurprd09.prod.outlook.com>
Date: 2024-12-06T08:39:08Z
From: Luc De Wilde
Subject: [R-pkg-devel]  Cannot create C code with acceptable performance with respect to internal R command.
In-Reply-To: <60433DA0-BD30-48E8-B189-C4651A5372DB@gmail.com>

Thanks to all help, I finally got two (!) solutions for my problem :?


Unit: milliseconds
                                        expr     min       lq     mean   median       uq     max neval
                                   m1 %*% m2 11.2685 11.48595 11.83029 11.60745 11.83170 17.2381   200
 .Call("prod0", m1, m2, PACKAGE = "ldwTest") 10.8301 11.03360 11.43360 11.18950 11.36395 24.4530   200
 .Call("prod2", m1, m2, PACKAGE = "ldwTest") 10.7453 10.96310 11.29727 11.09395 11.31465 17.3467   200

m& %*% m2 : R matrix product

prod0 : the BLAS fortran GEMM routine rewritten in C++ (there was an important rearrangement of the for loops to improve cache use)

prod1 : call, in C++, of the BLAS fortran GEMM routine



Luc

________________________________________
Van:?Avraham Adler <avraham.adler at gmail.com>
Verzonden:?vrijdag 6 december 2024 8:46
Aan:?Luc De Wilde <Luc.DeWilde at UGent.be>
CC:?Dirk Eddelbuettel <edd at debian.org>; Yves Rosseel <Yves.Rosseel at UGent.be>; r-package-devel at r-project.org <r-package-devel at r-project.org>
Onderwerp:?Re: [R-pkg-devel] Cannot create C code with acceptable performance with respect to internal R command.
?
For future reference and completeness, since I responded off list, I simply pointed out to Luke an example of using R?s BLAS interface with DGEMV. He needs DGEMM, but the idea is the same.?

<?https://github.com/aadler/minimaxApprox/blob/master/src/Chebyshev.c>

Avi

Sent from my iPhone

On Dec 6, 2024, at 12:14?AM, Luc De Wilde <Luc.DeWilde at ugent.be> wrote:

?Dirk,

that's indeed an easy way to go, but I'm searching for methods that doesn't need to add other dependencies in my package, so the answer of Avraham is the most relevant for me.

But off course, thank you for your help!

Luc

________________________________
Van: Dirk Eddelbuettel <edd at debian.org>
Verzonden: donderdag 5 december 2024 15:09
Aan: Luc De Wilde <Luc.DeWilde at UGent.be>
CC: Tomas Kalibera <tomas.kalibera at gmail.com>; r-package-devel at r-project.org <r-package-devel at r-project.org>; Yves Rosseel <Yves.Rosseel at UGent.be>
Onderwerp: Re: [R-pkg-devel] Cannot create C code with acceptable performance with respect to internal R command.


Luc,

As Tomas mentioned, matrix-multiplication can take advantage of multiple
threads, and the 'text book' nexted loops do not do that. ?Now, one
alternative that appeals a lot to me is to farm out to Armadillo which also
calls LAPACK for you (as R does). And via RcppArmadillo, the setup becomes a
one-liner with the expression 'mat1 * mat2' where '*' is overloaded
appropriately (as is matrix multiplication '%*%' in R). ?I include your
example as self-contained and reproducible script below, on my not-so-recent
machine with twelve cores I get

$ Rscript luc.r
Unit: microseconds
expr ??????min ???????lq ????mean ???median ??????uq ?????max neval cld
???C 29010.538 39242.004 47948.98 50930.500 52715.30 81668.53 ??100 ?a
???R ??685.658 ??800.653 ?1984.17 ?1129.754 ?2719.88 ?8420.66 ??100 ??b
?Cpp ??401.182 ??444.164 ?1775.03 ??651.023 ?1656.24 30369.15 ??100 ??b
$

but what really shines (in my eyes) is that a function

???arma::mat cppprod(const arma::mat& m1, const arma::mat& m2) {
???????return m1 * m2;
???}

gets set-up for you with no worries whatsoever and outscores the R
version. (And if you look into the Rcpp docs you can learn to make this a
little faster still but skipping a (generally recommended !!) handshake with
RNG status etc).

But different strokes for different folks, not everybody likes C++ (which is
both perfectly find and also includes Tomas who saw fit to rail against it
yesterday regarding its compile times which can both tweaked and are also
worse still in some other popular languages) but I digress ...

Hope this helps, Dirk


ccode <- r"(
SEXP u1 = Rf_getAttrib(mat1, R_DimSymbol);
int m1 = INTEGER(u1)[0];
int n1 = INTEGER(u1)[1];
SEXP u2 = Rf_getAttrib(mat2, R_DimSymbol);
int m2 = INTEGER(u2)[0];
int n2 = INTEGER(u2)[1];
if (n1 != m2) Rf_error("matrices not conforming");
SEXP retval = PROTECT(Rf_allocMatrix(REALSXP, m1, n2));
double* left = REAL(mat1);
double* right = REAL(mat2);
double* ret = REAL(retval);
double werk = 0.0;
for (int j = 0; j < n2; j++) {
?for (int i = 0; i < m1; i++) {
????werk = 0.0;
????for (int k = 0; k < n1; k++)
??????werk += (left[i + m1 * k] * right[k + m2 * j]);
????ret[j * m1 + i] = werk;
?}
}
UNPROTECT(1);
return retval;
)"
cprod <- inline::cfunction(sig=signature(mat1="numeric", mat2="numeric"), body=ccode, language="C")

Rcpp::cppFunction("arma::mat cppprod(const arma::mat& m1, const arma::mat& m2) { return m1 * m2; }", depends="RcppArmadillo")

set.seed(123)
m1 <- matrix(rnorm(300000), nrow = 60)
m2 <- matrix(rnorm(300000), ncol = 60)
print(microbenchmark::microbenchmark(C = cprod(m1, m2),
????????????????????????????????????R = m1 %*% m2,
????????????????????????????????????Cpp = cppprod(m1, m2),
????????????????????????????????????times = 100))

--
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org

? ?[[alternative HTML version deleted]]

______________________________________________
R-package-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel