
[Rcpp-devel] examples of using cula matrix multiplication in Rcpp

10 messages · Sean O'Riordain, Dirk Eddelbuettel, Yue Li +5 more

#
Dear List,

I wonder if anyone worked on incorporating CULA tools library functionality into Rcpp. How much speed gain on top of Rcpp do we expect on basic operation like matrix multiplication?

In particular, I'm currently using RcppArmadillo to seamlessly perform matrix multiplication, but the speed gain over my plain R implementation is only 5-10x, if that.

I'm wondering if there is an equivalent easy-to-use library for doing GPU-enabled matrix multiplication. A complete simple example would be greatly appreciated.

Thanks in advance,
Yue
#
On 16 May 2015 at 11:46, Yue Li wrote:
| I wonder if anyone worked on incorporating CULA tools library functionality into Rcpp. How much speed gain on top of Rcpp do we expect on basic operation like matrix multiplication?
| 
| In particular, I'm currently using RcppArmadillo to seamlessly perform matrix multiplication, but the speed gain over my plain R implementation is only 5-10x, if that.
| 
| I'm wondering if there is an equivalent easy-to-use library for doing GPU-enabled matrix multiplication. A complete simple example would be greatly appreciated.

A few years ago I did some work on the 'gcbd' package to time and benchmark
precisely these types of things: because results depend on the hardware used
for the gpu, the hardware used for the cpu, the compiler, the BLAS/LAPACK
library, the OS, etc., I worked out a framework to benchmark these things
and compare them.

So have a look at this package and its vignette: it at least times several
BLAS libraries against the gpu card I had (have).

In general, I think its conclusion stands. You "waste" so much time copying
data over to the gpu that any computation gain is dwarfed until you get to
truly enormous (and unusual) matrix sizes.  So gpus are still good for things
like a limited (maybe one-time) transfer followed by lots of iterations: some
finance applications with Monte Carlo pricing come to mind, anything MCMC and
of course the whole 'deep learning' complex.

And with that: no, as far as I know nobody has tightly integrated Rcpp and
gpu computing as it simply is not that clearly a match.

That's my $0.02. More comments welcome, particularly with benchmarks.

Dirk
#
Some students I have been working with managed to get Rcpp to work with
Cuda for a simple use case - calculating a big log-likelihood for MCMC -
and they got a bit of a speedup compared with Rcpp - but it needs more
work.  They promised they would write up a note for the gallery once their
exams are over in a couple of weeks.

Sean
On 16 May 2015 at 16:56, Dirk Eddelbuettel <edd at debian.org> wrote:
#
On 16 May 2015 at 17:05, Sean O'Riordain wrote:
| Some students I have been working with managed to get Rcpp to work with Cuda
| for a simple use case - calculating a big log-likelihood for MCMC - and they
| got a bit of a speedup compared with Rcpp - but it needs more work.  They
| promised they would write up a note for the gallery once their exams are over
| in a couple of weeks.

That is splendid news!

I better make sure I can compile with CUDA then or else building the article
may be tricky.

Dirk
#
Thanks for the quick insightful replies! I will look into the solutions and keep the list posted on any progress on this end.

Yue
#
I've been playing around with Rcpp and CUDA (cuBLAS and MAGMA in particular) for quite a while now and definitely find it useful for improving performance. My interest is mostly in spatial models and Gaussian processes, where the rate-limiting step is usually an O(n^3) matrix decomposition with n between 1,000 and 5,000.

For these types of tasks I routinely see ~2x improvements over RcppArmadillo & OpenBLAS using a $100 consumer-grade card, which isn't huge but makes a big difference when the overall runtime is around 80 hours per model.

If anyone is interested in looking at some code, I have the early stages of a package up on github: https://github.com/rundel/RcppGP. In particular the gpu_mat class has a reasonably mature interface for moving data between Armadillo and cuBLAS.

-Colin

-----

Colin Rundel
Assistant Professor of the Practice
Duke University, Department of Statistical Science
www.stat.duke.edu/~cr173/
1 day later
#
I'm not a big fan of GPU computing, for many of the reasons Dirk mentions below and something else I discovered while taking a Coursera class last winter.

CUDA requires significant effort to keep your skills up unless you use it semi-regularly or more often. It has a very steep learning curve, and I can't climb that curve at this point in my working life. An occasional user may want to skip CUDA and investigate OpenACC or something related. Do what works best for you. I'll investigate rCUDA, PyCUDA, OpenACC, etc., and leave the lower-level stuff to others.

I'd like to reiterate that by far the most difficult thing about working with GPU technology is efficiently moving data on and off the card. Do you have a rigorously established use case for using GPU technology?

I'm skeptical that tying Rcpp to CUDA is something lots of people should do, but give it a try if you have the expertise and can make the use case. Moving data on and off the card is a third layer between you and the computations.

Dale Smith, Ph.D.
Data Scientist

d. 404.495.7220 x 4008   f. 404.795.7221
Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta, GA 30305


#
I am actually working on a general-purpose GPU library for R using Rcpp and
RcppArmadillo, but it is still under heavy development.  During these very
early stages I have had an 'older' card (an AMD Radeon HD 5700 Series), so I
have been working primarily with OpenCL and the clBLAS library (which must
be installed).  The idea has been to create the easiest possible (at least
from my perspective) interface for leveraging GPUs.  I will be receiving a
newer NVIDIA card (a GeForce GTX 970), so I will begin adding in CUDA as well
(I fully intend to have a hybrid system similar to the arrayfire
library).  You can view it on my github as well:
https://github.com/cdeterman/gpuR

As I said, keep in mind that it is still under heavy development, many more
functions need to be added, and it is only available for Linux at the
moment.  It is designed to provide a new class structure similar to the
'Matrix' package.  You can see an example of vector addition on the github
page, but an example of matrix multiplication would be:

# convert matrix to gpuMatrix object
gpuA <- gpuMatrix(A)
gpuB <- gpuMatrix(B)

# matrix multiplication
gpuC <- gpuA %*% gpuB


Also, if a user is looking into GPGPU, they are likely dealing with 'big
data', so this package is also intended to be used in concert with the
'bigmemory' package via the 'gpuBigMatrix' function, where the idea is to
provide a full interface when matrices exceed local memory size (obviously
slower, but useful for those without access to expensive hardware).  There
is also support for 'integer', 'float', and 'double' data types if the
default R 'double' precision is not required (for an additional speed-up).

With my older card, and using code that could likely be optimized further,
Dirk is correct that the data transfer time is very significant.  My initial
benchmarks can exceed R's native BLAS (not much to celebrate) but are
clearly bested by just using OpenBLAS.  Also, as Dirk mentions, as the size
of the matrix increases the performance gap shrinks until the GPU wins out.
Again, my initial benchmarks show that the gpuMatrix multiplication does
eventually beat OpenBLAS consistently once matrices approach sizes of
2048x2048.  I am optimistic, however, with the use of a newer card and
beginning to apply some CUDA.  Once I have more functions and the CUDA
interface set up, I intend to submit it to CRAN.  I am always open to
comments, concerns, and/or contributions :)

Regards,
Charles
On Sat, May 16, 2015 at 2:58 PM, Colin Rundel <rundel at gmail.com> wrote:
#
I have played with CUDA for some time; here are my brief comments.

(1) The simplest way to use CUDA with R/Armadillo is to use nvblas. You can
see the demo on page 21 of [1].

(2) The speedup may not be as good as expected sometimes (at least in my own
experiments).

Best wishes,

KK

[1]
http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3108-New-Features-CUDA%206%20-GPU-Acceleration.pdf
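For reference, nvblas works as a drop-in interception layer for BLAS3 calls: you point it at a CPU BLAS for fallback via a config file and preload it before starting R. The paths and library names below are illustrative assumptions and will differ per system; see the CUDA documentation linked above for the full set of options.

```shell
# nvblas.conf -- minimal example (paths are illustrative assumptions):
#   NVBLAS_CPU_BLAS_LIB /usr/lib/libopenblas.so   # CPU BLAS fallback for small/unsupported calls
#   NVBLAS_GPU_LIST ALL                           # use every visible GPU
#   NVBLAS_AUTOPIN_MEM_ENABLED                    # pin host memory for faster transfers

# Then start R with nvblas preloaded so its dgemm etc. are intercepted:
NVBLAS_CONFIG_FILE=/path/to/nvblas.conf LD_PRELOAD=libnvblas.so R
```

The appeal is that no R or C++ code changes at all; plain `%*%` calls get routed to the GPU.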
On Sat, May 16, 2015 at 11:46 AM, Yue Li <gorillayue at gmail.com> wrote:
#
On 5/18/2015 15:12, Dale Smith wrote:
I also think the focus on the high-level approach is often the right 
choice, at least initially.

Using either CUDA or OpenCL directly adds a lot of repetitive (and
redundant) boilerplate code -- oftentimes (unless you actually make active
use of the fine-tuning this allows) with no performance benefit compared to
the higher-level solutions (this really shouldn't need (re)stating, but I
still occasionally encounter folks expecting "lower level" -- read: longer
-- code to be somehow automagically faster). At the same time, having to
deal with the lower-level details can also make the whole experience more
error-prone (e.g., due to manual resource management -- which, again,
unless you're explicitly fine-tuning it yourself, will not make your code
automagically faster).

Personally, I've had a good experience with C++ AMP (hardware-vendor
independent; note: the last time I used it, it was more polished on MSFT
platforms, although an open-source Linux implementation is available)
and Thrust (CUDA / NVIDIA hardware): http://thrust.github.io/
SYCL looks (I've yet to try it out) like an OpenCL equivalent of Thrust
-- and its parallel STL implementation looks quite promising:
https://github.com/KhronosGroup/SyclParallelSTL
// OpenCL-based Boost.Compute has been recently accepted to Boost: 
https://github.com/boostorg/compute
(The flip side being that NVIDIA hasn't historically kept OpenCL drivers 
for its cards very much up-to-date... perhaps this will change with 
improvements necessary for CUDA 7, as well as requirements needed to 
implement Vulkan API.)

In other words, instead of starting directly with CUDA, I'd suggest 
starting with Thrust -- analogously, instead of jumping straight to raw 
OpenCL, I'd probably start with SYCL Parallel STL (or Boost.Compute?).

There's plenty of high-level GPGPU solutions available for C++, here are 
some good overviews:
http://www.soa-world.de/echelon/2014/04/c-accelerator-libraries.html // 
multiple reviews: http://www.soa-world.de/echelon/
http://arxiv.org/abs/1212.6326

What I haven't seen is any study of integrating these with R (I've only 
used standalone C++ code for GPGPU), could be interesting.
In my experience, the "best" use case (in terms of being the 
lowest-hanging-fruit) would be an embarrassingly parallel problem; for 
examples, see:
http://en.wikipedia.org/wiki/Embarrassingly_parallel
Naturally, the larger the workload, the higher the chance of the 
speed-up exceeding the data transfer costs.

Best,

Matt
