
[Rcpp-devel] Differences between RcppEigen and RcppArmadillo

19 messages · Dirk Eddelbuettel, Julian Smith, c s +3 more

#
I've been toying with both RcppArmadillo and RcppEigen the past few days
and don't know which library to continue using. RcppEigen seems really
slick, but appears to be lacking some of the decompositions I want and
isn't nearly as fast to code. RcppArmadillo seems about as fast, easier to
code up etc. What are some of the advantages/disadvantages of both?

Can you call LAPACK or BLAS from either? Is there a wrapper in RcppEigen to
call LAPACK functions? I want some other decomposition methods, and don't like
the JacobiSVD method in Eigen.
#
On 13 June 2012 at 10:57, Julian Smith wrote:
| I've been toying with both RcppArmadillo and RcppEigen the past few days and
| don't know which library to continue using. RcppEigen seems really slick, but
| appears to be lacking some of the decompositions I want and isn't nearly as
| fast to code. RcppArmadillo seems about as fast, easier to code up etc. What
| are some of the advantages/disadvantages of both?

That's pretty close.  I have been a fan of [Rcpp]Armadillo which I find
easier to get my head around.  Doug, however, moved from [Rcpp]Armadillo to
[Rcpp]Eigen as it has some things he needs.  Eigen should have a "larger" API
than Armadillo, but I find the code and docs harder to navigate.

And you should find Eigen to be a little faster. Andreas Alfons went as far
as building 'robustHD' using RcppArmadillo with a drop-in for RcppEigen (in
package 'sparseLTSEigen'; both package names from memory, and I may have
mistyped).  He reported a performance gain of around 25% for his problem
sets.  On the 'fastLm' benchmark, we find the fast Eigen-based decompositions
to be much faster than Armadillo.
 
| Can you call LAPACK or BLAS from either? Is there a wrapper in RcppEigen to
| call LAPACK functions? I want some other decomposition methods, and don't like
| the JacobiSVD method in Eigen.

You need to differentiate between the Eigen and Armadillo docs _for their
libraries_ and what happens when you access the Rcpp* variant from R.

RcppArmadillo will use the very same LAPACK and BLAS libs your R session
uses. So MKL, OpenBlas, ... are all options.  Eigen actually has its own code
outperforming LAPACK, so it doesn't depend on them as much there.

Hope this helps,   Dirk (at useR!)

| 
| ----------------------------------------------------------------------
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel at lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
#
I agree that RcppEigen is a little bit faster, but ease of use is important
to me, so I feel like RcppArmadillo might win out in my application.

| RcppArmadillo will use the very same LAPACK and BLAS libs your R session
| uses. So MKL, OpenBlas, ... are all options.  Eigen actually has its own
| code outperforming LAPACK, so it doesn't depend on them as much there.

Why do you think R outperforms RcppArmadillo in this example below? Any way
to speed this up?

require(RcppArmadillo)
require(inline)

arma.code <- '
  using namespace arma;
  NumericMatrix Xr(Xs);
  int n = Xr.nrow(), k = Xr.ncol();
  mat X(Xr.begin(), n, k, false);
  mat U;
  vec s;
  mat V;
  svd(U, s, V, X);
  return wrap(s);
'
rcppsvd <- cxxfunction(signature(Xs="numeric"),
                        arma.code,
                        plugin="RcppArmadillo")

A <- matrix(rnorm(5000^2), 5000)

> system.time(rcppsvd(A))
    user   system  elapsed
1992.406    4.862 1988.737

> system.time(svd(A))
   user  system elapsed
652.496   2.641 652.614
On Wed, Jun 13, 2012 at 11:43 AM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
#
On 13 June 2012 at 15:05, Julian Smith wrote:
| I agree that RcppEigen is a little bit faster, but ease of use is important to
| me, so I feel like RcppArmadillo might win out in my application.

Yup, that's my personal view too.
 
| | RcppArmadillo will use the very same LAPACK and BLAS libs your R session
| | uses. So MKL, OpenBlas, ... are all options.  Eigen actually has its own
| | code outperforming LAPACK, so it doesn't depend on them as much there.
| 
| Why do you think R outperforms RcppArmadillo in this example below? Any way to
| speed this up?

That is odd -- I guess it shouldn't. I shall take another look; as I
understand it, both should go to the same underlying Lapack routine.  I may
have to consult with Conrad on this.

Thanks for posting a full and reproducible example!

Dirk
 
#
On 13 June 2012 at 17:16, Dirk Eddelbuettel wrote:
|
| On 13 June 2012 at 15:05, Julian Smith wrote:
| | I agree that RcppEigen is a little bit faster, but ease of use is important to
| | me, so I feel like RcppArmadillo might win out in my application.
| 
| Yup, that's my personal view too.
| 
| | | RcppArmadillo will use the very same LAPACK and BLAS libs your R session
| | | uses. So MKL, OpenBlas, ... are all options.  Eigen actually has its own
| | | code outperforming LAPACK, so it doesn't depend on them as much there.
| | 
| | Why do you think R outperforms RcppArmadillo in this example below? Any way to
| | speed this up?
| 
| That is odd -- I guess it shouldn't. I shall take another look; as I
| understand it, both should go to the same underlying Lapack routine.  I may
| have to consult with Conrad on this.

Does your machine by chance have four cores?  It looks like I get a scaling
in the number of (hyperthreaded) cores on my box:

R> require(RcppArmadillo)
R> require(inline)
R> require(rbenchmark)
R> 
R> arma.code <- 'using namespace arma; NumericMatrix Xr(Xs); int n = Xr.nrow(), k = Xr.ncol(); mat X(Xr.begin(), n, k, false); mat U; vec s; mat V; svd(U, s, V, X); return wrap(s);'
R> rcppsvd <- cxxfunction(signature(Xs="numeric"), body=arma.code,plugin="RcppArmadillo")
R> 
R> N <- 1000
R> A <- matrix(rnorm(N^2), N)
R> 
R> res <- benchmark(rcppsvd(A), svd(A), replications=10)
R> print(res)
        test replications elapsed relative user.self sys.self user.child sys.child
1 rcppsvd(A)           10 121.389  7.77089   127.448    1.748          0         0
2     svd(A)           10  15.621  1.00000    26.290    4.224          0         0
R> 


Why it does this I am not clear about -- may well be a bug.

Dirk
 

#
On Wed, Jun 13, 2012 at 5:16 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
Because the arma code is evaluating the singular vectors (U and V) as
well as the singular values (S) whereas the R code is only evaluating
the singular values.  There is considerably more effort required to
evaluate the singular vectors in addition to the singular values.
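The cost gap is easy to demonstrate with any LAPACK-backed front-end; here is a minimal sketch in Python with NumPy (illustrative only -- the matrix size and variable names are made up, and NumPy rather than R/Armadillo is used so the snippet stays self-contained):

```python
import time
import numpy as np

X = np.random.default_rng(42).standard_normal((800, 800))

t0 = time.perf_counter()
U, s_full, Vt = np.linalg.svd(X)             # singular values AND vectors
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
s_only = np.linalg.svd(X, compute_uv=False)  # singular values only
t_only = time.perf_counter() - t0

# Same spectrum either way; accumulating U and V is where the extra time goes
assert np.allclose(s_full, s_only)
print(f"full: {t_full:.2f}s  values-only: {t_only:.2f}s")
```

The exact ratio is machine- and BLAS-dependent, but values-only should win clearly.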
#
Doesn't svd in R by default compute D, U and V?

http://stat.ethz.ch/R-manual/R-patched/library/base/html/svd.html
On Wed, Jun 13, 2012 at 4:07 PM, Douglas Bates <bates at stat.wisc.edu> wrote:

            
#
On Wed, Jun 13, 2012 at 6:53 PM, Julian Smith <hilbertspaces at gmail.com> wrote:
You're right but the default is the 'thin' U when X is n by p and n >=
p.  Does the svd in Armadillo return the full n by n matrix U?

Another thing to check is what underlying Lapack routine is called.
There are two: dgesvd, which is the older algorithm and the one with
the expected name if you know the Lapack conventions, and dgesdd which
is a newer and faster algorithm.  R uses dgesdd.
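Both points -- thin versus full factorisations, and dgesvd versus dgesdd -- can be checked from a front-end that exposes the choice. A sketch in Python via SciPy, whose `scipy.linalg.svd` takes a `lapack_driver` argument (this assumes SciPy is installed; neither R nor Armadillo is involved):

```python
import numpy as np
from scipy.linalg import svd

n, p = 300, 120
X = np.random.default_rng(1).standard_normal((n, p))

# Thin vs full factorisation: U is n x p vs n x n
U_thin, s, Vt = svd(X, full_matrices=False)
U_full, _, _ = svd(X, full_matrices=True)
assert U_thin.shape == (n, p) and U_full.shape == (n, n)

# Older driver (dgesvd) vs the faster divide-and-conquer one (dgesdd)
s_svd = svd(X, compute_uv=False, lapack_driver='gesvd')
s_sdd = svd(X, compute_uv=False, lapack_driver='gesdd')

# The two drivers agree to tolerance, but need not be bit-for-bit identical
assert np.allclose(s_svd, s_sdd)
```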
#
On 14 June 2012 at 08:01, Douglas Bates wrote:
| On Wed, Jun 13, 2012 at 6:53 PM, Julian Smith <hilbertspaces at gmail.com> wrote:
| > Doesn't svd in R by default compute D, U and V?
| 
| > http://stat.ethz.ch/R-manual/R-patched/library/base/html/svd.html
| 
| You're right but the default is the 'thin' U when X is n by p and n >=
| p.  Does the svd in Armadillo return the full n by n matrix U?

Thanks for that earlier hint re 'thin' and 'full' SVDs.
 
| Another thing to check is what underlying Lapack routine is called.
| There are two: dgesvd, which is the older algorithm and the one with
| the expected name if you know the Lapack conventions, and dgesdd which
| is a newer and faster algorithm.  R uses dgesdd.

Good call. 

It looks like Armadillo uses dgesv(d); grep did not find dgesdd, it seems:

edd at max:~/svn/rcpp/pkg/RcppArmadillo/inst/include/armadillo_bits$ grep dges *
atlas_bones.hpp:    int wrapper_clapack_dgesv(const enum CBLAS_ORDER Order, const int N, const int NRHS, double *A, const int lda, int *ipiv, double *B, const int ldb);
atlas_wrapper.hpp:      return arma_atlas(clapack_dgesv)(Order, N, NRHS, (T*)A, lda, ipiv, (T*)B, ldb);
glue_histc_meat.hpp:    "histc(): parameter 'edges' must be a vector"
lapack_bones.hpp:  #define arma_dgesvd dgesvd
lapack_bones.hpp:  #define arma_dgesv  dgesv
lapack_bones.hpp:  #define arma_dgesvd DGESVD
lapack_bones.hpp:  #define arma_dgesv  DGESV
lapack_bones.hpp:  void arma_fortran(arma_dgesvd)(char* jobu, char* jobvt, blas_int* m, blas_int* n, double* a, blas_int* lda, double* s, double* u, blas_int* ldu, double* vt, blas_int* ldvt, double* work, blas_int* lwork, blas_int* info);
lapack_bones.hpp:  void arma_fortran(arma_dgesv)(blas_int* n, blas_int* nrhs, double* a, blas_int* lda, blas_int* ipiv, double* b, blas_int* ldb, blas_int* info);
lapack_wrapper.hpp:      arma_fortran(arma_dgesvd)(jobu, jobvt, m, n, (T*)a, lda, (T*)s, (T*)u, ldu, (T*)vt, ldvt, (T*)work, lwork, info);
lapack_wrapper.hpp:      arma_fortran(arma_dgesv)(n, nrhs, (T*)a, lda, ipiv, (T*)b, ldb, info);
edd at max:~/svn/rcpp/pkg/RcppArmadillo/inst/include/armadillo_bits$ 

Conrad, any interest in switching to dgesdd? 

Dirk, at useR and across the room from Doug
c s
#
On Thu, Jun 14, 2012 at 4:43 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
This is a misconception that needs to be addressed.  For equivalent
functionality, Armadillo is not necessarily any slower than Eigen,
given suitable Lapack and/or Blas libraries are used (such as Intel's
MKL or AMD's ACML, or even the open-source Atlas or OpenBlas in many
cases).  Standard Lapack and Blas are just that: a "better than
nothing" baseline implementation in terms of performance.

Armadillo doesn't reimplement Lapack and it doesn't reimplement any
decompositions -- it uses Lapack.  (**This is a very important point,
which I elaborate on below**).  As such, the speed of Armadillo for
matrix decompositions is directly dependent on the particular
implementation of Lapack that's installed on the user's machine.

I've seen some ridiculous speed differences between standard Lapack
and MKL.  The latter not only has CPU-specific optimisations (eg.
using the latest AVX extensions), but can also do multi-threading.

Simply installing ATLAS (which provides speed-ups for several Lapack
functions) on Debian/Ubuntu systems can already make a big difference.
 (Debian & Ubuntu use a trick to redirect Lapack and Blas calls to
ATLAS).  Under Mac OS X, the Accelerate framework provides fast
implementations of Lapack and Blas functions (eg. using
multi-threading).

I've taken the modular approach to Armadillo (ie. using Lapack rather
than reimplementing decompositions), as it specifically allows other
specialist parties (such as Intel) to provide Lapack that is highly
optimised for particular architectures.  I myself would not be able to
keep up with the specific optimisations required for each CPU.  This
also "future-proofs" Armadillo for each new CPU generation.

More importantly, numerically stable implementation of computational
decompositions/factorisations is notoriously difficult to get right.
The core algorithms in Lapack have been evolving for the past 20+
years, being exposed to a bazillion corner-cases.  Lapack itself is
related to Linpack and Eispack, which are even older.  I've been
exposed to software development long enough to know that in the end
only time can shake out all the bugs.  As such, using Lapack is far
less risky than reimplementing decompositions from scratch.  A
"home-made" matrix decomposition might be a bit faster on a particular
CPU, but you have far less knowledge as to when it's going to blow up
in your face.

High-performance variants of Lapack, such as MKL, take an existing
proven implementation of a decomposition algorithm and recode parts of
it in assembly, and/or parallelise other parts.
c s
#
On Jun 15, 2012 12:11 AM, "Dirk Eddelbuettel" wrote:
Armadillo has the standard svd() and the thin version too: svd_econ().
See http://arma.sourceforge.net/docs.html#svd_econ
Yes, but this is not simply a matter of making svd() use dgesdd()
instead of dgesvd().  The Lapack authors added the dgesdd() function
instead of simply replacing the underlying algorithm used by dgesvd().
This was done for a reason: the results produced by dgesvd() and
dgesdd() can be "slightly" different (where "slightly" is dependent on
matrix size and matrix content).  If Armadillo starts giving different
results all of a sudden, there would be a lot of displeased people.

I'll probably add an option to Armadillo's svd() to optionally use
dgesdd().  I've done something very similar for eig_sym() in Armadillo
3.2, where the faster "divide and conquer" algorithm for eigen
decomposition can be optionally used. See
http://arma.sourceforge.net/docs.html#eig_sym
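The same optional-driver pattern exists in other LAPACK front-ends; as a sketch in Python (assuming SciPy >= 1.5, which added the `driver` argument to `scipy.linalg.eigh`), one can compare the classic dsyev path against the divide-and-conquer dsyevd path:

```python
import numpy as np
from scipy.linalg import eigh

A = np.random.default_rng(2).standard_normal((150, 150))
S = A + A.T  # symmetrise so eigh applies

w_ev  = eigh(S, eigvals_only=True, driver='ev')   # classic QR-based dsyev
w_evd = eigh(S, eigvals_only=True, driver='evd')  # divide-and-conquer dsyevd

# Same spectrum to tolerance, though not necessarily identical bits
assert np.allclose(w_ev, w_evd)
```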
#
On 15 June 2012 at 02:56, c s wrote:
| Simply installing ATLAS (which provides speed-ups for several Lapack
| functions) on Debian/Ubuntu systems can already make a big difference.
|  (Debian & Ubuntu use a trick to redirect Lapack and Blas calls to
| ATLAS).  Under Mac OS X, the Accelerate framework provides fast
| implementations of Lapack and Blas functions (eg. using
| multi-threading).

I found OpenBLAS to be faster than Atlas (with both coming from the
distribution), and tend to install that now. It uses a multithreaded approach
by default, but I have to double-check one thing about cpu affinity which,
when used from R, has been seen to interfere.  That is a possible reason for
the single-threaded performance here. I'll report back.

| I've taken the modular approach to Armadillo (ie. using Lapack rather
| than reimplementing decompositions), as it specifically allows other
| specialist parties (such as Intel) to provide Lapack that is highly
| optimised for particular architectures.  I myself would not be able to
| keep up with the specific optimisations required for each CPU.  This
| also "future-proofs" Armadillo for each new CPU generation.
| 
| More importantly, numerically stable implementation of computational
| decompositions/factorisations is notoriously difficult to get right.
| The core algorithms in Lapack have been evolving for the past 20+
| years, being exposed to a bazillion corner-cases.  Lapack itself is
| related to Linpack and Eispack, which are even older.  I've been
| exposed to software development long enough to know that in the end
| only time can shake out all the bugs.  As such, using Lapack is far
| less risky than reimplementing decompositions from scratch.  A
| "home-made" matrix decomposition might be a bit faster on a particular
| CPU, but you have far less knowledge as to when it's going to blow up
| in your face.
| 
| High-performance variants of Lapack, such as MKL, take an existing
| proven implementation of a decomposition algorithm and recode parts of
| it in assembly, and/or parallelise other parts.

All excellent points which nobody disputes.  It just so happens that Eigen
still looks better in some benchmarks but as you say we have to ensure we
really compare apples to apples.

Dirk
1 day later
#
On Thu, Jun 14, 2012 at 2:36 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
While I realize that Conrad has strong opinions on these issues and is
unlikely to be moved from those opinions, I would make a few points in
favor of Eigen.

- using BLAS/Lapack is effective because that is mature, well-tested
code.  It is ineffective because the reference implementations of
BLAS/Lapack are in Fortran 77 and suffer from all the idiosyncrasies
of that ancient language.  These include call-by-reference semantics,
even for scalars, lack of structures beyond pointers to arrays and the
inability to allocate storage.  Hence, the caller frequently must pass
an array 'work' and its length 'lwork' and, occasionally, 'rwork' and
'iwork'. There is a helpful convention for getting the length of the
work array correct which involves a preliminary call with lwork = -1
that does nothing but write the desired work array length into the
first element of the work array.  The caller must then allocate the
work array, change the value of lwork and call the routine a second
time.  Because the only structure available is a pointer to a
one-dimensional array, you must pass a general matrix as four
arguments M, N, A, LDA.  There are no machine-readable header files.
Jeff Bezanson recently joked about writing a parser for the upper-case
Fortran comments to create headers.  Does any of this sound like 21st
century software design?
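The two-call workspace-query convention can be seen through SciPy's low-level LAPACK wrappers, whose `*_lwork` helpers issue the lwork = -1 query call; a sketch, assuming SciPy is installed:

```python
import numpy as np
from scipy.linalg import lapack

m, n = 120, 80
A = np.asfortranarray(np.random.default_rng(3).standard_normal((m, n)))

# Call 1: the workspace query.  Under the hood this calls dgesdd with
# lwork = -1, which only writes the optimal length into work[0].
lwork, info = lapack.dgesdd_lwork(m, n)
assert info == 0

# Call 2: the actual decomposition, with the work array sized as requested.
u, s, vt, info = lapack.dgesdd(A, lwork=int(lwork))
assert info == 0
assert s.shape == (min(m, n),)
```

In raw Fortran 77 the caller has to make both calls and allocate 'work' in between; the wrappers merely hide that dance.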

- MKL blas is very good on Intel hardware.  Naturally Intel doesn't go
to great lengths to support AMD processors to the same extent.  MKL is
proprietary, closed-source software and plugging it into an
open-source software systems like R and Armadillo feels like a
compromise.  There are open alternatives like Atlas and OpenBlas.
However, these all need to conform to the Fortran calling sequence
specifications (Lapacke and CBLAS do provide C calling sequences but
it is not a good idea to count on their availability at present).  And
notice that this approach contradicts the "Lapack code is complex but
widely tested so it should be left untouched" argument.  The way to
get good performance in OpenBlas and MKL is to rewrite the BLAS and
heavily-used parts of Lapack, like dgetrf, in a combination of C and
Assembler.  I would also point out that one of the reasons that Lapack
code is complex and difficult to understand is because it is written
in the archaic language Fortran 77.

- the approach in Eigen was to use templated C++ classes for dense and
sparse matrices and for the decompositions and solvers.  I don't think
readers of this group need to be reminded of the advantages of
templated object-oriented C++ code.  Eigen has a bit of a learning
curve but it also has a great range of capabilities.

These comments may provoke a heated response from Conrad but, if so, I
don't plan to respond further.  Eigen and Armadillo are different
approaches, each with their own advantages and disadvantages.
1 day later
c s
#
On Sun, Jun 17, 2012 at 4:21 AM, Douglas Bates wrote:
I'm actually in favour of heterogeneous solutions and support the use
of C++ for performance critical implementations.

Relying on only one solution (ie. homogeneity) is bad from a
"survival" point of view, due to very similar reasons as in the
biological world: a fault can affect the entire population, or a virus
can propagate itself quite easily and do lots of nasty things.  For
example, see the recent snafu with state-sponsored viruses attacking
Windoze installations, ie, Stuxnet and Flame:
http://www.h-online.com/security/features/FAQ-Flame-the-super-spy-1587063.html

In my view, the C++ brethren Armadillo and Eigen are not really up
against each other, but against Matlab.  Matlab is like crack or
crystal meth.  Highly addictive, but in the end bad for you.
Mathworks, the company that produces Matlab, hooks people in while
they are university students.  It provides low prices to educational
institutions, and charges through the roof for everyone else.
Students become graduates, which are then employed by the companies
that are in effect stuck with using Matlab.  This is a strategy known
as vendor lock-in, exploited by monopolists.  Other descriptions
consistent with the business model of Mathworks are "parasite" and
"leech".

One way of keeping Mathworks in check is to provide alternative
solutions that have similar functionality (eg. easy visualisation of
data and rapid prototyping).  One of them is GNU Octave, which really
needs a JIT compiler to effectively compete with Matlab.  Another is
R.  I believe R currently doesn't have a JIT compiler (I haven't
checked lately), and hence the very useful Rcpp fills in the
performance gap.

One also has to address the current reality that people still overly
rely on Matlab, and have the problem of converting their Matlab code
into production environments / products.  This is where C++ comes in,
and libraries such as Armadillo (with its Matlab-like syntax) can be
quite handy.
#
On Sun, Jun 17, 2012 at 11:09 PM, c s <conradsand.arma at gmail.com> wrote:
I completely and happily agree.
The recently-developed Julia language (JuliaLang.org) has a JIT and a
syntax that is familiar to Matlab and R users.  It is still very much
a work in progress but shows great promise.
#
On Sun, Jun 17, 2012 at 9:09 PM, c s <conradsand.arma at gmail.com> wrote:

            
Still a work in progress, but see http://www.milbo.users.sonic.net/ra/jit.html.
It doesn't replace Rcpp by any stretch of the imagination, but may close the
gap a little bit.
#
On 18 June 2012 at 11:55, Antonio Piccolboni wrote:
| 
| On Sun, Jun 17, 2012 at 9:09 PM, c s <conradsand.arma at gmail.com> wrote:
| 
|     Another is R.  I believe R currently doesn't have a JIT compiler (I haven't
|     checked lately), and hence the very useful Rcpp fills in the
|     performance gap.
| 
| Still a work in progress, but see http://www.milbo.users.sonic.net/ra/jit.html.
| It doesn't replace Rcpp by any stretch of the imagination, but may close the
| gap a little bit.

Sadly, Ra and jit are no longer maintained by Stephen Milborrow.

I actually used to provide a patched binary for this within Debian. It was
ahead of its time (given the renewed interest in JITs from LLVM and other
places) and quite useful, but Ra/jit is at this point a dead project.

Dirk
#
Ooops, wrong JIT! Dirk is absolutely right. I meant to say see enableJIT in
the compiler package; see
http://www.r-statistics.com/2012/04/speed-up-your-r-code-using-a-just-in-time-jit-compiler/

I second Dirk's comments on Ra.


Antonio
On Mon, Jun 18, 2012 at 12:53 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
#
As I understand it, both Ra and the R 'compiler' package are bytecode 
JITs, so roughly like Java '.class' and Emacs '.elc' files. There is no 
current method that compiles R into machine code, which is what JIT in 
the JVM does, and which is why Java does so well in the language 
shootout [1].

Davor

[1] http://shootout.alioth.debian.org/
On 12-06-18 01:15 PM, Antonio Piccolboni wrote: