
[Rcpp-devel] [rcpp-devel] Rcpp Gallery Example fastLm vs R native lm

6 messages · Hadley Wickham, Smith, Dale (Norcross), Dirk Eddelbuettel

#
I have a question about the fastLm example in the Gallery http://gallery.rcpp.org/articles/fast-linear-model-with-armadillo/. I put the code directly into my package (after renaming it fastLmProto so I don't mask the RcppArmadillo function of the same name). After building the package, I wanted to compare results:
(Intercept)          x1
  3.0000909   0.5000909
          [,1]
[1,] 0.7968032
[1] 1.208169

Should I expect the results to match? Why do fastLmProto and fastLm produce a single fitted parameter (I would expect two)? Why are they different? Am I doing something wrong here, or just being naïve in my assumptions?

 

I reviewed the RcppArmadillo documentation and the article http://dirk.eddelbuettel.com/papers/RcppArmadillo.pdf but could not find anything relevant.

 

Thanks,

Dale Smith, Ph.D.

Senior Financial Quantitative Analyst

Risk & Compliance

Fiserv

Office: 678-375-5315

www.fiserv.com <http://www.fiserv.com/> 

 

#
On Thu, Mar 21, 2013 at 10:44 AM, Smith, Dale <Dale.Smith at fiserv.com> wrote:
Hint:

> coef(lm(y1 ~ x1 - 1, data = anscombe))
       x1
0.7968032

Hadley
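
The value 0.7968032 in the hint is what least squares gives when the design matrix holds only the x1 column, with no column of ones for the intercept. As a minimal sketch of that difference, the closed-form fits for both models can be computed on Anscombe's first data set in plain C++. The names kX1, kY1, slopeThroughOrigin and interceptAndSlope are made up for this illustration; they are not taken from the Gallery article or from RcppArmadillo, and the code deliberately avoids Armadillo so that it compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Anscombe's first data set (columns x1 and y1 of R's `anscombe` data frame).
const std::vector<double> kX1 = {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5};
const std::vector<double> kY1 = {8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                                 7.24, 4.26, 10.84, 4.82, 5.68};

// Least squares with design matrix X = [x]: there is no column of ones, so
// the line is forced through the origin. Same model as y1 ~ x1 - 1 in R.
double slopeThroughOrigin(const std::vector<double>& x,
                          const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += x[i] * y[i];
        sxx += x[i] * x[i];
    }
    assert(sxx > 0.0);  // x must not be identically zero
    return sxy / sxx;
}

// Least squares with design matrix X = [1, x]: returns {intercept, slope}.
// Same model as y1 ~ x1 in R.
std::pair<double, double> interceptAndSlope(const std::vector<double>& x,
                                            const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
        sxy += x[i] * y[i];
        sxx += x[i] * x[i];
    }
    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // x must not be constant
    const double slope = (n * sxy - sx * sy) / denom;
    const double intercept = (sy - slope * sx) / n;
    return {intercept, slope};
}
```

Calling slopeThroughOrigin(kX1, kY1) gives roughly 0.7968, the single coefficient seen in the question, while interceptAndSlope(kX1, kY1) gives roughly (3.0001, 0.5001), matching the lm() output. A function that takes a matrix X and a vector y will only estimate an intercept if X already contains a column of ones.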
#
Thanks. Should have noticed this myself.

Dale Smith, Ph.D.


#
On 21 March 2013 at 10:55, Hadley Wickham wrote:
| > Should I expect the results to match? Why do fastLmProto and fastLm produce
| > a single fitted parameter (I would expect two)? Why are they different? Am I
| > doing something wrong here, or just being naïve in my assumptions?
| 
| Hint:
| 
| > coef(lm(y1 ~ x1 - 1, data = anscombe))
|        x1
| 0.7968032

Also the Gallery article may not be the most exhaustive reference -- are you
aware that the packages

    RcppArmadillo

    RcppEigen

    RcppGSL

all carry fastLm implementations with and without a formula interface?  Some
of these also have timing benchmark examples.

Another hint:  If you care about speed, do NOT use the formula interface.

I have factored out the Arma version (from RcppArmadillo/src/fastLm.cpp and
the related R file) a few times.

Hope this helps,  Dirk
#
At this point, I'm not interested in blazing speed, but in getting to know Armadillo itself.

Based on the thread

http://thread.gmane.org/gmane.comp.lang.r.rcpp/3522

I'm not completely convinced that Eigen is faster than Armadillo on all problems. I did find some additional benchmarks, which are not perfect, but do lead me to the same conclusion.

http://nghiaho.com/?p=954

Some of you may find this document useful.

http://verdandi.sourceforge.net/doc/linear_algebra_libraries.pdf

Dale Smith, Ph.D.


#
On 21 March 2013 at 12:46, Smith, Dale wrote:
| At this point, I'm not interested in blazing speed, but in getting to know Armadillo itself.

Right. [ I only mentioned it because, where speed matters, the cost of
setting up the model.matrix dwarfs all differences between the linear algebra
setups.  So even the various fastLm() methods can be slower than R's lm.fit()
if the former start from "y ~ X" and the latter just gets lm.fit(X,y). ]

| Based on the thread
| 
| http://thread.gmane.org/gmane.comp.lang.r.rcpp/3522

That is a good reminder of how testy these exchanges can become.  And as
Conrad states really well in the thread, it is NOT about armadillo vs eigen,
but rather either one or both (as well as related libraries) against the very
closed and proprietary system that shall remain nameless.
 
| I'm not completely convinced that Eigen is faster than Armadillo on all problems. I did find some additional benchmarks, which are not perfect, but do lead me to the same conclusion.
| 
| http://nghiaho.com/?p=954

That's nicely done, thanks for sharing.  For what it is worth, I have also
seen timing comparisons go both ways, but I have not done anything exhaustive.
 
| Some of you may find this document useful.
| 
| http://verdandi.sourceforge.net/doc/linear_algebra_libraries.pdf

That may be a tad dated. Four years is a long time in this space, with
everything that has happened.

Dirk