Python and R
On Wed, Feb 18, 2009 at 7:27 AM, Esmail Bonakdarian <esmail.js at gmail.com> wrote:
> Gabor Grothendieck wrote:
>> See ?Rprof for profiling your R code. If lm is the culprit, rewriting your lm calls using lm.fit might help.
> Yes, based on my informal benchmarking, lm is the main "bottleneck", the rest of the code consists mostly of vector manipulations and control structures. I am not familiar with lm.fit, I'll definitely look it up. I hope it's similar enough to make it easy to substitute one for the other. Thanks for the suggestion, much appreciated. (My runs now take sometimes several hours, it would be great to cut that time down by any amount :-)
Yes, the speedup can be significant. For example, here lm.fit cuts the run time to about 40% of the lm time, and going even lower level with qr.coef gets it down to about 12%:
system.time(replicate(1000, lm(DAX ~.-1, EuStockMarkets)))
   user  system elapsed
  26.85    0.07   27.35
system.time(replicate(1000, lm.fit(EuStockMarkets[,-1], EuStockMarkets[,1])))
   user  system elapsed
  10.76    0.00   10.78
system.time(replicate(1000, qr.coef(qr(EuStockMarkets[,-1]), EuStockMarkets[,1])))
   user  system elapsed
   3.33    0.00    3.34
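Before rewriting any lm calls, it may be worth confirming that lm really dominates the run time, along the lines of the ?Rprof suggestion quoted above. The following is only a rough sketch: fit_many is a hypothetical stand-in for the real workload, and prof_file and prof are names introduced here for illustration.

```r
# Hypothetical stand-in for the real workload: many small regressions.
dat <- as.data.frame(EuStockMarkets)
fit_many <- function(n) {
  for (i in seq_len(n)) lm(DAX ~ . - 1, data = dat)
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)       # start the sampling profiler, writing to prof_file
fit_many(1000)
Rprof(NULL)            # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.total)    # functions ranked by total time, callees included
```

If lm and its helpers (model.frame, model.matrix and so on) sit near the top of by.total, switching to lm.fit is likely to pay off; if something else dominates, it probably will not.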
lm(DAX ~.-1, EuStockMarkets)
Call:
lm(formula = DAX ~ . - 1, data = EuStockMarkets)
Coefficients:
     SMI       CAC      FTSE
 0.55156   0.45062  -0.09392
# They all give the same coefficients:
lm.fit(EuStockMarkets[,-1], EuStockMarkets[,1])$coef
        SMI         CAC        FTSE
 0.55156141  0.45062183 -0.09391815
qr.coef(qr(EuStockMarkets[,-1]), EuStockMarkets[,1])
        SMI         CAC        FTSE
 0.55156141  0.45062183 -0.09391815
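On how easy the substitution is: the model above has no intercept (the -1 in the formula), so the raw predictor matrix can be passed to lm.fit as is. lm.fit does not add an intercept on its own, so for a model with one, a column of ones has to be bound onto the matrix by hand. A minimal sketch of both cases, where dat, X, y and X1 are names introduced here for illustration:

```r
dat <- as.data.frame(EuStockMarkets)
y <- dat$DAX
X <- as.matrix(dat[, c("SMI", "CAC", "FTSE")])

# No-intercept model: the three approaches should agree.
b_lm  <- coef(lm(DAX ~ . - 1, data = dat))
b_fit <- lm.fit(X, y)$coefficients
b_qr  <- qr.coef(qr(X), y)
all.equal(b_lm, b_fit)
all.equal(b_fit, b_qr)

# Model with an intercept: add the column of ones explicitly.
X1 <- cbind("(Intercept)" = 1, X)
b_int_lm  <- coef(lm(DAX ~ ., data = dat))
b_int_fit <- lm.fit(X1, y)$coefficients
all.equal(b_int_lm, b_int_fit)
```

Note that lm.fit and qr.coef return only the bare fit, without the formula handling, summary or predict support that an lm object provides, so they suit inner loops where only the coefficients or residuals are needed.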