R badly lags matlab on performance?
On Sun, 4 Jan 2009, Stavros Macrakis wrote:
On Sat, Jan 3, 2009 at 7:02 PM, <luke at stat.uiowa.edu> wrote:
R's interpreter is fairly slow due in large part to the allocation of argument lists and the cost of lookups of variables, including ones like [<- that are assembled and looked up as strings on every call.
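To make the `[<-` point concrete, here is a sketch of how R expands a subscript assignment into a replacement-function call — this is the documented R-level semantics, though the interpreter's internal mechanism differs in detail:

```r
x <- c(1, 2, 3)
x[2] <- 10            # complex ("replacement") assignment

# R evaluates the line above roughly as an explicit call
# to the replacement function `[<-`:
x2 <- c(1, 2, 3)
x2 <- `[<-`(x2, 2, 10)

identical(x, x2)      # TRUE -- both are c(1, 10, 3)
```

It is this kind of per-call construction and lookup of `[<-` that the quoted text describes as a source of interpreter overhead.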
Wow, I had no idea the interpreter was so awful. Just some simple tree-to-tree transformations would speed things up, I'd think, e.g. `<-`(`[`(...), ...) ==> `<-[`(...,...).
'Awful' seems a bit strong. It's also a bit more complicated in that one needs both [ and [<- in complex assignment expressions, but the point that one could rewrite assignments into something that can be more efficiently executed is certainly true. There are also a number of other opportunities to do things like this. They do have repercussions though -- in this case one would either need to modify code that needs to look at the original code to undo the operation, or add a new data structure that contains both the original code object and the rewritten one, and deal with the implications for serialization, and so on. Doable of course, and worth doing if the payoff is high enough, but I'm not convinced it is at this point.
The current byte code compiler available from my web site speeds this (highly artificial) example by about a factor of 4. The experimental byte code engine I am currently working on (and that can't yet do much more than an example like this) speeds this up by a factor of 80. Whether that level of improvement (for toy examples like this) will remain once the engine is more complete and whether a reasonable compiler can optimize down to the assembly code I used remain to be seen.
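The thread does not show the toy example itself; something like the following scalar loop is a typical benchmark of this kind (a hypothetical stand-in, not necessarily the actual test). With the compiler package — Luke's byte code compiler, later bundled with R — `cmpfun()` produces the byte-compiled version for comparison:

```r
library(compiler)

f <- function(n) {
  s <- 0
  for (i in 1:n) s <- s + i   # scalar arithmetic in a tight loop
  s
}
fc <- cmpfun(f)               # byte-compiled version of f

system.time(f(1e6))           # interpreted
system.time(fc(1e6))          # byte code engine; several times faster
                              # on loops of this shape
```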
Not sure I follow here. It sounds as though you have four levels of execution:
1) interpreter
2) current byte-code engine
3) future byte-code engine
4) compilation of byte codes into machine code
Is that right? I'm not sure what the difference between 2 and 3 is,
3 is hopefully a much more efficient engine than 2. I'm not looking at 4 for now but keeping an eye on the possibility, at least via C code generation.
and what the 80x figure refers to.
relative to the current interpreter -- I got 80 sec with the interpreter and 1 sec with the new byte code engine.
I'd think that one of the challenges will be the dynamic types -- where you don't know statically if an argument is a logical, an integer, a real, or a string. Will you be adding declarations, assuming the best case and interpreting all others or ...?
I am for now trying to get away without declarations and pre-testing for the best cases before passing others off to the current internal code. By taking advantage of the mechanisms we use now to avoid unnecessary copies it _looks_ like this allows me to avoid boxing up intermediate values in many cases, and that seems to help a lot. Given the overhead of the engine I'm not sure if specific type information would help that much (quick experiments suggest it doesn't, but that needs more testing) -- it would of course pay off with machine code generation.
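The dynamic-typing issue can be seen directly at the R level: the result type of even a simple `+` depends on the runtime types of both operands, so nothing is fixed statically:

```r
typeof(1L + 1L)    # "integer"
typeof(1L + 1.5)   # "double"  -- integer operand promoted to double
typeof(TRUE + 1L)  # "integer" -- logical operand promoted to integer
```

This is why the compiler must either test for the common cases at runtime or fall back to the general dispatch code.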
Does Matlab have the same type problem? Or does it make everything into a double? That still wouldn't explain the vectorized case, since the type dispatch only has to happen once.
I suspect the main reason for the difference in the vectorized case is that our current code does not special-case the vector/scalar case. R has more general recycling rules than Matlab, and the current code in the interpreter is written for the general case only (I thought we had special-cased scalar/scalar but unless I missed something in a quick look it appears not).
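R's recycling rules, which the general-case arithmetic code must handle, go well beyond simple scalar extension — a quick illustration:

```r
c(1, 2, 3) + 5             # scalar recycled:         6 7 8
c(1, 2, 3, 4) + c(10, 20)  # shorter vector recycled: 11 22 13 24
c(1, 2, 3) + c(10, 20)     # still recycles (11 22 13), but warns that
                           # the longer length is not a multiple of the shorter
```

A scalar/scalar or vector/scalar fast path would skip this generality entirely for the common cases.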
Sometimes some very simple changes in the implementation can make huge differences in overall runtime. I still remember a 10-word change I made in Maclisp in 1975 or so where I special-cased the two-argument case of (+ integer integer) => integer -- what it normally did was convert it to the general n-argument arbitrary-type case. This speeded up (+ integer integer) by 10x (which doesn't matter much), but also sped up the overall Macsyma symbolic algebra system by something like 20%.
We've had a few of those, and I suspect there are plenty more. There is always a trade-off in complicating the code and the consequences for maintainability that implies. A factor of 1.5 difference here I find difficult to get excited about, but it might be worth a look.

luke
-s
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu