R badly lags matlab on performance?
On Sun, 4 Jan 2009, Stavros Macrakis wrote:
On Sat, Jan 3, 2009 at 7:02 PM, <luke at stat.uiowa.edu> wrote:
R's interpreter is fairly slow due in large part to the allocation of argument lists and the cost of lookups of variables, including ones like [<- that are assembled and looked up as strings on every call.
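To make the `[<-` point concrete, here is a sketch of how R expands a subscript assignment into a replacement-function call — this is the documented R-level semantics, though the interpreter's internal mechanism differs in detail:

```r
x <- c(1, 2, 3)
x[2] <- 10            # complex ("replacement") assignment

# R evaluates the line above roughly as an explicit call
# to the replacement function `[<-`:
x2 <- c(1, 2, 3)
x2 <- `[<-`(x2, 2, 10)

identical(x, x2)      # TRUE -- both are c(1, 10, 3)
```

It is this kind of per-call construction and lookup of `[<-` that the quoted text describes as a source of interpreter overhead.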
Wow, I had no idea the interpreter was so awful. Just some simple tree-to-tree transformations would speed things up, I'd think, e.g. `<-`(`[`(...), ...) ==> `<-[`(...,...).
'Awful' seems a bit strong. It's also a bit more complicated in that one needs both [ and [<- in complex assignment expressions, but the point that one could rewrite assignments into something that can be more efficiently executed is certainly true. There are also a number of other opportunities to do things like this. They do have repercussions though -- in this case one would either need to modify code that needs to look at the original code to undo the operation, or add a new data structure that contains both the original code object and the rewritten one, and deal with the implications for serialization, and so on. Doable of course, and worth doing if the payoff is high enough, but I'm not convinced it is at this point.
The current byte code compiler available from my web site speeds this (highly artificial) example by about a factor of 4. The experimental byte code engine I am currently working on (and that can't yet do much more than an example like this) speeds this up by a factor of 80. Whether that level of improvement (for toy examples like this) will remain once the engine is more complete and whether a reasonable compiler can optimize down to the assembly code I used remain to be seen.
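The thread does not show the toy example itself; something like the following scalar loop is a typical benchmark of this kind (a hypothetical stand-in, not necessarily the actual test). With the compiler package — Luke's byte code compiler, later bundled with R — `cmpfun()` produces the byte-compiled version for comparison:

```r
library(compiler)

f <- function(n) {
  s <- 0
  for (i in 1:n) s <- s + i   # scalar arithmetic in a tight loop
  s
}
fc <- cmpfun(f)               # byte-compiled version of f

system.time(f(1e6))           # interpreted
system.time(fc(1e6))          # byte code engine; several times faster
                              # on loops of this shape
```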
Not sure I follow here. It sounds as though you have four levels of execution:
1) interpreter
2) current byte-code engine
3) future byte-code engine
4) compilation of byte codes into machine code
Is that right? I'm not sure what the difference between 2 and 3 is,
3 is hopefully a much more efficient engine than 2. I'm not looking at 4 for now but keeping an eye on the possibility, at least via C code generation.
and what the 80x figure refers to.
relative to the current interpreter -- I got 80 sec with the interpreter and 1 sec with the new byte code engine.
I'd think that one of the challenges will be the dynamic types -- where you don't know statically if an argument is a logical, an integer, a real, or a string. Will you be adding declarations, assuming the best case and interpreting all others or ...?
I am for now trying to get away without declarations and pre-testing for the best cases before passing others off to the current internal code. By taking advantage of the mechanisms we use now to avoid unnecessary copies it _looks_ like this allows me to avoid boxing up intermediate values in many cases, and that seems to help a lot. Given the overhead of the engine I'm not sure if specific type information would help that much (quick experiments suggest it doesn't, but that needs more testing) -- it would of course pay off with machine code generation.
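The dynamic-typing issue can be seen directly at the R level: the result type of even a simple `+` depends on the runtime types of both operands, so nothing is fixed statically:

```r
typeof(1L + 1L)    # "integer"
typeof(1L + 1.5)   # "double"  -- integer operand promoted to double
typeof(TRUE + 1L)  # "integer" -- logical operand promoted to integer
```

This is why the compiler must either test for the common cases at runtime or fall back to the general dispatch code.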
Does Matlab have the same type problem? Or does it make everything into a double? That still wouldn't explain the vectorized case, since the type dispatch only has to happen once.
I suspect the main reason for the difference in the vectorized case is that our current code does not special-case the vector/scalar case. R has more general recycling rules than Matlab, and the current code in the interpreter is written for the general case only (I thought we had special-cased scalar/scalar but unless I missed something in a quick look it appears not).
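R's recycling rules, which the general-case arithmetic code must handle, go well beyond simple scalar extension — a quick illustration:

```r
c(1, 2, 3) + 5             # scalar recycled:         6 7 8
c(1, 2, 3, 4) + c(10, 20)  # shorter vector recycled: 11 22 13 24
c(1, 2, 3) + c(10, 20)     # still recycles (11 22 13), but warns that
                           # the longer length is not a multiple of the shorter
```

A scalar/scalar or vector/scalar fast path would skip this generality entirely for the common cases.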
Sometimes some very simple changes in the implementation can make huge differences in overall runtime. I still remember a 10-word change I made in Maclisp in 1975 or so where I special-cased the two-argument case of (+ integer integer) => integer -- what it normally did was convert it to the general n-argument arbitrary-type case. This speeded up (+ integer integer) by 10x (which doesn't matter much), but also sped up the overall Macsyma symbolic algebra system by something like 20%.
We've had a few of those, and I suspect there are plenty more. There is always a trade-off in complicating the code and the consequences for maintainability that implies. A factor of 1.5 difference here I find difficult to get excited about, but it might be worth a look.

luke
-s
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu