Here's a small R program:
---------------------------------------------------------------------------
a <- rep(1,10000000)
system.time(a <- a + 1)
system.time(for (i in 1:10000000) {a[i] <- a[i] + 1})
---------------------------------------------------------------------------
and here's its matlab version:
---------------------------------------------------------------------------
function forloop()
a = ones(1e7,1);
tic; a = a + 1; toc
tic
for i=1:1e7
a(i) = a(i) + 1;
end
toc
---------------------------------------------------------------------------
The machine used for testing was an i386 Core 2 Duo at 2.2 GHz, running
OS X `Tiger'. R was 2.8.0 and matlab was 2008a.
Here's the performance (time taken in seconds):
                   Matlab        R   R/Matlab
Vector version     0.0446    0.067       1.5x
For loop version   0.0992   42.209     425.5x
So R is 1.5x costlier than matlab on the vector version and 425.5x
costlier on the for-loop version.
I wonder what we're doing wrong!
On Sat, 3 Jan 2009 22:25:38 +0530 Ajay Shah <ajayshah at mayin.org> wrote:
AS> system.time(for (i in 1:10000000) {a[i] <- a[i] + 1})
AS> I wonder what we're doing wrong!
It is no secret that R does badly with loops. That's why it is
recommended to use vectorized operations.
Another approach is just in time compilation, which would speed up
simple loops.
There is an interesting entry on R-wiki showing speeding up things:
http://wiki.r-project.org/rwiki/doku.php?id=tips:programming:code_optim2&s=just%20time
hth
Stefan
OK, I'll bite.
1. are you sure that the difference in the vector version is
real? On my computer the R system time for the
vector computation ranges from 0.052 to 0.1.
I would definitely check how much variation there is.
2. Every language has different advantages and disadvantages.
I believe Matlab does a lot of byte-code compiling. If you're
interested in this sort of thing, try Ra
<http://www.milbo.users.sonic.net/ra/index.html> , a version
of R that allows similar speed-ups.
3. I appreciate that every language can be improved, but this
feels like something that is not "fixable" without changing
the language significantly. You have lots of options if you need
to do these kinds of operations -- Matlab (if you can afford it),
Octave (does it run as quickly as Matlab?), programming in C
(see http://modelingwithdata.org/ ), coding the critical
non-vectorizable bits of your algorithm in C or FORTRAN, etc. etc.
etc..
cheers
Ben Bolker
On Sat, Jan 03, 2009 at 06:59:29PM +0100, Stefan Grosse wrote:
On Sat, 3 Jan 2009 22:25:38 +0530 Ajay Shah <ajayshah at mayin.org> wrote:
AS> system.time(for (i in 1:10000000) {a[i] <- a[i] + 1})
AS> I wonder what we're doing wrong!
It is no secret that R does badly with loops. That's why it is
recommended to use vectorized operations.
But there's a big difference even on the vectorised version, a <- a +
1. Why should that be? Both systems should merely be handing the work
down to the BLAS. Does the (stock) R install have a less carefully
set-up BLAS than the (stock) matlab install?
See my other message. I'm suspicious of the real size of the
difference; I think it could well be noise. Also, this particular bit
of arithmetic doesn't
involve BLAS -- see arithmetic.c (dig down until you get to
integer_binary) ...
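To make Ben's point concrete, here is a minimal sketch (the sizes are illustrative, not from the thread): elementwise addition runs in R's own C code, while a dense matrix product is the kind of operation that actually dispatches to the linked BLAS.

```r
a <- rep(1, 1e6)
system.time(a + 1)        # elementwise: a plain C loop in arithmetic.c, no BLAS
m <- matrix(1, 500, 500)
system.time(m %*% m)      # matrix product: calls DGEMM in whatever BLAS R links to
# Sanity check: each entry of m %*% m is a sum of 500 ones
stopifnot(all(m %*% m == 500))
```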
Ben Bolker
| On Sat, Jan 03, 2009 at 06:59:29PM +0100, Stefan Grosse wrote:
| > On Sat, 3 Jan 2009 22:25:38 +0530 Ajay Shah <ajayshah at mayin.org> wrote:
| >
| > AS> system.time(for (i in 1:10000000) {a[i] <- a[i] + 1})
| >
| > AS> I wonder what we're doing wrong!
| >
| > It is no secret that R does badly with loops. That's why it is
| > recommended to use vectorized operations.
|
| But there's a big difference even on the vectorised version: a <- a +
| 1. Why should that be? Both systems should merely be handing down to
| the BLAS. The (stock) R install has a less carefully setup BLAS as
| compared with the (stock) matlab install?
Your example does not involve BLAS.
As for jit and Ra, that was my immediate reaction too, but I found that jit
does not help on your example. But I concur fully with what Ben said --- use
the tool that is appropriate for the task at hand. If your task is running
for loops, Matlab does it faster, and you have Matlab, then by all means use
Matlab.
Dirk
Three out of two people have difficulties with fractions.
A good chunk of statistical computation involves loops. We are all
happy R users. I was surprised to see that we are so far from matlab
in the crucial dimension of performance.
I just ran Ajay's examples 3 times over:
R 2.8.1 on Debian Etch using 1GB of RAM in a VirtualBox VM
running on a 1.73GHz CPU. Results:
user system elapsed
Vector: 0.112 0.288 0.393
Loop: 65.276 0.300 65.572
Vector: 0.076 0.312 0.389
Loop: 65.744 0.332 66.076
Vector: 0.068 0.328 0.394
Loop: 65.292 0.308 65.597
Not dissimilar to Ajay's R times (though my loop was about 50% longer).
However, all three runs were very similar -- a little noise,
but not much!
I don't have octave (on the same machine) to compare these with.
And I don't have MatLab at all. So I can't provide a comparison
on that front, I'm afraid.
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Jan-09 Time: 19:10:51
------------------------------ XFMail ------------------------------
On Sat, Jan 3, 2009 at 12:37 PM, Ajay Shah <ajayshah at mayin.org> wrote:
As for jit and Ra, that was immediate reaction too but I found that jit does
not help on your example. But I concur fully with what Ben said --- use the
tool that is appropriate for the task at hand. If your task is running for
loops, Matlab does it faster and you have Matlab, well then you should by all
means use Matlab.
A good chunk of statistical computation involves loops. We are all
happy R users. I was surprised to see that we are so far from matlab
in the crucial dimension of performance.
I would disagree: only a small amount of statistical computation
involves explicit loops at the R level. More importantly, the time
taken for this looping is a very small fraction of the total time
spent doing statistical computation.
Hadley
Just to add some timings, I was running 1000 repetitions (adding up to
a=1001) on a notebook with core 2 duo T7200
R 2.8.1 on Fedora 10: mean 0.10967, st.dev 0.005238
R 2.8.1 on Windows Vista: mean 0.13245, st.dev 0.00943
Octave 3.0.3 on Fedora 10: mean 0.097276, st.dev 0.0041296
Matlab 2008b on Windows Vista: 0.0626 st.dev 0.005
But I am not sure how representative this very simple example is. To
compare Matlab's speed with R, a proper benchmark suite is necessary,
like http://www.sciviews.org/benchmark/index.html, but that one is very
old. I would guess that not much has changed: sometimes R is faster,
sometimes not.
The difference between the Windows and Linux timings is probably not
really relevant: when I compared the timings of my usual analyses
(count data and time series stuff), there was no difference between the
two operating systems.
Cheers
Stefan
on my machine, matlab and r perform as in ajay's results, but octave
runs the for-loop matlab code painfully slowly, worse than r.
vQ
this list has a tradition of encouraging users to use apply & relatives
instead of for loops (and to use vectorised code instead of both).
compare the following for n = 10^6 and n = 10^7:
x = 1:n
system.time({for (i in 1:n) x[i] = x[i] + 1})
x = 1:n
system.time({x = sapply(x, `+`, 1)})
vQ
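For comparison (my addition, not vQ's): sapply is itself an R-level loop, so the fully vectorised form beats both, and all three give the same answer.

```r
n <- 1e5                                        # smaller n, for a quick check
x <- 1:n
system.time(for (i in 1:n) x[i] <- x[i] + 1)    # explicit for loop
y <- 1:n
system.time(y <- sapply(y, `+`, 1))             # apply family: still loops in R
z <- 1:n
system.time(z <- z + 1)                         # vectorised: one C-level pass
stopifnot(identical(x, y), identical(y, z))     # same result either way
```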
I don't know Matlab, but I think the thing that is slowing R down here
is its generality. When you write
a[i] <- a[i] + 1
in R, it could potentially change the meaning of a, [, <-, and + on each
step through the loop, so R looks them up again each time. I would
guess that's not possible in Matlab, or perhaps Matlab has an optimizer
that can recognize that in the context where the loop is being
evaluated, those changes are known not to happen. It *would* be
possible to write such an optimizer for R, and Luke Tierney's byte code
compiler-in-progress might incorporate such a thing.
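Duncan's point can be made concrete with a contrived sketch: since `+` can be rebound at any time, even mid-loop, the interpreter cannot cache its meaning across iterations.

```r
f <- function() {
  x <- 0
  for (i in 1:3) {
    if (i == 2) `+` <- function(a, b) a - b  # rebind `+` locally, mid-loop
    x <- x + 1                               # same source, changed meaning
  }
  x
}
f()  # returns -1: one addition, then two subtractions
```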
For the difference in timing on the vectorized versions, I'd guess that
Matlab uses a better compiler than gcc. It's also likely that R
incorporates some unnecessary testing even in a case like this, because
it's easier to maintain code that is obviously sane than it is to
maintain code that may not be. R has a budget which is likely several
orders of magnitude smaller than Mathworks has, so it makes sense to
target our resources at more important issues than making fast things
run a bit faster.
Duncan Murdoch
I don't know Matlab, but I think the thing that is slowing R down here is its
generality. When you write
a[i] <- a[i] + 1
in R, it could potentially change the meaning of a, [, <-, and + on each step
through the loop, so R looks them up again each time. I would guess that's
not possible in Matlab, or perhaps Matlab has an optimizer that can recognize
that in the context where the loop is being evaluated, those changes are
known not to happen.
R's interpreter is fairly slow due in large part to the allocation of
argument lists and the cost of lookups of variables, including ones
like [<- that are assembled and looked up as strings on every call.
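Concretely, a complex assignment like a[i] <- a[i] + 1 is evaluated via the subscript function `[` and the replacement function `[<-`, roughly as:

```r
a <- c(10, 20, 30)
i <- 2
# a[i] <- a[i] + 1 is, roughly, sugar for:
a <- `[<-`(a, i, `[`(a, i) + 1)
a  # c(10, 21, 30)
```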
It *would* be possible to write such an optimizer for
R, and Luke Tierney's byte code compiler-in-progress might incorporate such a
thing.
The current byte code compiler available from my web site speeds this
(highly artificial) example by about a factor of 4. The experimental
byte code engine I am currently working on (and that can't yet do much
more than an example like this) speeds this up by a factor of
80. Whether that level of improvement (for toy examples like this)
will remain once the engine is more complete and whether a reasonable
compiler can optimize down to the assembly code I used remain to be
seen.
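A note for readers coming to this thread later: the byte-code compiler Luke describes eventually shipped with R itself as the compiler package, so the effect can be sketched in stock (modern) R; the exact speed-up will vary.

```r
library(compiler)
f <- function(n) {
  a <- rep(1, n)
  for (i in 1:n) a[i] <- a[i] + 1  # the thread's loop, wrapped in a function
  a
}
fc <- cmpfun(f)          # byte-compile the function
system.time(f(1e5))      # interpreted loop
system.time(fc(1e5))     # byte-compiled loop; same result, less overhead
stopifnot(identical(f(100), fc(100)))
```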
For the difference in timing on the vectorized versions, I'd guess that
Matlab uses a better compiler than gcc. It's also likely that R incorporates
some unnecessary testing even in a case like this, because it's easier to
maintain code that is obviously sane than it is to maintain code that may not
be. R has a budget which is likely several orders of magnitude smaller than
Mathworks has, so it makes sense to target our resources at more important
issues than making fast things run a bit faster.
Another possibility is optimization settings that may be higher and/or
more processor-specific than those used for R.
We do handle the case where both arguments to + are scalar (i.e. of
length 1) separately but I don't recall if we do so for the
vector/scalar case also -- I suspect not as that would make the code
less maintainable for not a very substantial gain.
luke
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
On 3 January 2009 at 18:02, luke at stat.uiowa.edu wrote:
| The current byte code compiler available from my web site speeds this
| (highly artificial) example by about a factor of 4. The experimental
| byte code engine I am currently working on (and that can't yet do much
| more than an example like this) speeds this up by a factor of
| 80. Whether that level of improvement (for toy examples like this)
| will remain once the engine is more complete and whether a reasonable
| compiler can optimize down to the assembly code I used remain to be
| seen.
Stephen Milborrow's jit compiler and Ra variant -- for which the R 2.8.1
version was released today -- already delivers improvements somewhere near
the middle of 'four' to 'eighty' times:
edd at ron:~> R --slave < /tmp/ajayshah.R
user system elapsed
0.052 0.056 0.109
user system elapsed
68.984 0.088 69.384
edd at ron:~> Ra --slave < /tmp/ajayshah.Ra
user system elapsed
0.096 0.068 0.162
user system elapsed
2.06 0.10 2.16
edd at ron:~> diff -u /tmp/ajayshah.R*
--- /tmp/ajayshah.R 2009-01-03 11:05:03.000000000 -0600
+++ /tmp/ajayshah.Ra 2009-01-03 18:37:20.000000000 -0600
@@ -2,4 +2,4 @@
system.time(a <- a + 1)
-system.time(for (i in 1:10000000) {a[i] <- a[i] + 1})
+system.time({ jit(1); for (i in 1:10000000) {a[i] <- a[i] + 1} })
edd at ron:~>
The vectorised example shows highly variable times, anywhere between 0.11
and 0.17 seconds, so taking ratios is speculative. Either way, you can get
a speedup of maybe 30 times by switching to a just-in-time compiler for R
and adding one additional function call.
Dirk
I once wrote the benchmark mentioned in Stefan's post (based on initial
work by Stephan Steinhaus), and it is still available for those who
would like to update it. Note that it lacks checking of the results to
make sure that the calculation is not only faster, but also correct!
Now I'll tell you why I haven't updated it, and you'll see it is
connected to the current topic.
First, lack of time, for sure.
Second, this benchmark has always been much criticized by several people,
including some from the R Core Team. Basically, these are just toy
examples, disconnected from reality. Even with better cases, benchmarks
do not take into account the time needed to write your code for your
particular application (from the question to the results).
I wrote this benchmark at a time when I overemphasized the raw
performance of the software, at a time when I was looking for the best
software to choose as a tool for my future career.
Now, what's my choice, ten years later? Not two, not three programs...
just ONE: R. I do about 95% of my calculations in R (the rest is
ImageJ/Java). Indeed, these benchmark results (and Ajay Shah's toy
example, a <- a + 1) should be considered only very marginally, because
what matters is how your software tool performs in real applications,
not in simplistic toy examples.
R lags behind Matlab for pure arithmetic calculation... right! But R has
a better object-oriented approach, features more variable types (factor,
for instance), and has a richer mechanism for metadata handling (col/row
names, various other attributes, ...) that makes it better suited than
Matlab to instantiate complex datasets or analyses. Of course, this has
a small cost in performance.
As soon as you think about your problem in a vectorized way, R is one of
the best tools, I think, to go "from the question to the answer" in real
situations. How could we quantify this? I could only imagine big contests
where experts in each language are presented with real problems and one
measures the time needed to solve them... One should also measure the
robustness, reusability, flexibility, and "elegance" of the code produced
(how to quantify these?). Such a contest between R, Matlab, Octave,
Scilab, etc. is very unlikely to happen.
In the end, it is really a matter of personal feeling: you can run your
own little contest by trying to solve a given problem in several
programs... and then decide which one you prefer. I think many people
do/did this, and the still-exponential growth of R use (at least, as
observed in the increasing number of CRAN packages) is probably a good
sign that R is one of the top performers when it comes to efficiency
"from the question to the answer" on real problems, not just on little
toy examples!
(Sorry for being so long; I think I missed some interaction with the R
community this time ;-)
Best,
Philippe
..............................................<?}))><........
) ) ) ) )
( ( ( ( ( Prof. Philippe Grosjean
) ) ) ) )
( ( ( ( ( Numerical Ecology of Aquatic Systems
) ) ) ) ) Mons-Hainaut University, Belgium
( ( ( ( (
..............................................................
Folks:
Merely my opinions, of course ...
Just to amplify a little on Philippe's remarks by paraphrasing comments made
many times on this list before. In a galaxy far away a long time ago ...
John Chambers and his Bell Labs colleagues -- and subsequently R&R (Ross
Ihaka and Robert Gentleman) and R's Core Team Developers -- made the
decision to develop a language/software for data analysis, data graphics and
statistics. Recognizing that "most" tasks within this arena were for
"one-off" custom problems rather than repetitive "production" applications,
they emphasized flexibility, ease of use and relatively straightforward
extensibility. While I'm sure that they did not ignore performance, it was
not the primary consideration (Chambers et al.'s Blue Book speaks to these
issues much more eloquently; I think it should be required reading _BEFORE_
one launches into criticism). As has been frequently mentioned, they knew
that there are two "outs" for such matters: Moore's Law and the ability to
easily incorporate customized C code into R. I submit that the data bear out
the overwhelming wisdom of their choice.
This is not to say that R is perfect: there are certainly times when
performance is inadequate, and the design or implementation could have
been (or could be) improved. But no one bats a thousand (baseball idiom):
as Philippe said, for many (maybe most?) of us R is both awesome and
indispensable!
For me the real challenge is: what's next? R/S is so blazingly successful
that it seems to extinguish the need for continuing improvement (the
demise of Luke Tierney's XLISP-STAT is an example): what's the next step
in the sequence IMSL --> SAS --> S/R --> ?? But hopefully this is merely
my ignorance speaking, and smart folks are already working on it.
Regards to all,
Bert Gunter
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Philippe Grosjean
Sent: Sunday, January 04, 2009 2:02 AM
To: Stefan Grosse
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] R badly lags matlab on performance?
On Sat, Jan 3, 2009 at 7:02 PM, <luke at stat.uiowa.edu> wrote:
R's interpreter is fairly slow due in large part to the allocation of
argument lists and the cost of lookups of variables, including ones
like [<- that are assembled and looked up as strings on every call.
Wow, I had no idea the interpreter was so awful. Just some simple
tree-to-tree transformations would speed things up, I'd think, e.g.
`<-`(`[`(...), ...) ==> `<-[`(...,...).
The current byte code compiler available from my web site speeds this
(highly artificial) example by about a factor of 4. The experimental
byte code engine I am currently working on (and that can't yet do much
more than an example like this) speeds this up by a factor of
80. Whether that level of improvement (for toy examples like this)
will remain once the engine is more complete and whether a reasonable
compiler can optimize down to the assembly code I used remain to be
seen.
Not sure I follow here. It sounds as though you have 4 levels of execution:
1) interpreter
2) current byte-code engine
3) future byte-code engine
4) compilation of byte codes into machine code
Is that right? I'm not sure what the difference between 2 and 3 is,
and what the 80x figure refers to.
I'd think that one of the challenges will be the dynamic types --
where you don't know statically if an argument is a logical, an
integer, a real, or a string. Will you be adding declarations,
assuming the best case and interpreting all others or ...?
Does Matlab have the same type problem? Or does it make everything
into a double? That still wouldn't explain the vectorized case, since
the type dispatch only has to happen once.
Sometimes some very simple changes in the implementation can make huge
differences in overall runtime. I still remember a 10-word change I
made in Maclisp in 1975 or so where I special-cased the two-argument
case of (+ integer integer) => integer -- what it normally did was
convert it to the general n-argument arbitrary-type case. This
speeded up (+ integer integer) by 10x (which doesn't matter much), but
also sped up the overall Macsyma symbolic algebra system by something
like 20%.
-s
On Sat, Jan 3, 2009 at 7:02 PM, <luke at stat.uiowa.edu> wrote:
R's interpreter is fairly slow due in large part to the allocation of
argument lists and the cost of lookups of variables, including ones
like [<- that are assembled and looked up as strings on every call.
Wow, I had no idea the interpreter was so awful. Just some simple
tree-to-tree transformations would speed things up, I'd think, e.g.
`<-`(`[`(...), ...) ==> `<-[`(...,...).
'Awful' seems a bit strong. It's also a bit more complicated in that
one needs both [ and [<- in complex assignment expressions, but the
point that one could rewrite assignments into something that can be
more efficiently executed is certainly true. There are also a number
of opportunities to do things like this. They do have repercussions
though -- in this case one would either need to modify code that needs
to look at the original code to undo the operation, or add a new data
structure that contains the original code object and the rewritten
one, and deal with implications for serialization, and so on. Doable
of course, and worth doing if the payoff is high enough, but I'm not
convinced it is at this point.
The current byte code compiler available from my web site speeds this
(highly artificial) example by about a factor of 4. The experimental
byte code engine I am currently working on (and that can't yet do much
more than an example like this) speeds this up by a factor of
80. Whether that level of improvement (for toy examples like this)
will remain once the engine is more complete and whether a reasonable
compiler can optimize down to the assembly code I used remain to be
seen.
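Concretely, something like the following sketch shows what byte compilation buys on this example. It uses cmpfun() from the 'compiler' package (the descendant of the compiler mentioned above, bundled with R 2.13.0 and later); the exact speedup factor will vary by R version and machine.

```r
# Sketch: byte-compiling the toy loop with cmpfun() from the 'compiler'
# package (assumed available; it ships with R 2.13.0 and later).
library(compiler)

f <- function(n) {
  a <- rep(1, n)
  for (i in 1:n) a[i] <- a[i] + 1
  a
}
fc <- cmpfun(f)  # byte-compiled version of the same function

system.time(f(1e6))   # interpreted loop
system.time(fc(1e6))  # byte-compiled loop, typically several times faster
```

The compiled function computes the same result; only the execution engine differs.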
Not sure I follow here. It sounds as though you have 4 levels of execution:
1) interpreter
2) current byte-code engine
3) future byte-code engine
4) compilation of byte codes into machine code
Is that right? I'm not sure what the difference between 2 and 3 is,
3 is hopefully a much more efficient engine than 2.
I'm not looking at 4 for now but keeping an eye on the possibility, at
least via C code generation.
and what the 80x figure refers to.
relative to the current interpreter -- I got 80 sec with the
interpreter and 1 sec with the new byte code engine.
I'd think that one of the challenges will be the dynamic types --
where you don't know statically if an argument is a logical, an
integer, a real, or a string. Will you be adding declarations,
assuming the best case and interpreting all others or ...?
I am for now trying to get away without declarations and pre-testing
for the best cases before passing others off to the current internal
code. By taking advantage of the mechanisms we use now to avoid
unnecessary copies it _looks_ like this allows me to avoid boxing up
intermediate values in many cases and that seems to help a lot. Given
the overhead of the engine I'm not sure if specific type information
would help that much (quick experiments suggest it doesn't but that
needs more testing) -- it would of course pay off with machine code
generation.
Does Matlab have the same type problem? Or does it make everything
into a double? That still wouldn't explain the vectorized case, since
the type dispatch only has to happen once.
I suspect the main reason for the difference in the vectorized case is
that our current code does not special-case the vector/scalar case. R
has more general recycling rules than Matlab, and the current code in
the interpreter is written for the general case only (I thought we had
special-cased scalar/scalar but unless I missed something in a quick
look it appears not).
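For concreteness, the recycling rules that general code has to handle look like this; Matlab would reject most of these shapes outright.

```r
# R recycles the shorter operand against the longer one.
x <- 1:6
x + 10                  # scalar recycled:   11 12 13 14 15 16
x + c(10, 20)           # length-2 recycled: 11 22 13 24 15 26
x + c(10, 20, 30)       # length-3 recycled: 11 22 33 14 25 36
x + c(10, 20, 30, 40)   # recycled with a warning (6 is not a multiple of 4)
```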
Sometimes some very simple changes in the implementation can make huge
differences in overall runtime. I still remember a 10-word change I
made in Maclisp in 1975 or so where I special-cased the two-argument
case of (+ integer integer) => integer -- what it normally did was
convert it to the general n-argument arbitrary-type case. This
speeded up (+ integer integer) by 10x (which doesn't matter much), but
also sped up the overall Macsyma symbolic algebra system by something
like 20%.
We've had a few of those, and I suspect there are plenty more. There
is always a trade-off in complicating the code and the consequences
for maintainability that implies. A 1.5 factor difference here I find
difficult to get excited about, but it might be worth a look.
luke
-s
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
On Sat, Jan 3, 2009 at 7:02 PM, <luke at stat.uiowa.edu> wrote:
R's interpreter is fairly slow due in large part to the allocation of
argument lists and the cost of lookups of variables, including ones
like [<- that are assembled and looked up as strings on every call.
Wow, I had no idea the interpreter was so awful. Just some simple
tree-to-tree transformations would speed things up, I'd think, e.g.
`<-`(`[`(...), ...) ==> `<-[`(...,...).
Doesn't really help (and it's not quite correct: a[2] <- 1 is equivalent to
a <- `[<-`(a, 2, 1)
with some sneakiness that assumes that the two a's are the same, so that
you might destructively modify the second instance.)
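Spelled out as a small example:

```r
a <- c(5, 6, 7)
a[2] <- 1               # complex assignment ...

b <- c(5, 6, 7)
b <- `[<-`(b, 2, 1)     # ... is (modulo the sneakiness above) this call

identical(a, b)         # TRUE
```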
The actual interpreter is not much of a bottleneck. There are two other
major obstacles:
1) Things may not be what they seem
2) Insufficient control over object duplication
1) is the major impediment to compilability (look for talks/papers by
Luke for further details and ideas about what to do about it). The basic
issue is that at no point can you be sure that the "log" function
calculates logarithms. It might be redefined as a side effect of the
previous expression. This is a feature of the language as such, and it
is difficult to change without destroying features that people actually
use. The upshot is that every time we see an object name, we enter a
search along the current search path to find its current binding.
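A small demonstration of the problem:

```r
log(exp(1))                    # 1 -- base R's logarithm
log <- function(x) "gotcha"    # a global binding now shadows base::log
log(exp(1))                    # "gotcha" -- the search path finds ours first
rm(log)                        # drop the shadowing binding
log(exp(1))                    # 1 again
```

Since any expression could in principle do this, the interpreter cannot cache the binding.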
2) is a little contentious: It is not certain how much we gain by
attacking it, only that it would be a heck of a lot of work. The issue
is that we do not use reference counting like e.g. Java or Tcl does. We
use a primitive counter called NAMED which can be 0, 1, or 2, and only
counts upwards. When it reaches 2, destructive modification is
disallowed and the object must be copied. I.e. consider
x <- rnorm(1e6)
y <- x
at this point we actually have x and y referring to the same ~8MB block
of memory. However, the semantics of R is that this is a virtual copy,
so y[1] <- 1 or x[1] <- 1 entails that we duplicate the object. Fair
enough, if an object is bound to multiple names, we cannot modify it in
place; the problem is that we lose track when the references go away,
and thus,
y <- x
y[1] <- 1
x[1] <- 1
causes TWO duplications. The really nasty bit is that we very often get
objects temporarily bound to two names (think about what happens with
arguments in function calls).
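You can watch the copies happen with tracemem() (available when R is built with memory profiling, as the CRAN binaries are). Under the NAMED scheme described above both assignments below trigger a duplication; later R versions with true reference counts avoid the second one.

```r
x <- rnorm(1e6)
tracemem(x)   # report each duplication of this object
y <- x        # virtual copy: no duplication, but NAMED is bumped to 2
y[1] <- 1     # tracemem reports a copy: y gets its own block
x[1] <- 1     # under NAMED, a second copy: the count never went back down
```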
Unfortunately, we cannot base the memory management purely on reference
counting. And of course, doing so, even partially, implies that we need
to have a much more concrete approach to the unbinding of objects.
Notice, for instance that the names used in a function evaluation frame
are not guaranteed to be unbind-able when the function exits. Something
might have saved the evaluation environment, e.g. using e <<-
environment(), but there are also more subtle methods.
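For instance:

```r
e <- NULL
f <- function() {
  big <- numeric(1e6)   # ~8 MB local vector
  e <<- environment()   # the evaluation frame escapes the call
  invisible(NULL)
}
f()
length(e$big)           # 1000000 -- 'big' is still reachable, so it
                        # cannot simply be unbound when f() returns
```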
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
On Sun, Jan 4, 2009 at 4:50 PM, <luke at stat.uiowa.edu> wrote:
On Sun, 4 Jan 2009, Stavros Macrakis wrote:
On Sat, Jan 3, 2009 at 7:02 PM, <luke at stat.uiowa.edu> wrote:
R's interpreter is fairly slow due in large part to the allocation of
argument lists and the cost of lookups of variables,
I'd think another problem is call-by-need. I suppose inlining or
batch analyzing groups of functions helps there.
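(Call-by-need meaning, e.g.:)

```r
# Arguments are promises, forced only if and when they are used.
f <- function(x, y) x          # y is never touched
f(1, stop("never evaluated"))  # returns 1; the promise for y is not forced
```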
including ones like [<- that are assembled and looked up as strings on every call.
Wow, I had no idea the interpreter was so awful. Just some simple tree-to-tree transformations would speed things up, I'd think, e.g. `<-`(`[`(...), ...) ==> `<-[`(...,...).
'Awful' seems a bit strong.
Well, I haven't looked at the code, but if I'm interpreting "assembled
and looked up as strings on every call" correctly, this means taking
names, expanding them to strings, concatenating them, re-interning
them, then looking up the value. That sounds pretty awful to me both
in the sense of being inefficient and of being ugly.
I'd think that one of the challenges will be the dynamic types --...
I am for now trying to get away without declarations and pre-testing
for the best cases before passing others off to the current internal
code.
Have you considered using Java bytecodes and taking advantage of
dynamic compilers like Hotspot? They often do a good job in cases
like this by assuming that types are fairly predictable from one run
to the next of a piece of code. Or is the Java semantic model too
different?
...There is always a trade-off in complicating the code and the consequences for maintainability that implies.
Agreed entirely!
A 1.5 factor difference here I find difficult to get excited about, but it might be worth a look.
Thanks for the explanations of the internals.
I understand about the 'redefining log' problem in the interpreter,
but I wasn't aware of the NAMED counter. In both cases, beyond static
analysis, dynamic Java compilers do a pretty good job, but I don't
know if Java bytecodes are suitable for R, and if they are not, it
would be a lot of work to duplicate the analysis for an R compiler.
-s
On Sun, Jan 4, 2009 at 4:50 PM, <luke at stat.uiowa.edu> wrote:
On Sun, 4 Jan 2009, Stavros Macrakis wrote:
On Sat, Jan 3, 2009 at 7:02 PM, <luke at stat.uiowa.edu> wrote:
R's interpreter is fairly slow due in large part to the allocation of
argument lists and the cost of lookups of variables,
I'd think another problem is call-by-need. I suppose inlining or
batch analyzing groups of functions helps there.
Yes. The overhead can probably be reduced at least in compiled code,
but it will always be significant. Many primitives are strict and do
not depend on call stack position so inlining is safe and that is done
in the current compiler. Figuring out whether inlining is safe for
user functions is more problematic and may need declarations.
including ones like [<- that are assembled and looked up as strings on every call.
Wow, I had no idea the interpreter was so awful. Just some simple tree-to-tree transformations would speed things up, I'd think, e.g. `<-`(`[`(...), ...) ==> `<-[`(...,...).
'Awful' seems a bit strong.
Well, I haven't looked at the code, but if I'm interpreting "assembled
and looked up as strings on every call" correctly, this means taking
names, expanding them to strings, concatenating them, re-interning
them, then looking up the value.
That's about it as I recall.
That sounds pretty awful to me both
in the sense of being inefficient and of being ugly.
Ugly: a matter of taste and opinion. Inefficient: yes, but in the
context of the way the rest of the computation is done it is simple
and efficient enough (no point in optimizing this given the other
issues at this point). It doesn't make the interpreter awful, which
is what you said.
I'd think that one of the challenges will be the dynamic types --...
I am for now trying to get away without declarations and pre-testing
for the best cases before passing others off to the current internal
code.
Have you considered using Java bytecodes and taking advantage of
dynamic compilers like Hotspot? They often do a good job in cases
like this by assuming that types are fairly predictable from one run
to the next of a piece of code. Or is the Java semantic model too
different?
My sense at this point is that this isn't a particularly good match,
in particular as one of my objectives is to try to take advantage of
opportunities for computing some compound numerical operations on
vectors in parallel. But the possibility of translating the R byte
code to C, JVM, .Net, etc is something I'm trying to keep in mind.
luke
...There is always a trade-off in complicating the code and the consequences for maintainability that implies.
Agreed entirely!
A 1.5 factor difference here I find difficult to get excited about, but it might be worth a look.
I agree. The 1.5 isn't a big deal at all.
-s
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu