How to write efficient R code

9 messages · Lennart.Borgman@astrazeneca.com, Brian Ripley, Tom Blackwell +6 more

#
I have been lurking in this list a while and searching in the archives to
find out how one learns to write fast R code. One solution seems to be to
write part of the code not in R but in C. However after finding a benchmark
article (http://www.sciviews.org/other/benchmark.htm) I have been more
interested in making the R code itself more efficient. I would like to find
more info about this. I have tried to mail the contact person for the
benchmark, but I have so far received no reply.

I am not an R programmer (or statistician) so I do not know R well. I am
looking for some advice about writing fast R code. What about the different
data types for example? Is there some good place to start to look for more
info about this? 


Thanks for any pointers
Lennart
#
`S Programming' (see the FAQ) has a whole chapter with case studies.
Beware that what is efficient under one version of S is not necessarily so 
under another, and that applies to R today vs R in 1999 (when those 
examples were done).  However, the general principles are good for all 
time.
On Tue, 17 Feb 2004 Lennart.Borgman at astrazeneca.com wrote:
#
Lennart  -

My two rules are:

  1. Be straightforward.  Don't try to be too fancy.  Don't worry
	about execution time until you have the WHOLE thing programmed
	and DOING everything you want it to.  Then profile it, if it's
	really going to be run more than 1000 times.  Execution time
	is NOT the issue.  Code maintainability IS.

  2. Use vector operations wherever possible.  Avoid explicit loops.
	However, the admonition to avoid loops is probably much less
	important now than it was with the Splus of 10 or 15 years ago.

(Not that I succeed in obeying these rules myself, all the time.)

Remember:  execution time is not the issue.  Memory size may be.
Clear, maintainable code definitely is.

In my opinion, the occasional questions you will see on this list about
incorporating C code, or trying to specify one data type over another,
come up only in very unusual, special cases.  Almost everything can be
done without loops in straight R, if you think about it first.
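
[Editorial sketch, not part of the original message.] Rule 2 above can be illustrated with a small example. The data, variable names, and the arithmetic are hypothetical and chosen only to show an explicit loop next to the equivalent vector operation.

```r
# Hypothetical data: 100,000 random values (names and sizes are illustrative).
set.seed(1)
x <- runif(1e5)

# Explicit loop: transform each element and accumulate a running total.
total_loop <- 0
for (i in seq_along(x)) {
  total_loop <- total_loop + 2 * x[i] + 1
}

# Vector operation: the arithmetic is applied to the whole vector at once.
total_vec <- sum(2 * x + 1)

# Both give the same answer up to floating-point rounding.
all.equal(total_loop, total_vec)
```

The vector form is also the shorter and easier one to read, which fits rule 1 as well as rule 2.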

-  tom blackwell  -  u michigan medical school  -  ann arbor  -
#
One way to make your code more efficient is to use "vectorisation" --
vectorise your code.  I'm not sure where you can find more
information about it, but an example would be to use the apply()
function on a data frame instead of using a loop.  Avoid loops if you
can.

Kevin

--------------------------------------------
Ko-Kang Kevin Wang, MSc(Hon)
SLC Stats Workshops Co-ordinator
The University of Auckland
New Zealand
#
On Tue, 2004-02-17 at 12:21, Tom Blackwell wrote:
I've been using R for maybe 6 months or less and am by no means an R
expert. But the above two points are extremely valid - my policy is to
always write code that I can read 2 months later without comments
(though in the end I do add them) - even if it requires loops.

However, after I'm sure the results are right I spend time on trying to
vectorise the code. And that has improved performance by orders of
magnitude (IMO, it's also more elegant to have a one line vector
operation rather than a loop).

Of course as I progress towards the status of R expert I hope to be able
to write vectorised code on the fly :)

-------------------------------------------------------------------
Rajarshi Guha <rxg218 at psu.edu> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
So the Zen master asked the hot-dog vendor, 
"Can you make me one with everything?"
- TauZero on Slashdot
#
You may also be interested in reading the latest article on artima.com 
(http://www.artima.com/intv/abstreffi.html) where Bjarne Stroustrup (the 
creator of C++) discusses some of the benefits and costs of abstraction, 
as well as premature vs. prudent optimisation.

It is important to remember that the key to improving execution speeds 
is profiling your running code - we're not good at anticipating what 
parts of a program will be slow.  It's much better to run the program 
and see.
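
[Editorial sketch, not part of the original message.] A minimal way to "run the program and see" is to time candidate implementations with base R's system.time(). The two functions below are hypothetical stand-ins written for this illustration, and the timings will differ between machines and R versions.

```r
# Two hypothetical candidates for the same job: cumulative means of a vector.
cum_mean_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- mean(x[1:i])
  out
}
cum_mean_vec <- function(x) cumsum(x) / seq_along(x)

set.seed(1)
x <- rnorm(2000)

# system.time() reports user, system and elapsed seconds for an expression.
t_loop <- system.time(r_loop <- cum_mean_loop(x))
t_vec  <- system.time(r_vec  <- cum_mean_vec(x))

print(t_loop)
print(t_vec)

# Check that the faster version still gives the same answer.
all.equal(r_loop, r_vec)
```

For a per-function breakdown of a longer-running program, base R also provides Rprof() and summaryRprof().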

Hadley
#
Lennart

To learn about "data types" take a look at the early chapters of An 
Introduction To R available at

http://cran.r-project.org/manuals.html

Richard
#
On Wed, 18 Feb 2004, Ko-Kang Kevin Wang wrote:
Umm. No.  Vectorization is definitely a good thing -- just about the only
coding change that improves both clarity and speed -- but replacing a loop
with apply() is not vectorisation in that sense.

Except for some cases of lapply, the apply functions are mostly clarity
optimisations rather than speed optimisations.
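
[Editorial sketch, not part of the original message.] The distinction can be seen with row sums of a matrix. The matrix below is hypothetical. apply() still calls sum() once per row from R code, whereas rowSums() is a single call that does the work in compiled code, which is closer to vectorisation in the sense meant above.

```r
set.seed(1)
m <- matrix(runif(20000), nrow = 2000)  # hypothetical 2000 x 10 matrix

# Explicit loop over rows.
s_loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) s_loop[i] <- sum(m[i, ])

# apply(): tidier, but it still calls sum() once per row at the R level.
s_apply <- apply(m, 1, sum)

# rowSums(): a single call that does the work in compiled code.
s_vec <- rowSums(m)

# All three agree up to floating-point rounding.
all.equal(s_loop, s_apply)
all.equal(s_apply, s_vec)
```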


	-thomas
#
Rajarshi Guha <rxg218 at psu.edu> writes:
All true. A couple of additional remarks:

1) Some constructs are spectacularly inefficient, as you'll realize
   when you think about what they have to do. One standard example is

        for (i in 1:10000) 
            x[i] <- f(i)

   which becomes much faster if you preallocate x <- numeric(10000)
   (never mind that sapply will do it more neatly). Without
   preallocation, R will need to extend the array on every iteration,
   which requires the whole array to be copied to a new location. It is
   a very good idea to keep your eyes open for these situations and
   try to avoid them.

2) On the other hand, don't be trapped by efficiency differences that
   might be "accidental" and go away in later releases. We've seen a
   couple of cases where the Wrong Way was actually faster than the
   Right Way (details elude me -- something with deparse/reparse vs.
   symbolic computations, I suspect), but this easily leads to
   
   code that is hard to read, and may have subtle bugs.
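
[Editorial sketch, not part of the original message.] Point 1 can be written out in full as follows. The function f here is a hypothetical stand-in, since the original does not define it. How large the speed difference is depends on the R version, so time it yourself rather than assuming a figure.

```r
f <- function(i) i^2   # hypothetical stand-in for the f in the example
n <- 10000

# Growing x: x starts empty and is extended as the loop assigns past its end.
x_grow <- numeric(0)
for (i in 1:n) x_grow[i] <- f(i)

# Preallocating: x has its final length before the loop starts.
x_pre <- numeric(n)
for (i in 1:n) x_pre[i] <- f(i)

# sapply(): builds the result without an explicit loop or index bookkeeping.
x_sapply <- sapply(1:n, f)

# All three approaches produce the same vector.
identical(x_grow, x_pre)
identical(x_pre, x_sapply)
```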