
[R-sig-dyn-mod] R vs. fortran for simulation

4 messages · Stephen Paul Ellner, E Michael Foster, Thomas Petzoldt

#
Based on my experience, the short answer is that R sits about where MATLAB does: faster than MATLAB for some things and slower for others. 

The long answer is "it depends", and mostly it depends on how far your code can be vectorized. Whenever you rely on explicit loops, R gets slow relative to a compiled language like Fortran. If your simulation requires nested for-loops in R, Fortran would probably be much faster, however cleverly you write the R code. 
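A minimal sketch of the point above, using a hypothetical example that is not from the thread: the Ricker map N[t+1] = N[t] * exp(r * (1 - N[t] / K)), iterated for many populations that differ only in growth rate r. The loop over time cannot be removed because each step depends on the previous one, but the inner loop over populations collapses into one vector operation. Function names and parameter values are illustrative assumptions.

```r
# Ricker map N[t+1] = N[t] * exp(r * (1 - N[t] / K)), iterated for many
# populations that differ only in growth rate r (parameters are hypothetical).

# Nested loops: one loop over populations, one over time steps.
ricker_loop <- function(r, K = 100, N0 = 10, steps = 50) {
  out <- numeric(length(r))
  for (i in seq_along(r)) {
    N <- N0
    for (t in seq_len(steps)) {
      N <- N * exp(r[i] * (1 - N / K))
    }
    out[i] <- N
  }
  out
}

# Vectorized: N holds one value per population; only the time loop remains
# because each step depends on the previous one.
ricker_vec <- function(r, K = 100, N0 = 10, steps = 50) {
  N <- rep(N0, length(r))
  for (t in seq_len(steps)) {
    N <- N * exp(r * (1 - N / K))
  }
  N
}

r <- seq(0.1, 2.5, length.out = 10000)
stopifnot(isTRUE(all.equal(ricker_loop(r), ricker_vec(r))))
system.time(ricker_loop(r))
system.time(ricker_vec(r))
```

The two system.time() calls give a rough comparison on your own machine; the size of the gap depends on hardware and R version, so no particular speedup is claimed here.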

When you can vectorize, R is competitive with Fortran, and I find it much easier to work with. About 5 years ago I was using R to solve a system of PDEs by the method of lines. This involved solving a very large (dimension ~500) system of ordinary differential equations representing the values of the PDE solution at a grid of spatial locations, but the ODE system could be written in a dozen lines of loop-free, vectorized R code. It was still slow, so I spent a week rewriting the code in Fortran (using an ODE solver from Numerical Recipes). My reward was a factor-of-two speedup, and I went back to R. 
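The thread does not include Steve's code, so the following is only a sketch of what a loop-free method-of-lines system can look like, assuming the simplest case: 1-D diffusion, dC/dt = D * d2C/dx2, on a grid of n cells with zero-flux boundaries. The derivative function follows the func(t, y, parms) returning list(dydt) convention used by deSolve::ode(); to keep the sketch on base R only, it is integrated with a hypothetical hand-written explicit Euler loop, which is an illustration, not a recommended solver.

```r
# Method-of-lines right-hand side for 1-D diffusion, dC/dt = D * d2C/dx2,
# on n grid cells with zero-flux boundaries. Signature func(t, y, parms)
# returning list(dydt) follows the deSolve::ode() convention.
diffusion_rhs <- function(t, y, parms) {
  D  <- parms[["D"]]
  dx <- parms[["dx"]]
  padded <- c(y[1], y, y[length(y)])  # ghost cells copy the edge values
  flux   <- -D * diff(padded) / dx    # n + 1 fluxes; the two boundary fluxes are 0
  dydt   <- -diff(flux) / dx          # n rates of change, no explicit loop
  list(dydt)
}

# Hypothetical fixed-step explicit Euler integrator, used only so this
# sketch runs with base R; a real model would use a proper ODE solver.
euler <- function(y0, times, func, parms) {
  y <- y0
  for (i in seq_len(length(times) - 1)) {
    h <- times[i + 1] - times[i]
    y <- y + h * func(times[i], y, parms)[[1]]
  }
  y
}

n     <- 500                            # grid size of the same order as in the thread
y0    <- c(rep(1, 50), rep(0, n - 50))  # initial pulse at the left end
parms <- c(D = 0.1, dx = 0.1)
yT    <- euler(y0, seq(0, 1, by = 0.01), diffusion_rhs, parms)
```

The step size 0.01 respects the explicit-Euler stability limit h <= dx^2 / (2 * D) = 0.05 for these parameters. With deSolve installed, the same diffusion_rhs could be passed to deSolve::ode(y0, times, diffusion_rhs, parms) instead of the Euler loop, so the solver itself runs in compiled code.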

R is also competitive (or better) when the task is handled by a package that mostly works by calling compiled code. So again, there's no single number that summarizes R versus Fortran. 
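A base-R illustration of the same idea, assuming a simple linear recurrence x[t] = a * x[t-1] + e[t] as a stand-in for one simulation step: an update like this cannot be vectorized with ordinary arithmetic, but stats::filter() with method = "recursive" evaluates it in compiled code. The function names below are hypothetical.

```r
# Linear recurrence x[t] = a * x[t-1] + e[t], with x[0] = 0 (hypothetical example).

# Explicit R loop: each step needs the previous result.
ar_loop <- function(e, a) {
  x <- numeric(length(e))
  x[1] <- e[1]
  for (t in seq_along(e)[-1]) {
    x[t] <- a * x[t - 1] + e[t]
  }
  x
}

# Same recurrence evaluated by stats::filter(), whose recursive method
# runs the loop in compiled code; as.numeric() drops the ts attributes.
ar_filter <- function(e, a) {
  as.numeric(stats::filter(e, filter = a, method = "recursive"))
}

set.seed(1)
e <- rnorm(1e5)
stopifnot(isTRUE(all.equal(ar_loop(e, 0.9), ar_filter(e, 0.9))))
system.time(ar_loop(e, 0.9))
system.time(ar_filter(e, 0.9))
```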

Steve Ellner 
Ecology and Evolutionary Biology, Cornell 
----------------------------------------------------------------------
Message: 1
Date: Sat, 5 Nov 2011 15:18:13 -0500
From: E Michael Foster <emfoster at uab.edu>
To: "r-sig-dynamic-models at r-project.org"
        <r-sig-dynamic-models at r-project.org>
Subject: [R-sig-dyn-mod] speed of R versus fortran for simulation

Hi-
I have a question that is more general than this list but I hope it's one that the group can comment on with relative ease.
(I know what DES and ABM are, but this question involves simulation more generally.)

I've been simulating the behavior of a population of women over 50 years.  As one would expect, it takes days and days and days in Stata.  That wasn't too surprising; I know in some vague way that these kinds of packages carry a lot of computational "overhead".
My really nerdy economist friends use Fortran and tell me it's hundreds of times faster than something like Stata or MATLAB.

Here's my question: could I use R?  I know it, use it for other things, and am quite comfortable with it.

Could anyone give me a ballpark sense of where R falls on the continuum of simulation speed vis-à-vis Stata and Fortran?  Let's say Stata is 0 and Fortran is 1.  Where would R fall?  I know it depends on the application, but it would still help me to know whether R is closer to 0.90, 0.09, or even 0.009.  (I also realize the scale should probably be logarithmic, but I'm asking for a ballpark estimate.)

I was actually surprised to find this list and to see that folks were doing ABM etc. with R, so perhaps R is a better substitute for Fortran than I feared.

Thanks for your help

E. Michael Foster
Professor, Department of Health Care Organization and Policy
School of Public Health
The University of Alabama at Birmingham
#
Thanks so much for taking the time to respond.  

This is kind of what I feared, and I do know about the benefits of vectorizing calculations.  
My difficulty is that the idea of loops is so drilled into my brain.

Do you know of a good resource for mastering vectorized calculations?  
Googling, I can certainly find plenty of instances where folks ask for help with technical problems.
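A few common loop-to-vector rewrites, sketched with made-up data (the variables and the age-65 rule are hypothetical): ifelse() for element-wise if/else, logical indexing to update only the elements that meet a condition, and cumsum() for running totals.

```r
# Hypothetical data for five individuals.
ages   <- c(23, 47, 65, 71, 30)
income <- c(30, 52, 40, 25, 38)
yearly <- c(3, 1, 4, 1, 5)

# 1. if/else inside a loop -> ifelse() on the whole vector.
status_loop <- character(length(ages))
for (i in seq_along(ages)) {
  status_loop[i] <- if (ages[i] >= 65) "retired" else "working"
}
status_vec <- ifelse(ages >= 65, "retired", "working")

# 2. Conditional update in a loop -> logical indexing.
#    (Hypothetical rule: income halves at age 65 or older.)
income_loop <- income
for (i in seq_along(ages)) {
  if (ages[i] >= 65) income_loop[i] <- income_loop[i] / 2
}
income_vec <- income
retired <- ages >= 65
income_vec[retired] <- income_vec[retired] / 2

# 3. Running total in a loop -> cumsum().
total_loop <- numeric(length(yearly))
running <- 0
for (i in seq_along(yearly)) {
  running <- running + yearly[i]
  total_loop[i] <- running
}
total_vec <- cumsum(yearly)

stopifnot(identical(status_loop, status_vec),
          identical(income_loop, income_vec),
          identical(total_loop, total_vec))
```

In each pair the loop and the vectorized line compute the same result; the vectorized form states the whole operation at once and leaves the element-by-element work to R's compiled internals.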

E. Michael Foster
Professor, Department of Health Care Organization and Policy 
School of Public Health 
The University of Alabama at Birmingham 


-----Original Message-----
From: r-sig-dynamic-models-bounces at r-project.org [mailto:r-sig-dynamic-models-bounces at r-project.org] On Behalf Of Stephen Paul Ellner
Sent: Sunday, November 06, 2011 5:45 AM
To: r-sig-dynamic-models at r-project.org
Subject: [R-sig-dyn-mod] R vs. fortran for simulation

_______________________________________________
R-sig-dynamic-models mailing list
R-sig-dynamic-models at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-dynamic-models
#
I've been looking at your book (the chapters available online), and it looks quite helpful.
I can't say I normally read much computational biology, but I am doing some work applying DES and ABM in health services research,
so this is doubly interesting: it helps with my vectorizing issue and with dynamic models more generally.
#
Dear Michael,

first of all: I agree completely with Stephen Ellner and had exactly the 
same experience with both ODE/PDEs and ABMs:

1) Models in R run slower than in C/C++ or Fortran, but programming in R
is much faster than in C.

2) You can always implement the time-critical parts (and only these!) in
C or Fortran and call them from R, which then handles data management
and visualization; and you do nothing wrong if you implement your first
prototypes in pure R.

For ABMs, I use data frames for the individuals, so I can use R's
vectorized facilities in a very natural, seemingly quasi-parallel way:
just do computations on the whole data frame or use subset(); see
http://www.jstatsoft.org/v22/i09
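A minimal sketch of the data-frame approach described above, applied to a cohort like the one in Michael's question. The Gompertz-style mortality function and all parameter values are hypothetical placeholders, not estimates. Each simulated year, one runif() draw per row decides deaths, subset() drops them, and the survivors' ages are incremented in a single vector operation; only the loop over years remains.

```r
# Hypothetical Gompertz-style annual death probability (placeholder values).
annual_mortality <- function(age) pmin(1, 0.0005 * exp(0.09 * (age - 20)))

simulate_cohort <- function(n = 1000, years = 50, seed = 42) {
  set.seed(seed)
  # One row per individual; age0 keeps the starting age for checking.
  pop <- data.frame(id = seq_len(n), age0 = sample(20:40, n, replace = TRUE))
  pop$age <- pop$age0
  size <- integer(years)
  for (year in seq_len(years)) {
    # One uniform draw per row decides who dies this year.
    dies <- runif(nrow(pop)) < annual_mortality(pop$age)
    pop <- subset(pop, !dies)   # keep survivors only
    pop$age <- pop$age + 1      # everyone left ages by one year
    size[year] <- nrow(pop)
  }
  list(pop = pop, size = size)
}

res <- simulate_cohort()
head(res$size)
```

Adding more individual attributes (state, covariates) just means more columns, and each rule becomes one vectorized expression over those columns.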


Examples of dynamic models implemented in R vs. C/Fortran can be found
in several conference talks and papers, including an extensive
discussion of performance issues:


Soetaert, K., Petzoldt, T. & Setzer, R. W. (2010): Solving differential
equations in R: package deSolve. Journal of Statistical Software 33(9),
1-25. http://www.jstatsoft.org/v33/i09

Petzoldt, T. (2009): Dynamic simulation models - is R powerful enough?
useR! 2009, July 8-10, Rennes, France.
http://www.agrocampus-ouest.fr/math/useR-2009/slides/Petzoldt.pdf

Petzoldt, T. & Soetaert, K. (2011): Using R for Systems Understanding -
A Dynamic Approach. useR! 2011, August 16-18, University of Warwick,
Coventry, UK.
http://desolve.r-forge.r-project.org/slides/petz_soet2011.pdf


... and even more about this can be found on:
http://desolve.r-forge.r-project.org/



Thank you for asking this question.

Thomas P.