python

15 messages · Jean Legeande, Paul Hiemstra, Barry Rowlingson +6 more

#
Hi Jean,

You can integrate R and Python using RSPython or Rpy. But why would 
Python be faster than R? Both are interpreted languages and probably 
about as fast (please someone correct me if I'm wrong). It would 
probably only help if there is a C MCMC implementation linked to 
Python (which you then link to R). Isn't there an MCMC package for R 
that uses a fast implementation of MCMC? See also the Bayesian task 
view [1].

cheers,
Paul

[1] http://cran.r-project.org/web/views/Bayesian.html

Jean Legeande wrote:
#
On Sat, Nov 21, 2009 at 2:29 PM, Jean Legeande <jean.legeande at gmail.com> wrote:
Have you done a profile of your MCMC code to see where the bottleneck
is? Without doing that first any effort could be a total waste of
time.

 R can do a lot of its calculations at the same level as C, so if 80%
of your time is spent inverting matrices then converting to Python or
C (or even assembly language) isn't going to help much since R's
matrix inversion is done using C code (and quite possibly very
optimised C code with maybe some assembly language too).

 So do a profile (see ?Rprof) and work out the bottleneck. It might be
one of your functions, in which case just re-writing that in C and
linking to R (see programmers guide and a good C book) will do the
job.
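
 The Python-side analogue of ?Rprof is the standard cProfile module.
A minimal sketch (invert_step and mcmc_chain are hypothetical
stand-ins, not code from this thread):

```python
import cProfile
import io
import pstats

def invert_step(n):
    # stand-in for the expensive step inside one MCMC iteration
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total

def mcmc_chain(iterations=200):
    # hypothetical sampler loop; the profile should show invert_step dominating
    acc = 0.0
    for _ in range(iterations):
        acc += invert_step(1000)
    return acc

profiler = cProfile.Profile()
profiler.enable()
result = mcmc_chain()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()  # the entries with the largest cumtime are the bottleneck
```

 If one function dominates the cumulative-time column, that function
is the candidate for rewriting in C; anything else is wasted effort.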

 My hunch is that Python and R run at about the same speed, and both
use C libraries for speedups (Python primarily via the numpy package).
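
 As a sketch of what "C libraries for speedups" means on the Python
side (function names here are illustrative, not from the thread): the
same reduction written as a pure-Python loop and as a numpy call,
where numpy runs the loop in compiled C.

```python
import numpy as np

def scaled_sum_loop(values):
    # pure-Python loop: every iteration goes through the interpreter
    total = 0.0
    for v in values:
        total += 2.0 * v
    return total

def scaled_sum_numpy(values):
    # same computation; the loop happens inside numpy's compiled C code
    arr = np.asarray(values, dtype=float)
    return float((2.0 * arr).sum())

data = list(range(10_000))
assert abs(scaled_sum_loop(data) - scaled_sum_numpy(data)) < 1e-6
```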

 You can call the GSL from Python, and there are probably tricks for
getting the distributions you want:

http://www.mailinglistarchive.com/help-gsl at gnu.org/msg00096.html

 describes how to get samples from a Wishart.
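
 For illustration only -- this is not the GSL recipe from the linked
post, but a numpy sketch of the standard Bartlett decomposition,
which draws W = (LA)(LA)^T with L the Cholesky factor of the scale
matrix and A lower triangular (chi-distributed diagonal,
standard-normal entries below it):

```python
import numpy as np

def sample_wishart(df, scale, rng):
    """One draw from Wishart(df, scale) via the Bartlett decomposition."""
    p = scale.shape[0]
    L = np.linalg.cholesky(scale)
    A = np.zeros((p, p))
    # diagonal: square roots of chi-square variates with decreasing df
    A[np.diag_indices(p)] = np.sqrt(rng.chisquare(df - np.arange(p)))
    # strict lower triangle: independent standard normals
    A[np.tril_indices(p, -1)] = rng.standard_normal(p * (p - 1) // 2)
    LA = L @ A
    return LA @ LA.T

rng = np.random.default_rng(0)
W = sample_wishart(df=10, scale=np.eye(3), rng=rng)
```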

 However, using the GSL from Python probably won't be much faster than
using R, because again it's all at the C level already. Did I suggest
you profile your code?

Barry
#
One little thing that I think Barry
meant to say.

If the bottleneck is in your code, you
may be able to improve the situation
enough by merely rewriting the R code
of your function.  If that doesn't work,
then you can move to C.



Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of "The R Inferno" and "A Guide for the Unwilling S User")
Barry Rowlingson wrote:
#
We have been using pymc as an alternative to WinBUGS, and have been
very pleased with it.  I've begun working on an R2Pymc package, but
don't have anything ready for sharing yet.

Here's the pymc page:
http://code.google.com/p/pymc/

and the repo is here:
http://github.com/pymc-devs/pymc

I've converted a few of the radon examples from Gelman's ARM book to
pymc.  You can find them here:
http://github.com/armstrtw/pymc_radon

the original bugs examples are here:
http://www.stat.columbia.edu/~gelman/arm/examples/radon/

-Whit
On Sat, Nov 21, 2009 at 1:21 PM, Jean Legeande <jean.legeande at gmail.com> wrote:
#
That's not necessarily true.  There can be enormous differences  
between interpreted languages, and R appears to be a particularly slow  
one (which doesn't usually matter, as well-written code will mostly  
perform matrix operations).

I did run some simple benchmarks with "naive" loops such as this one
as well as function calls.  I haven't tested Python yet, but in  
general it is considered to be roughly on par with Perl.

Here are results for the loop above:

R/simple_count.R                   0.82 Mops/s  (2000000 ops in 2.43 s)
perl/simple_count.perl             8.32 Mops/s  (10000000 ops in 1.20 s)

(where Mops = million operations per second; one loop iteration  
counts as a single operation here).  As you can see, Perl is about  
ten times as fast as R.  The point, however, is that this difference  
may not be  
worth the effort you spend re-implementing your algorithms in Python  
or Perl and getting the Python/Perl interface for R up and running  
(I've just about given up on RSPerl, since I simply can't get it to  
install on my Mac in the way I need it).
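
The simple_count scripts themselves are not shown in the thread; a  
hedged guess at what such a naive counting benchmark looks like in  
Python:

```python
import time

def simple_count(n):
    # "naive" loop of the kind the benchmark measures:
    # one increment per iteration, no vectorisation
    i = 0
    while i < n:
        i += 1
    return i

n = 2_000_000
start = time.perf_counter()
count = simple_count(n)
elapsed = time.perf_counter() - start
print(f"{n / elapsed / 1e6:.2f} Mops/s  ({n} ops in {elapsed:.2f} s)")
```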

The difference between R and Perl appears much less important if you  
compare it to compiled C code:

C/simple_count.exe               820.86 Mops/s  (500000000 ops in 0.61  
s)

If you really need speed from an interpreted language, you could try  
Lua:

lua/simple_count.lua              65.78 Mops/s  (100000000 ops in 1.52  
s)

(though you're going to lose much of this advantage as soon as you  
include function calls, which have a lot of overhead in every  
interpreted language).


Hope this helps,
Stefan
#
There is ongoing work on two byte compilers for R:

http://www.stat.uiowa.edu/~luke/R/compiler/

http://www.milbo.users.sonic.net/ra

You could check whether running under either of those speeds up your R
code sufficiently that you don't need to rewrite it.
On Sat, Nov 21, 2009 at 9:29 AM, Jean Legeande <jean.legeande at gmail.com> wrote:
#
On 11/21/2009 11:32 PM, Stefan Evert wrote:
Sure, badly written R code does not perform as well as well-written 
Python code or C code. On the other hand, badly written Python code 
does not perform as well as well-written R code.

What happens when you try one of these :

sum <- sum( 1:N )
sum <- sum( seq_len(N) )
sum <- N * (N + 1L) / 2L  # ;-)

A lot can be done by just rewriting some of the R code.
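
For readers following along in Python, the same idea (a small sketch; 
N kept modest so the explicit sum fits comfortably in memory):

```python
N = 100_000

loop_sum = sum(range(1, N + 1))   # explicit summation over a sequence
closed_form = N * (N + 1) // 2    # Gauss closed form: no loop, no big vector

assert loop_sum == closed_form
```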

#
R runs out of memory and crashes. :-)  I didn't tell you how big N is,  
did I?

But this is exactly the point I was trying to make (but perhaps not  
prominently enough).  In many cases, you can vectorize at least parts  
of your code or find a more efficient algorithm, which may be faster  
in R than a brute-force solution in C.  But sometimes, you just cannot  
avoid loops (let's not forget that all the forms of apply() are just  
loops and don't give much of a speed benefit over a for-loop),  
function calls, etc.; in this case, performance differences between  
interpreted languages can matter.

Personally, I'd never switch from R to Perl just for speed, though.

BTW, I also tried a vectorised algorithm in R, which calculates the  
sum above in a small number of chunks; this gives

R/simple_count_vec.R              31.30 Mops/s  (50000000 ops in 1.60 s)

So an interpreted loop in Lua is still faster than this partially  
vectorized code in R.

As people on the SQLite mailing list always say: there's no general  
answer as to which language/implementation/query/... is faster and  
better.  You just have to test the different options for your specific  
application setting, and be prepared for one or two surprises.

Just in case this isn't obvious: If I rewrote matrix multiplication in  
C and linked this code into R, it would run much slower than if I just  
typed "A %*% B".
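
The same point can be demonstrated in Python: a hand-rolled triple  
loop gives the same answer as the BLAS-backed A @ B (numpy's analogue  
of R's A %*% B) but is dramatically slower. A sketch:

```python
import numpy as np

def matmul_naive(A, B):
    # textbook triple loop: correct, but nowhere near a tuned BLAS
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for t in range(k):
                s += A[i, t] * B[t, j]
            C[i, j] = s
    return C

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 40))
B = rng.standard_normal((40, 20))

C_naive = matmul_naive(A, B)
C_blas = A @ B   # dispatches to the optimised BLAS, like R's %*%
```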

All the best,
Stefan
#
Stefan Evert wrote:
Really?

  N <- 1e30
  sum( 1:N )
  Error in 1:N : result would be too long a vector

  -Peter Ehlers
