python

Hi Jean,

You can integrate R and Python using RSPython or Rpy. But why would
Python be faster than R? Both are interpreted languages and probably
about as fast as each other (someone please correct me if I'm wrong). It
would probably only help if there were a C implementation of MCMC linked
to Python (which you would then link to R). Isn't there an MCMC package
for R that uses a fast implementation? See also the Bayesian task view [1].

cheers,
Paul

[1] http://cran.r-project.org/web/views/Bayesian.html

On Sat, Nov 21, 2009, Jean Legeande wrote:
Dear R users,

I would like to make my R code for MCMC faster. It is possible to integrate
C code into R but I think C is too complicated for me. I would need a C
introduction only for MCMC and I do not know if such a thing exists.

I was thinking of Python (and scipy). Where could I read about its
integration into R ? How developed are the statistical packages in Python ?
I could not find a Python package on the web with functions to simulate
Wishart, or multivariate gamma or student distributions.

Since I am a little bit lost, I write this message to the R help list. Sorry
for these naive questions and thanks for your help.

Best,
Jean

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Have you done a profile of your MCMC code to see where the bottleneck
is? Without doing that first any effort could be a total waste of
time.

R can do a lot of its calculations at the same level as C, so if 80%
of your time is spent inverting matrices then converting to Python or
C (or even assembly language) isn't going to help much, since R's
matrix inversion is done in C code (quite possibly very optimised C
code, with maybe some assembly language too).

So do a profile (see ?Rprof) and work out the bottleneck. It might be
one of your functions, in which case just rewriting that in C and
linking it to R (see the "Writing R Extensions" manual and a good C
book) will do the job.
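
For anyone weighing the Python route, the same profile-first workflow can be sketched with Python's standard-library cProfile module, roughly the counterpart of Rprof. The toy random-walk Metropolis sampler and the names log_target, metropolis and profile_sampler are made up for illustration; they are not from this thread, and a real MCMC code would have a very different profile.

```python
import cProfile
import io
import math
import pstats
import random


def log_target(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def metropolis(n_iter, step=1.0, seed=0):
    """Toy random-walk Metropolis sampler in pure Python (illustration only)."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(x)).
        # 1.0 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples


def profile_sampler(n_iter=20000):
    """Run the sampler under cProfile; return (samples, text report)."""
    profiler = cProfile.Profile()
    samples = profiler.runcall(metropolis, n_iter)
    out = io.StringIO()
    # Sort by cumulative time so the most expensive call paths come first.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats()
    return samples, out.getvalue()
```

Calling profile_sampler() and printing the second element of the result lists each function with its call count and time. The point is the one made above: look at where the time goes before deciding what, if anything, to rewrite.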

My hunch is that Python and R run at about the same speed, and both
use C libraries for speedups (Python primarily via the numpy package).

You can call the GSL from Python, and there are probably tricks for
getting the distributions you want:

http://www.mailinglistarchive.com/help-gsl at gnu.org/msg00096.html

describes how to get samples from a Wishart.
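
As a rough illustration of what such a sampler involves, here is a sketch that draws from a Wishart distribution using only Python's standard library. It is not the GSL recipe from the linked post (which is not reproduced here). It simply uses the definition for integer degrees of freedom: the sum of dof outer products of N(0, scale) vectors. The function names are hypothetical, and the code is written for clarity, not speed.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (symmetric positive definite)."""
    p = len(matrix)
    lower = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_wishart(scale, dof, rng):
    """One draw from Wishart(scale, dof), assuming an integer dof.

    Uses the definition: the sum of dof outer products x x^T, x ~ N(0, scale).
    """
    p = len(scale)
    lower = cholesky(scale)
    w = [[0.0] * p for _ in range(p)]
    for _ in range(dof):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # x = L z has covariance L L^T = scale.
        x = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        for i in range(p):
            for j in range(p):
                w[i][j] += x[i] * x[j]
    return w
```

For example, sample_wishart([[2.0, 0.5], [0.5, 1.0]], 5, random.Random(1)) returns one symmetric 2 x 2 matrix; averaged over many draws the result should approach dof times the scale matrix.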

However, using the GSL from Python probably won't be much faster than
using R because, again, it's all at the C level already. Did I suggest
you profile your code?

Barry

One little thing that I think Barry
meant to say.

If the bottleneck is in your code, you
may be able to improve the situation
enough by merely rewriting the R code
of your function.  If that doesn't work,
then you can move to C.

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of "The R Inferno" and "A Guide for the Unwilling S User")

We have been using pymc as an alternative to WinBUGS, and have been
very pleased with it.  I've begun working on an R2Pymc package, but
don't have anything ready for sharing yet.

Here's the pymc page:
http://code.google.com/p/pymc/

and the repo is here:
http://github.com/pymc-devs/pymc

I've converted a few of the radon examples from Gelman's ARM book to
pymc.  You can find them here:
http://github.com/armstrtw/pymc_radon

The original BUGS examples are here:
http://www.stat.columbia.edu/~gelman/arm/examples/radon/

-Whit

Thank you Paul, Barry and Patrick.

I will do what you recommend (the profiling).

I have heard several times that, for example, Matlab is faster than R.
This is why I thought of switching to Python: although it is also
interpreted, I thought it would be faster.

Best,
Jean

Barry Rowlingson wrote:
> My hunch is that Python and R run at about the same speed, and both
> use C libraries for speedups (Python primarily via the numpy package).

That's not necessarily true.  There can be enormous differences
between interpreted languages, and R appears to be a particularly slow
one (which doesn't usually matter, as well-written code will mostly
perform matrix operations).

I did run some simple benchmarks with "naive" loops such as this one

for (x in 1:N) {
        sum <- sum + x
}

as well as function calls.  I haven't tested Python yet, but in
general it is considered to be roughly on par with Perl.

Here are results for the loop above:

R/simple_count.R                   0.82 Mops/s  (2000000 ops in 2.43 s)
perl/simple_count.perl             8.32 Mops/s  (10000000 ops in 1.20 s)

(where Mops = million operations per second treats one loop iteration  
as a single operation here).  As you can see, Perl is about 10 times  
as fast as R.  The point is, however, that this difference may not be  
worth the effort you spend re-implementing your algorithms in Python  
or Perl and getting the Python/Perl interface for R up and running  
(I've just about given up on RSPerl, since I simply can't get it to  
install on my Mac in the way I need it).
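
The Python case was not measured above. A minimal sketch of how the same measurement could be taken in Python follows; it deliberately reports no figures, since the result depends on the machine and interpreter version. The function names are illustrative and do not correspond to the simple_count scripts used for the other languages.

```python
import time


def naive_sum(n):
    """Add 1..n in an explicit interpreted loop, like the R loop above."""
    total = 0
    for x in range(1, n + 1):
        total += x
    return total


def benchmark(n):
    """Time naive_sum(n) once.

    Returns (total, elapsed seconds, million loop iterations per second),
    counting one loop iteration as one operation.
    """
    start = time.perf_counter()
    total = naive_sum(n)
    elapsed = time.perf_counter() - start
    return total, elapsed, n / elapsed / 1e6
```

Running benchmark(10_000_000) a few times and taking the best run gives a number that can be set beside the R and Perl lines, with the usual caveat that a single loop says little about a real MCMC code.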

The difference between R and Perl appears much less important if you  
compare it to compiled C code:

C/simple_count.exe               820.86 Mops/s  (500000000 ops in 0.61 s)

If you really need speed from an interpreted language, you could try
Lua:

lua/simple_count.lua              65.78 Mops/s  (100000000 ops in 1.52 s)

(though you're going to lose much of this advantage as soon as you
include function calls, which have a lot of overhead in every
interpreted language).

Hope this helps,
Stefan

There is ongoing work on two byte compilers for R:

http://www.stat.uiowa.edu/~luke/R/compiler/

http://www.milbo.users.sonic.net/ra

You could check whether running under either of those speeds up your R
code sufficiently that you don't need to rewrite it.

Stefan wrote:
> I did run some simple benchmarks with "naive" loops such as this one
>
> for (x in 1:N) {
>         sum <- sum + x
> }

Sure, badly written R code does not perform as well as well-written
Python code or C code. On the other hand, badly written Python code does
not perform as well as well-written R code.

What happens when you try one of these:

sum <- sum( 1:N )
sum <- sum( seq_len(N) )
sum <- N * (N + 1L) / 2L  # ;-)

A lot can be done by just rewriting some of the R code.
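
The same three rewrites carry over to Python, which may be useful when comparing the two languages. The sketch below uses only the standard library; the function names are illustrative, and the timings it returns will differ from machine to machine.

```python
import timeit


def loop_sum(n):
    """Explicit interpreted loop: one bytecode-level addition per element."""
    total = 0
    for x in range(1, n + 1):
        total += x
    return total


def builtin_sum(n):
    """Same result, but the loop runs inside the built-in sum()."""
    return sum(range(1, n + 1))


def closed_form(n):
    """Gauss's formula: no loop at all."""
    return n * (n + 1) // 2


def compare(n, number=3):
    """Average seconds per call for each variant."""
    variants = {"loop": loop_sum, "builtin": builtin_sum, "closed_form": closed_form}
    return {
        name: timeit.timeit(lambda f=func: f(n), number=number) / number
        for name, func in variants.items()
    }
```

All three functions return the same value; compare(1_000_000) shows how much of the cost comes from the interpreted loop and how much from the arithmetic itself.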

Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/EAD5 : LondonR slides
|- http://tr.im/BcPw : celebrating R commit #50000
`- http://tr.im/ztCu : RGG #158:161: examples of package IDPmisc

Romain Francois wrote:
> Sure, badly written R code does not perform as well as well written
> python code or C code. On the other hand badly written python code
> does not perform as well as well written R code.
>
> What happens when you try one of these :
>
> sum <- sum( 1:N )

R runs out of memory and crashes. :-)  I didn't tell you how big N is,
did I?

But this is exactly the point I was trying to make (but perhaps not  
prominently enough).  In many cases, you can vectorize at least parts  
of your code or find a more efficient algorithm, which may be faster  
in R than a brute-force solution in C.  But sometimes, you just cannot  
avoid loops (let's not forget that all the forms of apply() are just  
loops and don't give much of a speed benefit over a for-loop),  
function calls, etc.; in this case, performance differences between  
interpreted languages can matter.

Personally, I'd never switch from R to Perl just for speed, though.

BTW, I also tried a vectorised algorithm in R, which calculates the
sum above in a small number of chunks:

N1 <- 50
N2 <- 1000000
N <- N1 * N2
sum <- 0

for (i in 1:N1) {
        x <- as.numeric(i-1) * N2 + 1:N2
        sum <- sum + sum(x)
}

which gives

R/simple_count_vec.R              31.30 Mops/s  (50000000 ops in 1.60 s)

So an interpreted loop in Lua is still faster than this partially
vectorized code in R:

lua/simple_count.lua              65.78 Mops/s  (100000000 ops in 1.52 s)

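For comparison, the chunking idea can be sketched in Python as well: the outer loop is interpreted, while each chunk is summed inside the built-in sum(). This is only an analogue of the R code above, not a translation that was benchmarked in this thread, and chunked_sum is a made-up name. Python integers have arbitrary precision, so the running total cannot overflow.

```python
def chunked_sum(n_chunks, chunk_size):
    """Sum 1..n_chunks*chunk_size with one interpreted iteration per chunk."""
    total = 0
    for i in range(n_chunks):
        start = i * chunk_size + 1
        # The per-chunk work happens inside sum(), not in Python bytecode.
        total += sum(range(start, start + chunk_size))
    return total
```

chunked_sum(50, 1000000) covers the same 50000000 terms as the R version, so timing it gives a like-for-like figure on whatever machine it is run on.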
As people on the SQLite mailing list always say: there's no general  
answer as to which language/implementation/query/... is faster and  
better.  You just have to test the different options for your specific  
application setting, and be prepared for one or two surprises.

Just in case this isn't obvious: If I rewrote matrix multiplication in  
C and linked this code into R, it would run much slower than if I just  
typed "A %*% B".

All the best,
Stefan

Stefan wrote:
>> sum <- sum( 1:N )
>
> R runs out of memory and crashes. :-)  I didn't tell you how big N is,
> did I?

Really?

  N <- 1e30
  sum( 1:N )
  Error in 1:N : result would be too long a vector

  -Peter Ehlers
