python
15 messages · Jean Legeande, Paul Hiemstra, Barry Rowlingson +6 more
Hi Jean, You can integrate R and Python using RSPython or Rpy. But why would Python be faster than R? Both are interpreted languages and probably about as fast (please someone correct me if I'm wrong). It would probably only help if there is a C MCMC implementation linked to Python (that you then link to R). Isn't there an MCMC package for R that uses a fast implementation of MCMC? See also the Bayesian task view [1]. cheers, Paul [1] http://cran.r-project.org/web/views/Bayesian.html Jean Legeande wrote:
Dear R users, I would like to make my R code for MCMC faster. It is possible to integrate C code into R, but I think C is too complicated for me. I would need a C introduction only for MCMC and I do not know if such a thing exists. I was thinking of Python (and scipy). Where could I read about its integration into R? How developed are the statistical packages in Python? I could not find a Python package on the web with functions to simulate Wishart, multivariate gamma or Student distributions. Since I am a little bit lost, I write this message to the R help list. Sorry for these naive questions and thanks for your help. Best, Jean
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Have you done a profile of your MCMC code to see where the bottleneck is? Without doing that first, any effort could be a total waste of time. R can do a lot of its calculations at the same level as C, so if 80% of your time is spent inverting matrices then converting to Python or C (or even assembly language) isn't going to help much, since R's matrix inversion is done using C code (and quite possibly very optimised C code with maybe some assembly language too). So do a profile (see ?Rprof) and work out the bottleneck. It might be one of your functions, in which case just rewriting that in C and linking to R (see the programmer's guide and a good C book) will do the job. My hunch is that Python and R run at about the same speed, and both use C libraries for speedups (Python primarily via the numpy package). You can call the GSL from Python, and there are probably tricks for getting the distributions you want: http://www.mailinglistarchive.com/help-gsl at gnu.org/msg00096.html describes how to get samples from a Wishart. However, using the GSL from Python probably won't be much faster than using R because, again, it's all at the C level already. Did I suggest you profile your code? Barry
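A minimal sketch of the profiling step suggested above. It assumes a toy random-walk Metropolis sampler for a standard normal target; the function `metropolis` and its arguments are hypothetical stand-ins, not Jean's code. Only `Rprof()` and `summaryRprof()` come from R itself.

```r
# Hypothetical toy sampler: random-walk Metropolis for a N(0, 1) target.
metropolis <- function(n_iter, step = 1) {
  draws <- numeric(n_iter)
  current <- 0
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(1, sd = step)
    # Log acceptance ratio of the target densities.
    log_ratio <- dnorm(proposal, log = TRUE) - dnorm(current, log = TRUE)
    if (log(runif(1)) < log_ratio) current <- proposal
    draws[i] <- current
  }
  draws
}

set.seed(1)
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)  # start the sampling profiler
draws <- metropolis(300000)
Rprof(NULL)                        # stop profiling

# Time spent in each function itself, largest first.
profile <- summaryRprof(prof_file)
print(head(profile$by.self))
```

If most of the self time turns out to sit in calls that are already compiled code, rewriting the surrounding loop in another language is unlikely to buy much; if it sits in your own R functions, those are the candidates for rewriting.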
One little thing that I think Barry meant to say. If the bottleneck is in your code, you may be able to improve the situation enough by merely rewriting the R code of your function. If that doesn't work, then you can move to C. Patrick Burns patrick at burns-stat.com +44 (0)20 8525 0696 http://www.burns-stat.com (home of "The R Inferno" and "A Guide for the Unwilling S User")
We have been using pymc as an alternative to WinBUGS, and have been very pleased with it. I've begun working on an R2Pymc package, but don't have anything ready for sharing yet.
Here's the pymc page: http://code.google.com/p/pymc/
and the repo is here: http://github.com/pymc-devs/pymc
I've converted a few of the radon examples from Gelman's ARM book to pymc. You can find them here: http://github.com/armstrtw/pymc_radon
The original BUGS examples are here: http://www.stat.columbia.edu/~gelman/arm/examples/radon/
-Whit
On Sat, Nov 21, 2009 at 1:21 PM, Jean Legeande <jean.legeande at gmail.com> wrote:
Thank you Paul, Barry and Patrick. I will do what you recommend (the profiling). I have heard several times that, for example, Matlab would be faster than R. This is why I thought of switching to Python, even though it is also interpreted: I thought it would be faster. Best, Jean
My hunch is that Python and R run at about the same speed, and both use C libraries for speedups (Python primarily via the numpy package).
That's not necessarily true. There can be enormous differences between interpreted languages, and R appears to be a particularly slow one (which doesn't usually matter, as well-written code will mostly perform matrix operations). I did run some simple benchmarks with "naive" loops such as this one:
for (x in 1:N) {
  sum <- sum + x
}
as well as function calls. I haven't tested Python yet, but in general it is considered to be roughly on par with Perl. Here are results for the loop above:

R/simple_count.R          0.82 Mops/s  (2000000 ops in 2.43 s)
perl/simple_count.perl    8.32 Mops/s  (10000000 ops in 1.20 s)

(where Mops = million operations per second; one loop iteration counts as a single operation here). As you can see, Perl is about 10 times as fast as R. The point is, however, that this difference may not be worth the effort you spend re-implementing your algorithms in Python or Perl and getting the Python/Perl interface for R up and running (I've just about given up on RSPerl, since I simply can't get it to install on my Mac in the way I need it). The difference between R and Perl appears much less important if you compare it to compiled C code:

C/simple_count.exe        820.86 Mops/s  (500000000 ops in 0.61 s)

If you really need speed from an interpreted language, you could try Lua:

lua/simple_count.lua      65.78 Mops/s  (100000000 ops in 1.52 s)

(though you're going to lose much of this advantage as soon as you include function calls, which have a lot of overhead in every interpreted language). Hope this helps, Stefan
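For anyone who wants to reproduce this kind of figure, here is a rough sketch of how a "Mops/s" number for the naive loop could be measured in R. The helper name `mops` and the use of `system.time()` are assumptions for illustration; Stefan's actual benchmark scripts are not shown in the thread.

```r
# Hypothetical helper: million operations per second.
mops <- function(n_ops, seconds) n_ops / seconds / 1e6

N <- 2000000

# Time the naive interpreted loop; one iteration counts as one operation.
elapsed <- system.time({
  total <- 0
  for (x in 1:N) {
    total <- total + x
  }
})[["elapsed"]]

cat(sprintf("%.2f Mops/s (%d ops in %.2f s)\n", mops(N, elapsed), N, elapsed))

# Sanity check against the R figure quoted above: 2000000 ops in 2.43 s.
quoted_r <- mops(2000000, 2.43)
```

The numbers you get will depend heavily on the machine and the R version, so they are only comparable with timings of other languages taken on the same system.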
There is work going on on two byte compilers for R: http://www.stat.uiowa.edu/~luke/R/compiler/ http://www.milbo.users.sonic.net/ra You could check whether running under either of those speeds up your R code sufficiently that you don't need to rewrite it.
Sure, badly written R code does not perform as well as well-written Python code or C code. On the other hand, badly written Python code does not perform as well as well-written R code. What happens when you try one of these:
sum <- sum( 1:N )
sum <- sum( seq_len(N) )
sum <- N * (N + 1L) / 2L # ;-)
A lot can be done by just rewriting some of the R code.
Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr |- http://tr.im/EAD5 : LondonR slides |- http://tr.im/BcPw : celebrating R commit #50000 `- http://tr.im/ztCu : RGG #158:161: examples of package IDPmisc
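As a small check of the rewrites suggested above, the sketch below compares the naive loop with a vectorized sum and with the closed form N(N+1)/2. The variable names are made up for illustration, and N is deliberately kept as a double (with the sequence converted via `as.numeric()`), an assumption made here to sidestep integer overflow in `N * (N + 1)` for larger N.

```r
N <- 1000000  # stored as a double, not an integer

# Naive interpreted loop.
loop_sum <- 0
for (x in seq_len(N)) {
  loop_sum <- loop_sum + x
}

# Vectorized: one call into compiled code, at the cost of materializing the vector.
vec_sum <- sum(as.numeric(seq_len(N)))

# Closed form: no loop and no vector at all.
closed_form <- N * (N + 1) / 2

print(c(loop = loop_sum, vectorized = vec_sum, closed_form = closed_form))
```

All three agree for this N; the difference is in how much interpreted work and memory each one needs.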
Sure, badly written R code does not perform as well as well written python code or C code. On the other hand badly written python code does not perform as well as well written R code. What happens when you try one of these : sum <- sum( 1:N )
R runs out of memory and crashes. :-) I didn't tell you how big N is, did I? But this is exactly the point I was trying to make (but perhaps not prominently enough). In many cases, you can vectorize at least parts of your code or find a more efficient algorithm, which may be faster in R than a brute-force solution in C. But sometimes, you just cannot avoid loops (let's not forget that all the forms of apply() are just loops and don't give much of a speed benefit over a for-loop), function calls, etc.; in this case, performance differences between interpreted languages can matter. Personally, I'd never switch from R to Perl just for speed, though. BTW, I also tried a vectorised algorithm in R, which calculates the sum above in a small number of chunks:
N1 <- 50
N2 <- 1000000
N <- N1 * N2
sum <- 0
for (i in 1:N1) {
  x <- as.numeric(i-1) * N2 + 1:N2
  sum <- sum + sum(x)
}
which gives

R/simple_count_vec.R      31.30 Mops/s  (50000000 ops in 1.60 s)

So an interpreted loop in Lua is still faster than this partially vectorized code in R:

lua/simple_count.lua      65.78 Mops/s  (100000000 ops in 1.52 s)
As people on the SQLite mailing list always say: there's no general answer as to which language/implementation/query/... is faster and better. You just have to test the different options for your specific application setting, and be prepared for one or two surprises. Just in case this isn't obvious: If I rewrote matrix multiplication in C and linked this code into R, it would run much slower than if I just typed "A %*% B". All the best, Stefan
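The chunked approach from this message can be wrapped in a function so that the chunk size, and with it the peak memory use, becomes a parameter. This is only a sketch: the name `chunked_sum` and its arguments are hypothetical, and the loop body follows the code shown above.

```r
# Hypothetical helper: sum 1..(n_chunks * chunk_size) one chunk at a time,
# so that only chunk_size numbers are held in memory at once.
chunked_sum <- function(n_chunks, chunk_size) {
  offsets <- seq_len(chunk_size)  # 1..chunk_size, reused for every chunk
  total <- 0
  for (i in seq_len(n_chunks)) {
    # Chunk i covers (i - 1) * chunk_size + 1 .. i * chunk_size, as doubles.
    x <- as.numeric(i - 1) * chunk_size + offsets
    total <- total + sum(x)
  }
  total
}

N1 <- 50
N2 <- 1000000
N <- N1 * N2

result <- chunked_sum(N1, N2)
print(result == N * (N + 1) / 2)
```

Larger chunks mean fewer interpreted iterations but bigger temporary vectors, so the best split is something to measure for the application at hand, in line with the advice above.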
Stefan Evert wrote:
Sure, badly written R code does not perform as well as well written python code or C code. On the other hand badly written python code does not perform as well as well written R code. What happens when you try one of these : sum <- sum( 1:N )
R runs out of memory and crashes. :-) I didn't tell you how big N is, did I?
Really?
N <- 1e30
sum( 1:N )
Error in 1:N : result would be too long a vector
-Peter Ehlers