Python and R

19 messages · Esmail Bonakdarian, Warren Young, Barry Rowlingson +4 more

#
Hello all,

I am just wondering if any of you are doing most of your scripting
with Python instead of R's programming language and then calling
the relevant R functions as needed?

And if so, what is your experience with this and what sort of
software/library do you use in combination with Python to be able
to access R's functionality?

Is there much of a performance hit either way? (as both are interpreted
languages)

Thanks,
Esmail
#
Esmail Bonakdarian wrote:
No, but if I wanted to do such a thing, I'd look at Sage: 
http://sagemath.org/

It'll give you access to a lot more than just R.
Are you just asking, or do you have a particular execution time goal, 
which if exceeded would prevent doing this?  I ask because I suspect 
it's the former, and fast enough is fast enough.
#
2009/2/17 Esmail Bonakdarian <esmail.js at gmail.com>:
I tend to use R in its native form for data analysis and modelling,
and python for all my other programming needs (gui stuff with PyQt4,
web stuff, text processing etc etc).
When I need to use the two together, it's easiest with 'rpy'. This
lets you call R functions from python, so you can do:

 from rpy import r
 r.hist(z)

to get a histogram of the values in a python list 'z'. There are some
complications converting structured data types between the two but
they can be overcome, and apparently are handled better with the next
generation Rpy2 (which I've not got into yet). Google for rpy for
info.
Not sure what you mean here. Do you mean is:

 R> sum(x)

faster than

Python> sum(x)

and how much worse is:

Python> from rpy import r
Python> r.sum(x)

?

 Knuth's remark on premature optimization applies, as ever....
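
 One way to act on that is to measure before rewriting anything. A
minimal sketch using Python's standard timeit module; the list x, its
size and the repeat count are arbitrary assumptions, and the rpy call is
left out so that this runs without R installed:

```python
import timeit

# Hypothetical test data: one million floats.
x = [float(i) for i in range(1_000_000)]


def time_call(func, repeats=5):
    """Return the best wall-clock time (seconds) of `repeats` single calls."""
    return min(timeit.repeat(func, number=1, repeat=repeats))


builtin_seconds = time_call(lambda: sum(x))
print(f"builtin sum: {builtin_seconds:.4f} s")
```

 With rpy imported, passing lambda: r.sum(x) to the same helper should
give a comparable number; that figure would also include whatever it
costs to convert the list for R on each call.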

Barry
#
Hello!
On Tue, Feb 17, 2009 at 5:58 PM, Warren Young <warren at etr-usa.com> wrote:
ah .. thanks for the pointer, I had not heard of Sage, I was just
starting to look at SciPy.
I put together a large'ish R program last year, but I think I would be
happier if I could code it in say Python - but I would rather not do
that at the expense of execution time.

Thanks again for telling me about Sage.

Esmail
#
On Tue, Feb 17, 2009 at 6:05 PM, Barry Rowlingson
<b.rowlingson at lancaster.ac.uk> wrote:
wow .. that is pretty straightforward, I'll have to check out rpy for sure.
Will do!
Well, I have a program written in R which already takes quite a while
to run. I was just wondering if I were to rewrite most of the logic in
Python - the main thing I use in R is its regression facilities - if it
would speed things up. I suspect not, since both of them are interpreted
and the bulk of the time is taken up by R's regression calls.

Esmail
#
On Tue, Feb 17, 2009 at 6:59 PM, Esmail Bonakdarian <esmail.js at gmail.com> wrote:
See ?Rprof for profiling your R code.

If lm is the culprit, rewriting your lm calls using lm.fit might help.
#
2009/2/17 Esmail Bonakdarian <esmail.js at gmail.com>:
- and the bulk of the time in the regression calls will be taken up
by C code in the underlying linear algebra libraries (lapack, blas,
atlas and friends).

 Your best bet for optimisation in this case would be making sure you
have the best libraries for your architecture. That's a bit beyond me
at the moment, others here can probably tell you about getting the
best performing library for your system.

 This can also speed up Python (scipy or numpy) code that uses the
same libraries.

Barry
#
Gabor Grothendieck wrote:
Yes, based on my informal benchmarking, lm is the main "bottleneck", the rest
of the code consists mostly of vector manipulations and control structures.

I am not familiar with lm.fit, I'll definitely look it up. I hope it's similar
enough to make it easy to substitute one for the other.

Thanks for the suggestion, much appreciated. (My runs now take sometimes
several hours, it would be great to cut that time down by any amount :-)

Esmail
#
Barry Rowlingson wrote:
ah, good point.
thanks for the suggestions Barry, I mostly run on intel machines, but
using two flavors of Linux and also Windows XP - I grab any machine I can to
help run this. R versions range from 2.6.x (Fedora) to 2.8.1 (XP) at the
moment.

Another post suggested I look at lm.fit in place of lm to help speed things
up, so I'm going to look at that next.

Appreciate all the helpful posts here.

Esmail
#
On Wed, Feb 18, 2009 at 7:27 AM, Esmail Bonakdarian <esmail.js at gmail.com> wrote:
Yes, the speedup can be significant.  e.g. here we cut the time down to
40% of the lm time by using lm.fit and we can get down to nearly 10% if
we go even lower level:
   user  system elapsed
  26.85    0.07   27.35
   user  system elapsed
  10.76    0.00   10.78
   user  system elapsed
   3.33    0.00    3.34
Call:
lm(formula = DAX ~ . - 1, data = EuStockMarkets)

Coefficients:
     SMI       CAC      FTSE
 0.55156   0.45062  -0.09392
        SMI         CAC        FTSE
 0.55156141  0.45062183 -0.09391815
        SMI         CAC        FTSE
 0.55156141  0.45062183 -0.09391815
#
You could do

crossprod(x, y) instead of t(x) %*% y
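
For anyone reading this from the Python side, here is a rough pure-Python
sketch of why the two forms give the same result. The helper functions
and the small matrices are made up for illustration and say nothing about
how R implements crossprod internally; the point is only that the product
can be formed column by column without building the transpose first:

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def crossprod(x, y):
    """t(x) %*% y: entry (i, j) is the dot product of column i of x
    and column j of y, so no transposed copy of x is needed."""
    n_rows = len(x)
    return [[sum(x[r][i] * y[r][j] for r in range(n_rows))
             for j in range(len(y[0]))]
            for i in range(len(x[0]))]


# Hypothetical 3x2 matrix x and 3x1 matrix y.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [[1.0], [0.0], [2.0]]

via_transpose = matmul(transpose(x), y)
direct = crossprod(x, y)
print(direct)  # [[11.0], [14.0]]
```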
#
Gabor Grothendieck wrote:
Wow those numbers look impressive, that would be a nice speedup to have.

I took a look at the manual and found the following at the top of
the description for lm.fit:

   "These are the basic computing engines called by lm used to fit linear
    models. These should usually not be used directly unless by experienced
    users. "

I am certainly not an experienced user - so I wonder how different it
would be to use lm.fit instead of lm.

Right now I cobble together an equation and then call lm with it and the
datafile.

I.e.,

     LM.1 = lm(as.formula(eqn), data=datafile)
     s=summary(LM.1)

I then extract some information from the summary stats.

I'm not really quite sure what to make of the parameter list in lm.fit

I will look on-line and see if I can find an example showing the use of
this - thanks for pointing me in that direction.

Esmail
#
Hi Kenn,

Thanks for the suggestions, I'll have to see if I can figure out how to
convert the relatively simple call to lm with an equation and the data file
to the functions you mention (or if that's even feasible).

Not an expert in statistics myself, I am mostly concentrating on the
programming aspects of R. Problem is that I suspect my colleagues who
are providing some guidance with the stats end are not quite experts
themselves, and certainly new to R.

Cheers,

Esmail
Kenn Konstabel wrote:
#
Doran, Harold wrote:
that certainly looks more readable (and less error prone) to an R newbie
like myself :-)
#
On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian <esmail.js at gmail.com> wrote:
X <- model.matrix(formula, data)

will calculate the X matrix for you.
#
Note that using solve can be numerically unstable for certain problems.
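
A small illustration of the kind of problem meant here, written as a
pure-Python sketch rather than R so that it runs on its own. The matrix
is a textbook-style example chosen for this note (not data from this
thread), and crossprod below is a hypothetical helper mimicking the R
function of the same name. The two columns of x are linearly independent,
yet the cross-product matrix that solve would be given comes out exactly
singular in double precision:

```python
def crossprod(x, y):
    """t(x) %*% y for matrices stored as lists of rows."""
    n_rows = len(x)
    return [[sum(x[r][i] * y[r][j] for r in range(n_rows))
             for j in range(len(y[0]))]
            for i in range(len(x[0]))]


eps = 1e-8  # assumed scale; eps**2 is below double-precision resolution near 1

# Columns (1, eps, 0) and (1, 0, eps) are linearly independent.
x = [[1.0, 1.0],
     [eps, 0.0],
     [0.0, eps]]

xtx = crossprod(x, x)  # diagonal entries are 1 + eps**2, which rounds to 1.0
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]

print(xtx)  # [[1.0, 1.0], [1.0, 1.0]]
print(det)  # 0.0, so the normal equations cannot be solved
```

Forming the cross-product squares the condition number of the problem,
which is why lm and lm.fit work from a QR decomposition of x instead.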
On Fri, Feb 20, 2009 at 6:50 AM, Kenn Konstabel <lebatsnok at gmail.com> wrote:
#
Different methods of performing least squares calculations in R are discussed in

@Article{Rnews:Bates:2004,
  author       = {Douglas Bates},
  title        = {Least Squares Calculations in {R}},
  journal      = {R News},
  year         = 2004,
  volume       = 4,
  number       = 1,
  pages        = {17--20},
  month        = {June},
  url          = http,
  pdf          = Rnews2004-1
}

Some of the functions mentioned in that article have been modified.  A
more up-to-date version of the comparisons in that article is
available as the Comparisons vignette in the Matrix package.

On Fri, Feb 20, 2009 at 6:06 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote: