multithreading calling from the rpy Python package

Since Python has been mentioned in this context: Could not Python's
threading model and implementation serve as a guideline?

From a few simple benchmarks I've run, it seems as if the Python
interpreter itself is thread-safe but not threadable. That is, when I
run something "pure Python" like a recursive function that returns the
nth Fibonacci number in parallel, there is no speed-up for 2 threads
on a dual-processor machine. However, calling sleep in parallel does
scale down with the number of threads, even on a single-processor ;)
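
For anyone who wants to repeat this kind of benchmark, here is a minimal sketch. It is not the original test code: the names fib, run_in_threads and compare are made up for illustration, and it assumes a standard CPython build with its global interpreter lock. Every thread does the same amount of work, so with real parallelism the wall time would stay roughly flat as threads are added.

```python
import threading
import time


def fib(n):
    """Naive recursive Fibonacci: pure-Python, CPU-bound work."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def run_in_threads(target, args, n_threads):
    """Run target(*args) once per thread; return elapsed wall time in seconds."""
    threads = [threading.Thread(target=target, args=args) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def compare(n=24, nap=0.2, n_threads=2):
    """Time CPU-bound and sleeping workloads with 1 thread and with n_threads."""
    return {
        "fib, 1 thread": run_in_threads(fib, (n,), 1),
        "fib, n threads": run_in_threads(fib, (n,), n_threads),
        "sleep, 1 thread": run_in_threads(time.sleep, (nap,), 1),
        "sleep, n threads": run_in_threads(time.sleep, (nap,), n_threads),
    }


timings = compare()
for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

On an interpreter with a global lock one would expect the fib time to grow with the number of threads, while the sleep time stays close to a single nap. Actual numbers depend on the machine and the Python build.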

Real-life code does tend to speed up somewhat, though never as much as
one would hope.

Just an idea...

René

There are several sets of notes on threading off the
http://developer.r-project.org page--somewhat old but still relevant.
The Python approach is discussed there.  That approach, which gives
concurrency but not parallelism, is in principle feasible for R but
getting from here to there is non-trivial given that we have some
unique issues related to FORTRAN semantics as well as how many R
packages are written.  It may happen yet, but probably later rather
than sooner.

Best,

luke

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

To follow up on Luke's comments...

We can partially automate the work to get multiple evaluators in R.
And we are getting very close to having the underlying tools to do this.
But it still remains to be seen whether the extra work to introduce
threads is warranted. Will people actually use them in R and will it
have a significant impact on the computations or simply make writing
GUIs within R slightly easier to manage?

The design of R may not be ideal for high performance computing and
a new architecture and system explicitly for more specialized
computation in the short term may be warranted. Some of us are thinking
about this from a variety of different perspectives.  Lee Edlefsen
has some interesting work at exametrix.com.

One of the reasons I am hesitant to use Python as a framework on
which to build a new system is the "thread-safe but not threadable"
issue. Also, it is not easily extensible in an object oriented manner
and this is a big issue for evolvability and user extensions to a system.

On Fri, 20 Oct 2006, René J.V. Bertin wrote:

> Since Python has been mentioned in this context: Could not Python's
> threading model and implementation serve as a guideline?
>
> From a few simple benchmarks I've run, it seems as if the Python
> interpreter itself is thread-safe but not threadable. That is, when I
> run something "pure Python" like a recursive function that returns the
> nth Fibonacci number in parallel, there is no speed-up for 2 threads
> on a dual-processor machine. However, calling sleep in parallel does
> scale down with the number of threads, even on a single-processor ;)
>
> Real-life code does tend to speed up somewhat, though never as much as
> one would hope.
>
> Just an idea...
>
> René

--
Duncan Temple Lang                    duncan at wald.ucdavis.edu
Department of Statistics              work:  (530) 752-4782
4210 Mathematical Sciences Building   fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis,
CA 95616,
USA

> But it still remains to be seen whether the extra work to introduce
> threads is warranted. Will people actually use them in R and will it
> have a significant impact on the computations or simply make writing
> GUIs within R slightly easier to manage?

If threads can be set up easily, why not? Now that multi-core
machines are becoming more easily available...

It is not just about reducing computation time, btw. Not so long ago,
I was setting up a system in Matlab to do concurrent sampling of a DAQ
and an eye-tracker, and to show and record the sampled data. The DAQ
toolbox fires off its own thread that does the actual sampling and can
be configured to call a Matlab callback function at a predetermined
interval.
The eye-tracker code is single-threaded. If Matlab had been
threadable, I'd have been able to sample the tracker in a different
thread, and not miss out on the data coming in while plotting.
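
The kind of setup described above can be sketched with a background sampling thread feeding a queue. This is only an illustration in Python, not the Matlab system: make_fake_tracker is a hypothetical stand-in for the eye-tracker, and the polling interval is arbitrary.

```python
import queue
import threading
import time


def sample_device(read_sample, out_queue, stop_event, interval=0.001):
    """Poll read_sample() until stop_event is set, queueing every sample."""
    while not stop_event.is_set():
        out_queue.put(read_sample())
        time.sleep(interval)  # sleeping lets other threads run


def make_fake_tracker():
    """Hypothetical device: returns consecutive sample numbers 0, 1, 2, ..."""
    counter = iter(range(10**9))
    return lambda: next(counter)


samples = queue.Queue()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_device, args=(make_fake_tracker(), samples, stop)
)
sampler.start()

time.sleep(0.05)  # the main thread is busy "plotting" while samples keep arriving

stop.set()
sampler.join()

# Drain the queue after the sampler has finished.
collected = []
while not samples.empty():
    collected.append(samples.get())

print(f"collected {len(collected)} samples without gaps")
```

Because the sampler spends most of its time sleeping or waiting on the device, this style of threading helps even when only one thread can run interpreter code at a time.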

> One of the reasons I am hesitant to use Python as a framework on
> which to build a new system is the "thread-safe but not threadable"
> issue. Also, it is not easily extensible in an object oriented manner

Well, I didn't mean to suggest that it would be the perfect solution.
It seemed like a potentially worthwhile, feasible temporary solution
that would allow at least some multithreading. I don't see how it is
not easily extensible in an OO manner, though. The Python threads I
use *are* objects (and very similar apparently to Java's threading
model).
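
To make the "threads are objects" point concrete, here is a small sketch using the standard threading module. The class name FibThread and its result attribute are illustrative choices, not taken from any code in this thread.

```python
import threading


class FibThread(threading.Thread):
    """A thread object that computes the nth Fibonacci number in run()."""

    def __init__(self, n):
        super().__init__()
        self.n = n
        self.result = None  # filled in by run()

    def run(self):
        # Iterative Fibonacci: after n steps, a holds F(n).
        a, b = 0, 1
        for _ in range(self.n):
            a, b = b, a + b
        self.result = a


workers = [FibThread(n) for n in (10, 20, 30)]
for w in workers:
    w.start()
for w in workers:
    w.join()

results = [w.result for w in workers]
print(results)
```

As in Java, the work goes into an overridden run() method and the thread is launched with start().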

Best,
René

>> But it still remains to be seen whether the extra work to introduce
>> threads is warranted. Will people actually use them in R and will it
>> have a significant impact on the computations or simply make writing
>> GUIs within R slightly easier to manage?
>
> If threads can be set up easily, why not? Now that multi-core
> machines are becoming more easily available...

If it was easy it would have been done a long time ago.  And what does
multi-core have to do with Python style threading? Nothing of course
...

> It is not just about reducing computation time, btw. Not so long ago,
> I was setting up a system in Matlab to do concurrent sampling of a DAQ
> and an eye-tracker, and to show and record the sampled data. The DAQ
> toolbox fires off its own thread that does the actual sampling and can
> be configured to call a Matlab callback function at a predetermined
> interval.
> The eye-tracker code is single-threaded. If Matlab had been
> threadable, I'd have been able to sample the tracker in a different
> thread, and not miss out on the data coming in while plotting.

Yes, concurrent threading as available in Python is useful.  Is it
useful enough to justify the effort of those who are going to end up
doing the work (given that there are other things we could also be
working on)?  That is not clear.

Best,

luke

> Since Python has been mentioned in this context: Could not Python's
> threading model and implementation serve as a guideline?

Why would you want to do that?  Does that model have some particular
synergy with R's design or current limitations?  (From later posts, I
suspect that the answer to that is "yes".)

I'm curious, what is the current state of the R implementation's
support for thread-safety, re-entrant code, and concurrency?  What
issues in the code block improvement?  Is there a summary anywhere
more recent than the info here?

  http://developer.r-project.org/RThreads/index.html

And perhaps most importantly, what are the main users or use cases
driving (or likely to drive in the future) improved multi-threading
support?  Who really wants it, and why?

It would certainly be nice to make R completely thread-safe (and
ideally, fully reentrant), so that it could be easily embedded into a
multi-threaded program.  But, that sounds like a non-mainstream use of
R, and thus unlikely to motivate most of the R core developers.

R also already has about 8 different packages for doing parallel
programming across multiple machines using MPI, PVM, etc.  I wonder
to what degree these obviate the need for multi-threaded R.
(Certainly not 100%, as most of those approaches will have vastly
worse latency than multi-threading.  But I wonder how much of the same
use cases they cover, 20%, or 80%?)

There's also the broader question of how or whether multi-threading
support would help, hinder, or otherwise interact with other potential
cool changes to R.  E.g., a high-performance byte-code interpreter, or
Lisp-like macros.

Any large single threaded application like R is going to have its own
particular obstacles to making the code thread-safe, and more to
making it reentrant and adding good multi-threading support of one
flavor or another.

I don't know what the particular obstacles for R are.  But if I was
looking for a GENERAL example of implementing a multi-threaded
interpreter, I certainly wouldn't choose Python.

Neither Python nor Ruby has real multi-threading; both are thread-safe
only via some sort of global lock.  (Only one thread can ever
run at a time, no matter how many CPUs your machine has.)  Perl
probably does support real multi-threading, but people say it hasn't
had much real world use.  Tcl has excellent multi-threading support,
using independent interpreters on top of OS threads (POSIX or Win32),
and has been in heavy use for many years.  I don't know about Lua,
JavaScript, the eleventy dozen different Scheme implementations,
etc. etc.
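
The global-lock idea mentioned at the start of this paragraph can be illustrated with a toy evaluator. This is a sketch only, not how CPython or Ruby is implemented: the class LockedEvaluator and the helper slow_square are hypothetical, and unlike a real interpreter the toy holds its lock even while the work is sleeping.

```python
import threading
import time


class LockedEvaluator:
    """Toy evaluator: any thread may call evaluate(), but only one at a time."""

    def __init__(self):
        self._global_lock = threading.Lock()  # stands in for an interpreter-wide lock
        self._active = 0      # evaluations currently in progress
        self.max_active = 0   # highest value _active ever reached

    def evaluate(self, func, *args):
        with self._global_lock:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
            try:
                return func(*args)
            finally:
                self._active -= 1


def slow_square(x):
    time.sleep(0.01)  # pretend work, done while the lock is held
    return x * x


evaluator = LockedEvaluator()
results = {}


def worker(x):
    results[x] = evaluator.evaluate(slow_square, x)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(evaluator.max_active, sorted(results.items()))
```

The evaluator is safe to call from many threads, yet the evaluations never overlap, which is the "thread-safe but not threadable" behaviour discussed earlier in the thread.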

There are lots of different models for threading and concurrency.
OS-threads vs. user-mode threads is just one variable in the choice
space.  There's also message passing vs. shared memory, threads
vs. event-based programming, default shared-everything vs. default
shared-nothing, etc. etc.

In my experience, Tcl's "apartment model" for multi-threading is quite
nice to work with.  With it, you program each Tcl interpreter as if it
was a stand-alone shared-nothing process, which communicates with
other Tcl interpreters only through explicit shared memory and message
passing APIs.  Underneath, the C code sees the true shared-everything
threading implementation, but also has APIs to make working with it
easier.
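
A rough sketch of that shared-nothing style, written in Python rather than Tcl and therefore only an analogy: each worker keeps private state and is reached only through its message queue. The Apartment class and the "add", "get" and "stop" message kinds are invented for this example and are not part of any Tcl API.

```python
import queue
import threading


class Apartment(threading.Thread):
    """A worker with private state; its inbox queue is the only way in."""

    def __init__(self, name):
        super().__init__(name=name)
        self.inbox = queue.Queue()
        self._total = 0  # private: only this thread's run() touches it

    def run(self):
        while True:
            kind, payload, reply_to = self.inbox.get()
            if kind == "stop":
                break
            if kind == "add":
                self._total += payload
            elif kind == "get":
                reply_to.put(self._total)

    def send(self, kind, payload=None, reply_to=None):
        """Post a message; the caller never touches the worker's state directly."""
        self.inbox.put((kind, payload, reply_to))


a = Apartment("a")
b = Apartment("b")
a.start()
b.start()

for i in range(5):
    a.send("add", i)
b.send("add", 100)

# Messages are handled in order, so "get" sees all earlier "add" messages.
replies = queue.Queue()
a.send("get", reply_to=replies)
total_a = replies.get(timeout=5)
b.send("get", reply_to=replies)
total_b = replies.get(timeout=5)

a.send("stop")
b.send("stop")
a.join()
b.join()

print(total_a, total_b)
```

No locks appear in the worker code because nothing is shared except the queues, which is the appeal of the model.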

(Note that Tcl implements that on top of POSIX and Win32 threads, but
you could implement the exact same script-level model on top of
inter-process shared memory.  Only the C code underneath would see any
difference.)

I have never really used any other threading model extensively, so I
can't do a proper hands-on compare and contrast.  However, my
suspicion is that "shared nothing by default" models (like Tcl's
threading) are the better way to go, rather than "shared everything by
default" (like POSIX threads).  The Erlang and Mozart/Oz folks both
seem to think so, etc.

Andrew Piskorski <atp at piskorski.com>
http://www.piskorski.com/
I don't want to keep hammering on the possible interest of python in
this context... but have you seen this?

http://ipython.scipy.org/moin/Parallel_Computing

I know, not exactly the same as multithreading ;)

> I don't want to keep hammering on the possible interest of python in
> this context... but have you seen this?
>
> http://ipython.scipy.org/moin/Parallel_Computing
>
> I know, not exactly the same as multithreading ;)

Um - have you seen this?
http://cran.r-project.org/src/contrib/Descriptions/snow.html
It allows parallel computing in R - and has support for more backends
than the ipython stuff ;).

On a more serious note I can only repeat what has been said before -
AFAIK the Python concept is too limited to be really of significant
use in R and real threaded R is not on the horizon anytime soon.
However, approaches like snow were suitable for most parallelization
tasks I have encountered in R... So unless someone writes a lot of
[the right] code (which would be great!) it's unlikely to change.

Cheers,
Simon