issue with using R parallelization libraries
3 messages · Glenn Blanford, Norm Matloff, Saptarshi Guha
Interesting. The unfortunate nature of parallel programming is that no one library/hardware platform will work well for all applications. I'm about to release to CRAN the official version of my Rdsm shared-memory parallel package, for which I released an alpha version (not on CRAN) a couple of months ago. The new version is much faster than the old one. If you are interested, I'd like to test your code on Rdsm. Same for others who may have some problematic code. I'm not saying Rdsm will be faster (unlikely), but it would be interesting to see how it does. Norm Matloff
On Fri, Oct 16, 2009 at 04:47:49PM -0400, Glenn Blanford wrote:
Looking for advice on parallel techniques for R.
I am presently working with parallelizing R code from a package we use locally for analysis.
The multicore library routines parallel() and mclapply() give mixed results.
For starters, only 2 to 8 cores (one processor) are available.
Setting up an iterative for (i in 1:N) loop around a function ff(), where ff() accepts some matrices and does some number crunching:
parallel() forks as many R processes as N and leaves them around until collect() reads the pids, retrieves the returned values, and deposits them. A cleanup of the processes ensues.
mclapply() creates and destroys R processes dynamically, only as many as the number of cores, returning values that continually accumulate in the new vector.
With large values of N, parallel() causes problems: there are too many processes to manage, and it can't pipeline just a few at a time, so all N run in parallel instead of #cores (or something in between). If N = 500, the system slows to the point where it basically has to be rebooted.
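One way to avoid having all N children alive at once, sketched under the assumption of the multicore parallel()/collect() interface described above, is to launch jobs in batches of at most #cores and collect each batch before starting the next. Here ff() and the input list xs are toy stand-ins for the real function and data:

```r
## Minimal throttling sketch: fork at most `cores` jobs at a time with
## parallel(), then collect() that batch before launching the next, so
## no more than `cores` child processes ever exist simultaneously.
library(multicore)

ff <- function(m) sum(m %*% t(m))               # toy stand-in for the real ff()
xs <- replicate(20, matrix(rnorm(16), 4), simplify = FALSE)

cores <- 4
out   <- vector("list", length(xs))
for (i in seq(1, length(xs), by = cores)) {
  idx  <- i:min(i + cores - 1, length(xs))      # indices for this batch
  jobs <- lapply(xs[idx], function(m) parallel(ff(m)))
  out[idx] <- collect(jobs)                     # blocks until the batch is done
}
```

This trades some parallelism at batch boundaries for a hard cap on the number of live processes, which is usually the right trade when N is in the hundreds.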
With large values of N, mclapply() runs to completion OK, but it drags out the system time (from system.time()) because it is continually managing process creation, teardown, and data exchange, so again it's not efficient. Total elapsed time with mclapply() becomes greater, not smaller.
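The per-task fork/teardown overhead can be amortized by handing mclapply() one chunk per core rather than N tiny tasks. A minimal sketch, assuming the multicore package's mclapply() with its mc.cores argument; ff() is a toy stand-in:

```r
## Chunking sketch: split 1:N into one chunk per core, so each forked
## child handles roughly N/cores iterations and the fork/collect cost
## is paid only `cores` times instead of N times.
library(multicore)

ff     <- function(i) i * i                     # toy stand-in for the real ff()
N      <- 500
cores  <- 4
chunks <- split(1:N, cut(1:N, cores, labels = FALSE))
res    <- mclapply(chunks, function(idx) lapply(idx, ff), mc.cores = cores)
res    <- unlist(res, recursive = FALSE)        # flatten back to N results
```

This keeps the loop body untouched; only the driver around ff() changes.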
Does anyone know how to get around this (without changing a huge amount of R code)?
I have started looking at snowfall, Rparallel, etc. to see whether there are better ways of managing the worker processes, but if anyone on the list has had some experience with this, it would be great to learn how to throttle the process management or some other way of speeding things up. Thanks greatly.
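For reference, snowfall keeps a fixed pool of workers alive across calls, which sidesteps the create/destroy churn described above. A minimal sketch, with cpus and the toy ff() as stand-ins for the real setup:

```r
## snowfall sketch: start a fixed worker pool once, farm tasks out to
## it, and shut it down at the end; no per-task process creation.
library(snowfall)

ff <- function(m) sum(m %*% t(m))               # toy stand-in for the real ff()
xs <- replicate(100, matrix(rnorm(16), 4), simplify = FALSE)

sfInit(parallel = TRUE, cpus = 4)               # start 4 workers, once
sfExport("ff")                                  # ship ff() to the workers
res <- sfLapply(xs, ff)                         # tasks distributed to the pool
sfStop()                                        # tear the pool down, once
```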
_______________________________________________ R-sig-hpc mailing list R-sig-hpc at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
Actually, it would be nice (only if possible) if you could supply the simulation code. I would like to test this on some lab machines. Regards, Saptarshi
On Fri, Oct 16, 2009 at 5:59 PM, Norm Matloff <matloff at cs.ucdavis.edu> wrote: