issue with using R parallelization libraries
3 messages · Glenn Blanford, Norm Matloff, Saptarshi Guha
Interesting. The unfortunate nature of parallel programming is that no one library/hardware platform will work well for all applications. I'm about to release to CRAN the official version of my Rdsm shared-memory parallel package, for which I released an alpha version (not on CRAN) a couple of months ago. The new version is much faster than the old one. If you are interested, I'd like to test your code on Rdsm. Same for others who may have some problematic code. I'm not saying Rdsm will be faster (unlikely), but it would be interesting to see how it does. Norm Matloff
On Fri, Oct 16, 2009 at 04:47:49PM -0400, Glenn Blanford wrote:
Looking for advice on parallel techniques for R.
I am presently working with parallelizing R code from a package we use locally for analysis.
The multicore library routines parallel() and mclapply() give mixed results.
For starters, only 2 to 8 cores (one processor) are available.
Setting up an iterative for (i in 1:N) loop around a function ff(), where ff() accepts some matrices and does some number crunching:
parallel() forks as many R processes as N and leaves them around until collect() reads the pids, retrieves the returned values, and deposits them. A cleanup of the processes ensues.
mclapply() creates and destroys R processes dynamically, only as many as the number of cores, returning values that continually accumulate in the new vector.
With large values of N, parallel() causes problems: there are too many processes to manage, and it can't pipeline just a few at a time, so all N run in parallel instead of #cores (or something in between). If N = 500, the system slows to the point where it basically has to be rebooted.
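One way to avoid having all N children alive at once, sketched under the assumption of the multicore parallel()/collect() interface described above, is to launch jobs in batches of at most #cores and collect each batch before starting the next. Here ff() and the input list xs are toy stand-ins for the real function and data:

```r
## Minimal throttling sketch: fork at most `cores` jobs at a time with
## parallel(), then collect() that batch before launching the next, so
## no more than `cores` child processes ever exist simultaneously.
library(multicore)

ff <- function(m) sum(m %*% t(m))               # toy stand-in for the real ff()
xs <- replicate(20, matrix(rnorm(16), 4), simplify = FALSE)

cores <- 4
out   <- vector("list", length(xs))
for (i in seq(1, length(xs), by = cores)) {
  idx  <- i:min(i + cores - 1, length(xs))      # indices for this batch
  jobs <- lapply(xs[idx], function(m) parallel(ff(m)))
  out[idx] <- collect(jobs)                     # blocks until the batch is done
}
```

This trades some parallelism at batch boundaries for a hard cap on the number of live processes, which is usually the right trade when N is in the hundreds.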
With large values of N, mclapply() runs to completion OK, but it drags out the system time (from system.time()) because it is continually managing process creation, teardown, and data exchange, so again it's not efficient. Total elapsed time with mclapply() becomes greater, not smaller.
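The per-task fork/teardown overhead can be amortized by handing mclapply() one chunk per core rather than N tiny tasks. A minimal sketch, assuming the multicore package's mclapply() with its mc.cores argument; ff() is a toy stand-in:

```r
## Chunking sketch: split 1:N into one chunk per core, so each forked
## child handles roughly N/cores iterations and the fork/collect cost
## is paid only `cores` times instead of N times.
library(multicore)

ff     <- function(i) i * i                     # toy stand-in for the real ff()
N      <- 500
cores  <- 4
chunks <- split(1:N, cut(1:N, cores, labels = FALSE))
res    <- mclapply(chunks, function(idx) lapply(idx, ff), mc.cores = cores)
res    <- unlist(res, recursive = FALSE)        # flatten back to N results
```

This keeps the loop body untouched; only the driver around ff() changes.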
Does anyone know how to get around this (without changing a huge amount of R code)?
I have started looking at snowfall, Rparallel, etc. to see whether there are better ways of managing the worker processes, but if anyone on the list has had some experience with this, it would be great to learn how to throttle the process management or some other way of speeding things up. Thanks greatly.
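For reference, snowfall keeps a fixed pool of workers alive across calls, which sidesteps the create/destroy churn described above. A minimal sketch, with cpus and the toy ff() as stand-ins for the real setup:

```r
## snowfall sketch: start a fixed worker pool once, farm tasks out to
## it, and shut it down at the end; no per-task process creation.
library(snowfall)

ff <- function(m) sum(m %*% t(m))               # toy stand-in for the real ff()
xs <- replicate(100, matrix(rnorm(16), 4), simplify = FALSE)

sfInit(parallel = TRUE, cpus = 4)               # start 4 workers, once
sfExport("ff")                                  # ship ff() to the workers
res <- sfLapply(xs, ff)                         # tasks distributed to the pool
sfStop()                                        # tear the pool down, once
```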
_______________________________________________ R-sig-hpc mailing list R-sig-hpc at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
Actually, it would be nice (only if possible) if you could supply the simulation code. I would like to test this on some lab machines. Regards, Saptarshi
On Fri, Oct 16, 2009 at 5:59 PM, Norm Matloff <matloff at cs.ucdavis.edu> wrote: