Why is pure computation time in parallel longer than in the serial version?
On Sun, 23 Feb 2014, Xuening Zhu wrote:
Hi Roger: Your explanation is very reasonable and helpful. Actually, the original (default) number of BLAS threads on my computer is 2; I only set it to 4 threads by hand later, for comparison. I have another question. In my experiment, *mclapply(2 cores) + BLAS(single thread)* is *faster* than *BLAS(2 threads)* and also faster than *mclapply(2 cores)*, which confuses me. It means the combination does make sense here. I know mclapply uses "fork" on POSIX systems. But what is the difference? What makes the combination faster than either of its parts alone?
Your alternatives are mclapply(2 cores) with the fast BLAS running sequentially on each core (fastest), fast BLAS running in parallel on 2 cores, and mclapply(2 cores) with standard BLAS running sequentially on each core. For your problem and on your hardware, mclapply (forking the process) with sequential (single core) fast BLAS is best. The size of the level 2 cache affects how large a chunk of the problem a tuned/fast BLAS can compute in parallel before it has to go to the unified cache. For cache see:

http://en.wikipedia.org/wiki/CPU_cache#Multi-level_caches

It looks as though "modern" processors with unified cache may not be great for numerical work unless the L3 unified cache is relatively large, but I'm just speculating, maybe someone knows? You could change the size of the problem and see if your conclusions change.

Read up on the difference between forking and starting new processes (it's about memory, among other things). These things do vary from hardware to hardware and task to task.

Hope this helps,

Roger
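As a rough illustration of the forking remark above: on POSIX systems a forked child starts with the parent's memory already mapped (copy-on-write), so large inputs do not have to be serialized and sent to it, as they generally would be for a freshly started worker process. The sketch below is in Python rather than R purely to show the operating-system behaviour. The function names are hypothetical, it assumes a POSIX system where os.fork is available, and it is not meant as a picture of mclapply's internals.

```python
import os


def sum_in_forked_child(data):
    """Fork a child that sums `data` and reports the total through a pipe.

    `data` is never sent to the child: after fork() the child simply sees
    the parent's memory (copy-on-write). Hypothetical helper for illustration.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: `data` is already available here.
        os.close(read_fd)
        os.write(write_fd, str(sum(data)).encode())
        os.close(write_fd)
        os._exit(0)  # leave immediately, skipping the parent's cleanup handlers
    # Parent process: close our copy of the write end, then read until EOF.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        result = int(reader.read())
    os.waitpid(pid, 0)
    return result


def child_changes_stay_private(data):
    """Let a forked child overwrite data[0]; return what the parent sees after."""
    pid = os.fork()
    if pid == 0:
        data[0] = -1  # the write lands in the child's private copy of the page
        os._exit(0)
    os.waitpid(pid, 0)
    return data[0]


if __name__ == "__main__":
    values = list(range(1000))
    print(sum_in_forked_child(values))
    print(child_changes_stay_private(values))
```

The second function shows the other half of the picture: whatever the child modifies stays in the child, so results have to be passed back explicitly (here through a pipe).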
Thanks to all of you. Have a nice day~

Xuening

2014-02-23 2:33 GMT+08:00 Roger Bivand <Roger.Bivand at nhh.no>:
On Sat, 22 Feb 2014, beleites,claudia wrote:

Hi Xuening,
2 physical vs 2 physical * 2 logical threads: see e.g. here: http://unix.stackexchange.com/a/88290 You say you have 2 *physical* cores. That's the number you want to use for the parallel execution. Logical cores are just 2 (or more) threads running on the same physical core. IIRC, this can speed things up mainly if the 2 threads run very different operations.
Yes, this is my experience - I turn off Intel hyperthreading in the BIOS to prevent software getting confused. BLAS sees the available compute resources, so your BLAS may be installed to see 4 cores, but it doesn't know that two of them are hyperthreads and compete for physical resources. It may be that by limiting BLAS to 2 threads, it gets privileged access to the two real cores, and OS (or other) tasks running at the same time use the hyperthreads. Roger
--
Xuening Zhu -------------------------------------------------------- Master of Business Statistics Guanghua School of Management, Peking University
Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no