Grand Central Dispatch (simple loop optimization)

Jan,

on my system (2 x 2.93 quad core Nehalem
with hyper-threading, so 16 threads max, 16GB RAM,
10.6.1, 64bit kernel, 64bit R)

system.time(threads(100000,1000,"omp"))
  user  system elapsed
10.249   0.009   0.662
system.time(threads(100000,1000,"gcd"))
  user  system elapsed
10.208   0.008   0.668
system.time(threads(100000,1000,"dcg"))
  user  system elapsed
 8.731   0.005   8.738

so omp == gcd, but for more complicated tasks the
tighter integration may favor gcd

comparing harpertown and nehalem --> surprising
difference (kernel ? hyper-threading ?)

Interesting but consistent with my observations so far - Nehalems are  
not any faster than equally clocked Harpertowns (see dcg time). The  
only gains are in HT as seen in your example - my Harpertown has 4  
logical cpus, yours has 16. My 2.26GHz Nehalem is running Leopard  
(because it's the build machine ;)) but the results are similar:

 > system.time(threads(100000,1000,"omp_try"))
    user  system elapsed
  12.924   0.031   0.852
 > system.time(threads(100000,1000,"dcg_try"))
    user  system elapsed
  11.595   0.009  11.608

Again, the sequential time is about the same as on equally clocked  
Harpertown, but the HT helps with a factor of over 13. That explains  
where the alleged performance boost on Nehalems comes from ...

It would be interesting to run OMP pnmath with schedule(dynamic) on a  
8-core Nehalem and compare that with a stock R ... (pnmath will need a  
bit of tweaking because it attempts to be too smart on the number of  
threads). Clearly, on many short operations it may cause a hit, but  
the gain on long vectors is up to 16 which is impressive ...

Cheers,
Simon
i have no idea how the open-sourced gcd works on
non-mac hardware

code is downloadable using webdav from
public.me.com/jdeleeuw/software/threads
or using afp://gifi.stat.ucla.edu from
the deleeuw public directory

On Sep 17, 2009, at 12:35 , Simon Urbanek wrote:

On Sep 17, 2009, at 15:20 , Simon Urbanek wrote:

Jan,

thanks for sharing this. This is really interesting. We have been  
contemplating using GCD for R (mainly pnmath) but at the time OMP  
was faster. However, GCD has apparently become really good in the meantime:

system.time(threads(100000,1000,"omp_try"))
user  system elapsed
9.671   0.009   2.441
system.time(threads(100000,1000,"gcd_try"))
user  system elapsed
9.592   0.004   2.410
system.time(threads(100000,1000,"dcg_try"))
user  system elapsed
9.784   0.003   9.788

[This is on Harpertown 2.66GHz quad core]

So GCD is surprisingly just a hair faster than OMP (also
surprising to me is that using more threads than cores makes OMP
faster - the above is with 16 threads).

Actually, with schedule(dynamic) the gap is almost at the level of  
the measurement error:

system.time(threads(100000,1000,"omp_try"))
 user  system elapsed
9.614   0.006   2.420
system.time(threads(100000,1000,"gcd_try"))
 user  system elapsed
9.586   0.005   2.409

-- the OMP line (to be placed before the for() loop) is

#pragma omp parallel for default(shared) private(i) schedule(dynamic)

Cheers,
Simon
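
A minimal sketch of where that OMP line goes, in C. The threads code
itself is not shown in this thread, so work(), fill_omp() and the loop
body are hypothetical stand-ins; only the pragma is taken from the
message above. Compiled without OpenMP support the pragma is ignored
and the loop simply runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element workload (not the code from the thread):
 * m inner iterations of cheap arithmetic for outer index i. */
static double work(int i, int m)
{
    double s = 0.0;
    for (int j = 1; j <= m; j++)
        s += (double)((i + j) % 7);
    return s;
}

/* Fill out[0..n-1]; the outer loop is the one handed to OpenMP. */
void fill_omp(double *out, int n, int m)
{
    int i;
    assert(out != NULL);
    /* The pragma quoted in the message, placed directly before the for() loop.
     * schedule(dynamic) hands out iterations to threads as they become free. */
#pragma omp parallel for default(shared) private(i) schedule(dynamic)
    for (i = 0; i < n; i++)
        out[i] = work(i, m);
}
```

With gcc this would be built with -fopenmp; each iteration writes only
its own out[i], so no further synchronization is needed.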

On Sep 17, 2009, at 14:24 , Jan de Leeuw wrote:

a) Obviously OpenMP is more portable. Even on a Mac I had to use  
Apple's gcc in this case
(I normally use the GNU gcc-trunk).

b) GCD does not require specifying the number of threads -- it  
determines it at runtime.

c) Coding is simpler.

I would not say that - OMP takes just one #pragma - no need to change
your code whereas GCD requires several special function calls...  
However, OMP is more limited in the kind of things you can do.

Cheers,
Simon
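
For comparison, a rough sketch of the GCD side of the same kind of loop,
using the plain-C function interface dispatch_apply_f. As before, work(),
struct job, one_iteration() and fill_gcd() are hypothetical names, not
the code from the thread. The sequential fallback for platforms without
<dispatch/dispatch.h> is an assumption added only so that the sketch
compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __APPLE__
#include <dispatch/dispatch.h>
#endif

/* Hypothetical per-element workload (not the code from the thread). */
static double work(int i, int m)
{
    double s = 0.0;
    for (int j = 1; j <= m; j++)
        s += (double)((i + j) % 7);
    return s;
}

/* Context passed to every iteration; GCD's function interface takes
 * a single void pointer, so the loop's shared data is bundled here. */
struct job {
    double *out;
    int m;
};

/* Body of one loop iteration, in the shape dispatch_apply_f expects. */
static void one_iteration(void *ctx, size_t i)
{
    struct job *job = ctx;
    job->out[i] = work((int)i, job->m);
}

void fill_gcd(double *out, int n, int m)
{
    struct job job = { out, m };
    assert(out != NULL && n >= 0);
#ifdef __APPLE__
    /* No thread count is given: GCD decides at runtime how many
     * iterations to run concurrently. The call returns when all are done. */
    dispatch_apply_f((size_t)n,
                     dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                     &job, one_iteration);
#else
    /* Assumed fallback: plain sequential loop where GCD is unavailable. */
    for (size_t i = 0; i < (size_t)n; i++)
        one_iteration(&job, i);
#endif
}
```

This illustrates both points made above: the loop body has to be moved
into a separate function with a context struct, but no number of threads
appears anywhere in the code.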

d) Since GCD is at a lower OS level than OpenMP, it will probably  
handle resource allocation
better. But my small example, on an otherwise idle Mac Pro (16  
cores, 32 GB of RAM), does
not really highlight that.

e) For more info, and some OpenMP comparisons, see

http://www.macresearch.org/cocoa-scientists-xxxi-all-aboard-grand-central
http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/12

To quote Siracusa

"Write your application as usual, but if there's any part of its  
operation that can
reasonably be expected to take more than a few seconds to  
complete, then for the love of Zarzycki,
get it off the main thread!"

On Sep 17, 2009, at 11:03 , Saptarshi Guha wrote:

Nice, how does this compare when using OpenMP?
How does it compare when several other core-hungry processes are
running? (GCD is supposed to handle resource allocation nicely;
does OpenMP compete with the other processes?)

Regards
Saptarshi

===
Jan de Leeuw; Distinguished Professor and Chair, UCLA Department  
of Statistics;
Director: UCLA Center for Environmental Statistics (CES);
Editor: Journal of Multivariate Analysis, Journal of Statistical  
Software;
US mail: 8125 Math Sciences Bldg, Box 951554, Los Angeles, CA  
90095-1554
phone (310)-825-9550;  fax (310)-206-5658;  email: deleeuw at stat.ucla.edu
.mac: jdeleeuw ++++++  aim: deleeuwjan ++++++ skype: j_deleeuw
homepages: http://gifi.stat.ucla.edu ++++++ http://www.cuddyvalley.org
-------------------------------------------------------------------------------------------------
      No matter where you go, there you are. --- Buckaroo Banzai
               http://gifi.stat.ucla.edu/sounds/nomatter.au

_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-mac


