
Grand Central Dispatch (simple loop optimization)

Jan,
On Sep 17, 2009, at 16:16 , Jan de Leeuw wrote:

Interesting, but consistent with my observations so far: Nehalems are
not any faster than equally clocked Harpertowns (see the dcg time). The
only gains are in HT, as seen in your example: my Harpertown has 4
logical CPUs, yours has 16. My 2.26GHz Nehalem is running Leopard
(because it's the build machine ;)) but the results are similar:

 > system.time(threads(100000,1000,"omp_try"))
    user  system elapsed
  12.924   0.031   0.852
 > system.time(threads(100000,1000,"dcg_try"))
    user  system elapsed
  11.595   0.009  11.608

Again, the sequential time is about the same as on an equally clocked
Harpertown, but HT helps by a factor of over 13 in elapsed time
(11.608s vs. 0.852s). That explains where the alleged performance
boost on Nehalems comes from ...

It would be interesting to run OMP pnmath with schedule(dynamic) on an
8-core Nehalem and compare that with a stock R ... (pnmath will need a
bit of tweaking because it attempts to be too smart about the number of
threads). Clearly, on many short operations it may cause a hit, but
the gain on long vectors is up to a factor of 16, which is impressive ...
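
For reference, a minimal sketch of what an element-wise loop with
schedule(dynamic) looks like in C. This is not pnmath's code: the
function name vec_poly, the chunk parameter and the toy computation are
all made up for illustration. Build with -fopenmp (gcc); without the
flag the pragma is ignored and the loop simply runs sequentially.

```c
#include <stddef.h>

/* Hypothetical example, not taken from pnmath: compute out[i] = x[i]^2 + 1
 * for each element, handing out blocks of `chunk` iterations to whichever
 * thread is free next (dynamic scheduling). */
void vec_poly(const double *x, double *out, int n, int chunk)
{
    int i;

    if (chunk < 1)
        chunk = 1;  /* the chunk size must be positive */

    /* Each thread repeatedly grabs the next `chunk` iterations until the
     * loop is exhausted; iterations are independent, so no locking is needed. */
    #pragma omp parallel for schedule(dynamic, chunk)
    for (i = 0; i < n; i++)
        out[i] = x[i] * x[i] + 1.0;
}
```

A small chunk balances load better but adds scheduling overhead per
block, which is presumably where the hit on short vectors would come
from.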

Cheers,
Simon