Perplexed benchmark result from a new Macbook Pro Core i5

7 messages · Gardar Johannesson, Stefan Evert, Simon Urbanek +1 more

#
I just replaced a MacBook Pro from 2008 (with a 2.2GHz Intel Core 2 Duo) with a new MacBook Pro (with a 2.4GHz Intel Core i5). To get a rough idea of the difference in R execution speed I ran a small test, with the output shown below. In short:

1) The new MacBook Pro was ca. 60% _slower_ at linear algebra (crossprod() and solve())
2) The new MacBook Pro was ca. 17% faster on a long for-loop
3) Linking against Goto2 instead of vecLib improved the linear algebra results slightly

Both tests were done using the same 2.11.0 dmg image from CRAN.

Any thoughts on this? 

Any ideas how I can improve the performance results? What about compiling from source?


Thanks,
Gardar Johannesson
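
The script behind the timings that follow is not included in the thread. A minimal sketch of a comparable test, assuming a dense n x n matrix for crossprod() and solve() and a plain interpreted for-loop, might look like the code below. The function name run_benchmark, the sizes n and n_loop, and the loop body are hypothetical choices, not the poster's.

```r
# Hypothetical reconstruction: the original script is not shown in the thread.
# n and n_loop are illustrative sizes, not the values used by the poster.
run_benchmark <- function(n = 1000, n_loop = 1e6, seed = 1) {
  set.seed(seed)
  A <- matrix(rnorm(n * n), n, n)

  # crossprod(A) computes t(A) %*% A and is handed to the BLAS
  t_crossprod <- system.time(B <- crossprod(A))

  # Add a ridge to the diagonal so that B is safely invertible
  diag(B) <- diag(B) + n
  t_solve <- system.time(Binv <- solve(B))

  # Interpreted loop: this exercises the R evaluator, not the BLAS
  t_loop <- system.time({
    s <- 0
    for (i in seq_len(n_loop)) s <- s + sqrt(i)
  })

  res <- rbind(crossprod = t_crossprod, solve = t_solve, loop = t_loop)
  res[, c("user.self", "sys.self", "elapsed")]
}

# Small sizes so the sketch runs quickly; increase them for a real comparison
res <- run_benchmark(n = 200, n_loop = 1e5)
print(res)
```

Reporting user, system and elapsed time side by side matters here, because with a multithreaded BLAS the user time is summed over threads and can exceed the elapsed time.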


###########################################
## Results from new macbook pro (Core i5 @ 2.4Ghz)
user  system elapsed 
  2.500   0.058   0.816
user  system elapsed 
  2.502   0.050   0.814
user  system elapsed 
  7.208   0.265   2.740
user  system elapsed 
  7.121   0.264   2.666
user  system elapsed 
  2.964   0.602   3.528
user  system elapsed 
  3.040   0.732   3.732
R version 2.11.0 (2010-04-22) 
i386-apple-darwin9.8.0 

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_2.11.0
###################################################
## Results from old macbook pro (Core 2 Duo @ 2.2GHz)
user  system elapsed 
  1.429   0.073   0.800
user  system elapsed 
  1.429   0.064   0.874
user  system elapsed 
  4.532   0.285   2.860
user  system elapsed 
  4.521   0.281   2.834
user  system elapsed 
  3.501   0.764   4.215
user  system elapsed 
  3.459   0.702   4.113
R version 2.11.0 (2010-04-22) 
i386-apple-darwin9.8.0 

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
###################################################
## Results from new macbook pro (Core i5 @ 2.4Ghz)
## Linking against Goto2 BLAS (vs vecLib)
user  system elapsed 
  2.348   0.124   0.635
user  system elapsed 
  2.342   0.110   0.622
user  system elapsed 
  6.634   0.327   2.158
user  system elapsed 
  6.697   0.348   2.034
user  system elapsed 
  2.577   0.548   2.885
user  system elapsed 
  2.411   0.478   2.859
R version 2.11.0 (2010-04-22) 
i386-apple-darwin9.8.0 

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
1 day later
#
On May 8, 2010, at 2:53 PM, Gardar Johannesson wrote:

            
What OS X versions are on the respective machines? vecLib performance varies greatly between OS X versions.
Note that you are essentially just comparing the BLAS libraries on each machine; R is practically not involved in this at all, so if you meant compiling R from source, the answer is likely no (R's own speed is what you see in the loops).


I don't have an i5 around, but comparing similar architectures (Penryn vs Nehalem) gives a slight edge to Nehalem (3.8s @ 2.8GHz vs 3.4s @ 2.66GHz on solve(B)), but that is for a Xeon, so the memory speed may be the edge (everything on OS X 10.6.3).

Cheers,
Simon
#
I did the same test on an MBP 15 i7 2.66GHz. The results follow. These are for the x86_64 version; the i386 version (aka R32) is slightly slower than this.
user  system elapsed 
  1.983   0.029   0.628
user  system elapsed 
  1.988   0.023   0.595
user  system elapsed 
  5.980   0.183   2.366
user  system elapsed 
  5.781   0.155   1.982
user  system elapsed 
  2.269   0.919   3.160
user  system elapsed 
  2.207   0.846   3.029
R version 2.11.0 Patched (2010-04-28 r51858) 
x86_64-apple-darwin9.8.0
On 8 May 2010, at 19:53, Gardar Johannesson wrote:

            
#
To clarify, both MacBook Pro tests were carried out on OS X 10.6.3. Regarding memory speed, the new laptop uses 1067MHz DDR3, while the older one uses 667MHz DDR2. In short, the 2008 laptop has both an older, slower CPU and slower memory, but I think it has a slightly larger cache (4MB versus 3MB, though I think there is more to it).

I did a little linear regression (lm()) test with a 100000 x 100 matrix. In this case, the Core i5 finished in ~3.2 s while the Core 2 Duo finished in ~5.7 s. So that was good news. But I have not been able to explain the BLAS performance; I also ran those tests with 500- and 5000-dimensional matrices, with the same results (i.e. Core 2 Duo ahead of Core i5).
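
The lm() test is described only by its matrix size. A rough sketch of such a test, assuming simulated Gaussian predictors and a linear response, could be written as follows. The function name time_lm and the way the data are generated are hypothetical.

```r
# Hypothetical sketch: only the matrix size (100000 x 100) comes from the
# thread; the simulated data and the function name are assumptions.
time_lm <- function(n = 100000, p = 100, seed = 1) {
  set.seed(seed)
  X <- matrix(rnorm(n * p), n, p)
  y <- drop(X %*% rnorm(p)) + rnorm(n)
  # Time only the model fit, not the data generation
  system.time(fit <- lm(y ~ X))
}

# Reduced size for a quick check; call time_lm() for the size in the thread
print(time_lm(n = 2000, p = 20))
```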

I guess there is no simple explanation for this.

Thanks for looking into this,
Gardar
On May 10, 2010, at 6:20 AM, Simon Urbanek wrote:

            
#
Just a thought: Wouldn't it make more sense to compare the "elapsed"  
times, which show that both machines are more or less equally fast  
(with a slight edge for the newer i5)?

I suspect that there is a change in the way "user" time is reported,  
which probably adds up running times of four hyperthreads running on  
two cores for the i5 CPU vs. only two threads on two cores for the  
Core 2 Duo.  If I'm not mistaken about the i5 architecture, this is  
not surprising: there are 4 threads, but they have to share 2 cores  
and don't seem to be able to run the FP instructions in parallel on a  
single core; so they're running at half speed only.
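
The point about "user" versus "elapsed" time can be checked against the posted numbers. The sketch below takes the first run of each pair from the original post and divides user time by elapsed time, which roughly approximates the average number of busy threads. The mapping of the three pairs to crossprod(), solve() and the for-loop is inferred from the post, not stated in it.

```r
# Timings copied from the first run of each pair in the original post.
# Assumption: the three pairs are crossprod(), solve() and the for-loop,
# in that order; the post does not label them.
timings <- data.frame(
  machine = rep(c("Core i5", "Core 2 Duo"), each = 3),
  test    = rep(c("crossprod", "solve", "loop"), times = 2),
  user    = c(2.500, 7.208, 2.964, 1.429, 4.532, 3.501),
  elapsed = c(0.816, 2.740, 3.528, 0.800, 2.860, 4.215)
)

# user / elapsed: roughly the average number of threads kept busy
timings$threads_busy <- round(timings$user / timings$elapsed, 2)
print(timings)
```

On these numbers the ratio is about 3.06 and 2.63 for the two linear algebra tests on the Core i5, against 1.79 and 1.58 on the Core 2 Duo, while the loop stays below 1 on both machines (0.84 and 0.83). That is consistent with user time being summed over more threads on the i5.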

Thanks for the benchmark, by the way.  It's good to know I'm not  
missing out on R performance with my good old 2008 MacBook Pro. :-)

Cheers,
Stefan
On 8 May 2010, at 20:53, Gardar Johannesson wrote:

            
#
On May 10, 2010, at 3:44 PM, Stefan Evert wrote:

            
Oh my - how could I miss that :) Stefan, you're right - the elapsed time is indeed shorter on the i5, so ATLAS simply spreads some part across threads without any gain (unsurprising on such small tasks). I got thrown off by the fact that Shark was not reporting other threads, but that may be an issue in Shark...
Well, ever since the GHz race ended, the performance of machines has not increased (if anything, the contrary), since it's hard to feed all cores for common R tasks in practice and per-thread speed has gone down (see the old Nehalem benchmarks thread - unfortunately it's still true).

Cheers,
Simon
1 day later
#
My best guess about the BLAS results is that vecLib is not optimised for the i5 and i7. There is no point in optimising it for "future products", so it can hardly have been optimised before the release of the new MBP. And since the release, there have surely been no updates yet. This is just my guess.
On 10 May 2010, at 17:29, Gardar Johannesson wrote: