Perplexing benchmark result from a new MacBook Pro Core i5
On May 10, 2010, at 3:44 PM, Stefan Evert wrote:
Just a thought: wouldn't it make more sense to compare the "elapsed" times, which show that both machines are more or less equally fast (with a slight edge for the newer i5)? I suspect that "user" time is reported differently here: it probably adds up the running times of four hyperthreads on two cores for the i5, versus only two threads on two cores for the Core 2 Duo. If I'm not mistaken about the i5 architecture, this is not surprising: there are 4 threads, but they have to share 2 cores and don't seem to be able to run FP instructions in parallel on a single core, so they run at only half speed.
Oh my, how could I miss that :) Stefan, you're right: the elapsed time is indeed shorter on the i5, so ATLAS simply spreads part of the work across threads without any gain (unsurprising on such small tasks). I was thrown off by the fact that Shark was not reporting the other threads, but that may be an issue in Shark...
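Stefan's point can be checked directly: dividing the CPU time that system.time() reports (user + system) by the elapsed time gives roughly how many CPUs, or hyperthreads, an expression kept busy on average. A rough sketch in R; `busy_cpus` is a hypothetical helper, not a base R function, and the ratio says nothing about whether the extra threads did useful work.

```r
# Hypothetical helper (not part of base R): average number of CPUs kept busy,
# computed as (user + system) CPU time divided by wall-clock (elapsed) time.
busy_cpus <- function(timing) {
  # `timing` is a "proc_time" object as returned by system.time();
  # user.self/sys.self cover this R process only, not child processes.
  (timing[["user.self"]] + timing[["sys.self"]]) / timing[["elapsed"]]
}

set.seed(1)
A <- matrix(rnorm(2000 * 2000), 2000, 2000)

# BLAS-backed operation: may run multithreaded, so the ratio can exceed 1
t_blas <- system.time(B <- crossprod(A))

# Interpreted R loop: runs in a single thread, so the ratio should stay near 1
t_loop <- system.time({a <- rep(1.0, 100); for (i in 1:1e6) a <- 1.0 * a + 0.0})

print(c(crossprod = busy_cpus(t_blas), loop = busy_cpus(t_loop)))
```

With a multithreaded BLAS the `crossprod` ratio can exceed 1 while the interpreted loop stays close to 1; a ratio above 1 on its own does not mean the extra threads shortened the elapsed time.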
Thanks for the benchmark, by the way. It's good to know I'm not missing out on R performance with my good old 2008 MacBook Pro. :-)
Well, ever since the GHz race ended, the performance of machines has not increased (if anything, the contrary), since in practice it's hard to keep all cores busy for common R tasks, and per-thread speed has gone down (see the old Nehalem benchmarks thread; unfortunately it's still true).

Cheers,
Simon
On 8 May 2010, at 20:53, Gardar Johannesson wrote:
###########################################
## Results from new MacBook Pro (Core i5 @ 2.4GHz)

set.seed(1)
A <- matrix(rnorm(2000*2000),2000,2000)
system.time(B <- crossprod(A))
   user  system elapsed
  2.500   0.058   0.816
system.time(B <- crossprod(A))
   user  system elapsed
  2.502   0.050   0.814
system.time(solve(B))
   user  system elapsed
  7.208   0.265   2.740
system.time(solve(B))
   user  system elapsed
  7.121   0.264   2.666
system.time({a <- rep(1.0,100); for(i in 1:1e6) a <- 1.0*a+0.0})
   user  system elapsed
  2.964   0.602   3.528
system.time({a <- rep(1.0,100); for(i in 1:1e6) a <- 1.0*a+0.0})
   user  system elapsed
  3.040   0.732   3.732
###################################################
## Results from old MacBook Pro (Core 2 Duo @ 2.2GHz)

set.seed(1)
A <- matrix(rnorm(2000*2000),2000,2000)
system.time(B <- crossprod(A))
   user  system elapsed
  1.429   0.073   0.800
system.time(B <- crossprod(A))
   user  system elapsed
  1.429   0.064   0.874
system.time(solve(B))
   user  system elapsed
  4.532   0.285   2.860
system.time(solve(B))
   user  system elapsed
  4.521   0.281   2.834
system.time({a <- rep(1.0,100); for(i in 1:1e6) a <- 1.0*a+0.0})
   user  system elapsed
  3.501   0.764   4.215
system.time({a <- rep(1.0,100); for(i in 1:1e6) a <- 1.0*a+0.0})
   user  system elapsed
  3.459   0.702   4.113
sessionInfo()
R version 2.11.0 (2010-04-22)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
###################################################
## Results from new MacBook Pro (Core i5 @ 2.4GHz)
## Linking against Goto2 BLAS (vs vecLib)

set.seed(1)
A <- matrix(rnorm(2000*2000),2000,2000)
system.time(B <- crossprod(A))
   user  system elapsed
  2.348   0.124   0.635
system.time(B <- crossprod(A))
   user  system elapsed
  2.342   0.110   0.622
system.time(solve(B))
   user  system elapsed
  6.634   0.327   2.158
system.time(solve(B))
   user  system elapsed
  6.697   0.348   2.034
system.time({a <- rep(1.0,100); for(i in 1:1e6) a <- 1.0*a+0.0})
   user  system elapsed
  2.577   0.548   2.885
system.time({a <- rep(1.0,100); for(i in 1:1e6) a <- 1.0*a+0.0})
   user  system elapsed
  2.411   0.478   2.859
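To follow Stefan's suggestion of comparing elapsed rather than user time, the timings reported above can be tabulated in R. A minimal sketch using only base R; the names `timings`, `avg`, `busy` and `speedup` are hypothetical, and the values are copied from the two runs of each benchmark above and averaged ("C2D" is the old Core 2 Duo machine).

```r
# Timings copied from the output above: two runs of each benchmark per configuration
timings <- data.frame(
  config  = rep(c("i5 vecLib", "C2D vecLib", "i5 Goto2"), each = 6),
  task    = rep(rep(c("crossprod", "solve", "loop"), each = 2), times = 3),
  user    = c(2.500, 2.502, 7.208, 7.121, 2.964, 3.040,
              1.429, 1.429, 4.532, 4.521, 3.501, 3.459,
              2.348, 2.342, 6.634, 6.697, 2.577, 2.411),
  system  = c(0.058, 0.050, 0.265, 0.264, 0.602, 0.732,
              0.073, 0.064, 0.285, 0.281, 0.764, 0.702,
              0.124, 0.110, 0.327, 0.348, 0.548, 0.478),
  elapsed = c(0.816, 0.814, 2.740, 2.666, 3.528, 3.732,
              0.800, 0.874, 2.860, 2.834, 4.215, 4.113,
              0.635, 0.622, 2.158, 2.034, 2.885, 2.859),
  stringsAsFactors = FALSE
)

# Average the two runs of each benchmark
avg <- aggregate(cbind(user, system, elapsed) ~ config + task,
                 data = timings, FUN = mean)
avg$config <- as.character(avg$config)
avg$task   <- as.character(avg$task)

# Roughly how many CPUs were busy on average: CPU time / wall-clock time
avg$busy <- (avg$user + avg$system) / avg$elapsed

# Wall-clock speedup relative to the Core 2 Duo with vecLib (> 1 means faster)
c2d <- avg[avg$config == "C2D vecLib", ]
avg$speedup <- c2d$elapsed[match(avg$task, c2d$task)] / avg$elapsed

print(avg[order(avg$task, avg$config), ], digits = 3)
```

On these averages the vecLib i5 is only slightly faster than the Core 2 Duo on `crossprod` (speedup of about 1.03) while keeping about 3.1 CPUs busy versus about 1.8, which fits the idea that the larger user time counts extra hyperthreads rather than extra useful work.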
_______________________________________________ R-SIG-Mac mailing list R-SIG-Mac at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-mac