
Compiler options for R binary

11 messages · Braun, Michael, Amos B. Elberg, David Winsemius +3 more

#
I run R on a recent Mac Pro (Ivy Bridge architecture), and before that, on a 2010-version (Nehalem architecture).  For the last few years I have been installing R by compiling from source.  The reason is that I noticed in the etc/Makeconf file that the precompiled binary is compiled with the -mtune=core2 option.  I had thought that since my system uses a processor with a more recent architecture and instruction set, that I would be leaving performance on the table by using the binary.
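For anyone who wants to verify this on their own machine, the flags the CRAN binary was configured with can be read back directly; a minimal sketch, assuming the standard CRAN framework install path on OS X:

```shell
# Show the compiler and flags recorded at build time.
# Path assumes the standard CRAN framework install.
grep -E '^(CC|CFLAGS|FFLAGS) =' /Library/Frameworks/R.framework/Resources/etc/Makeconf

# Equivalently, ask R itself:
R CMD config CC
R CMD config CFLAGS
```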

My self-compiled R has worked well for me, for the most part. But sometimes little things pop up, like difficulty using RStudio, an occasional permissions problem related to the Intel BLAS, etc.  And there is a time investment in installing R this way.  So even though I want to exploit as much of the computing power on my desktop as I can, now I am questioning whether self-compiling R is worth the effort.

My questions are these:

1.  Am I correct that the R binary for Mac is tuned to Core2 architecture?  
2.  In theory, should tuning the compiler for Sandy Bridge (SSE4.2, AVX instructions, etc) generate a faster R?
3.  Has anyone tested the theory in Item 2?
4.  Is the reason for setting -mtune=core2 to support older machines?  If so, are enough people still using pre-Nehalem 64-bit Macs to justify this?
5.  What would trigger a decision to start tuning the R binary for a more advanced processor?
6.  What are some other implications of either self-compiling or using the precompiled binary that I might need to consider?  

tl;dr:  My Mac Pro has an Ivy Bridge processor.  Is it worthwhile to compile R myself, instead of using the binary?

Thanks,

Michael


--------------------------
Michael Braun
Associate Professor of Marketing
Cox School of Business
Southern Methodist University
Dallas, TX 75275
braunm at smu.edu
#
I got a speed bump from recompiling on Mac. The CRAN version is built with an LLVM clang that doesn't support OpenMP.  Apart from benefits from the choice of BLAS, the benefit from just recompiling is real but not huge, on the order of 10-15%. I set tune and all of that to "auto."

Optimization can also be pushed beyond the CRAN settings without breaking make check, and there are compiler optimizations that provide a larger boost and break odd parts of make check, though I'm not sure they actually break any functional part of R.

Overall, apart from the BLAS, the speed change is modest enough that I really wonder if the game is worth the candle.
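For context, the "recompile with native tuning" route amounts to passing your own flags to configure; a hedged sketch (flags are illustrative, not the CRAN settings):

```shell
# Build R from source tuned for the local CPU instead of core2.
# -mtune=native is the clang/gcc spelling of "tune for this machine";
# adjust compilers and flags for your own toolchain.
./configure CC=clang CFLAGS="-g -O2 -mtune=native" \
            FC=gfortran FFLAGS="-g -O2 -mtune=native"
make -j4
make check   # verify the optimized build still passes the test suite
```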
#
On Nov 20, 2014, at 8:17 AM, Braun, Michael wrote:

I use an early 2008 Mac Pro (Lion, soon to go to Yosemite; currently with R's SL branch) and a 2009 MacBook Pro (Yosemite). (After consulting Wikipedia's pages on Mac processors, I'm not sure the pre-/post-Nehalem distinction is sufficiently clear across all platforms for your question to be answered with clarity. It also appears to me that even current MacBooks are still Core2, and if so I think you would get a lot of complaints by making them incompatible with the base version of R. If I'm reading those pages correctly, my 15-inch MBP from 2009 is Lynnfield.)
David Winsemius
Alameda, CA, USA
#
On Nov 20, 2014, at 11:17 AM, Braun, Michael <braunm at mail.smu.edu> wrote:

In theory, yes, but often the inverse is true (in particular for AVX).
Only partially. In fact, the flags are there explicitly to increase the tuning level - the default is even lower. Last time I checked there were no significant benefits in compiling with more aggressive flags anyway. (If you want to go there, Jan de Leeuw used to publish the most aggressive flags possible.) You cannot relax math-op compatibility, which is the only piece that typically yields a gain, because you start getting wrong math results. You have to be very careful with benchmarking, because from experience optimizations often yield speed-ups in some areas but also introduce slowdowns in other areas - it's not always a gain (one example on the extreme end is AVX: when enabled, some ops can even take twice as long, believe it or not...), and even the gains are typically in the single-digit percent range.
When you compile from sources, you're entirely on your own and you have to take care of all dependencies (libraries) and compilation yourself. Most Mac users don't want to go there since they typically prefer to spend their time elsewhere ;).

BTW: if you really care about speed, the real gains are with using parallel BLAS, Intel OpenMP runtime and enabling built-in threading support in R.

Cheers,
Simon
#
Simon.

It's been a while since I've looked, but are there precise instructions on how to implement your BTW?

Brandon.

On Thursday, November 20, 2014, Simon Urbanek <simon.urbanek at r-project.org>
wrote:

#
Simon Urbanek <simon.urbanek at r-project.org> writes:
I have to mention Homebrew [1] here - by tuning the recipe used to install R, one could (I guess) tune compiler options and recompile without any fuss. The R installation with Homebrew worked for me out of the box, and re-compilation and installation is one command.

The recipes are simple ruby scripts and can easily be changed.

OK - I come from a Linux background, but I like the Homebrew approach and it works flawlessly for me.

Cheers,

Rainer
Footnotes: 
[1]  http://brew.sh
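The Homebrew route described above, in concrete commands (formula name and tap as of the homebrew-science era; details may have changed since):

```shell
# Install R through Homebrew (the formula lived in the science tap at the time).
brew tap homebrew/science
brew install r

# To change compiler flags: edit the Ruby formula, then rebuild from source.
brew edit r
brew reinstall --build-from-source r
```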
#
There is FAQ "10.5 Which BLAS is used and how can it be changed?" for the first part.
iomp is a bit involved, since you have to build it in the first place, but I'm working on making it part of the CRAN binaries in the near future.
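For reference, the switch described in that FAQ entry amounts to repointing a symlink; a sketch (the exact file names can differ between R versions):

```shell
# Point the R binary at Apple's vecLib BLAS instead of the bundled
# reference BLAS; requires write access to the framework directory.
cd /Library/Frameworks/R.framework/Resources/lib
ln -sf libRblas.vecLib.dylib libRblas.dylib

# To revert to the reference BLAS shipped with R:
# ln -sf libRblas.0.dylib libRblas.dylib
```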

Cheers,
Simon
#
On Nov 21, 2014, at 3:47 AM, Rainer M Krug <Rainer at krugs.de> wrote:
As others have said - if you don't mind the crashes, then it's OK. I actually like Homebrew; it's good for small tools when you're in a pinch, but it doesn't tend to work well for complex things like R (or packages that have many options). Also, like I said, you'll have to take care of packages and dependencies yourself - not impossible, but certainly extra work. However, if you don't mind getting your hands dirty, then I would recommend Homebrew over the alternatives.

Cheers,
Simon
#
Simon Urbanek <simon.urbanek at r-project.org> writes:
Well - I am using R via ESS and nearly never the GUI, so I can't say anything from that side, but I never had crashes of R after switching to Homebrew - though I might just be lucky.
As I said - I am coming from the Linux side of things (but always used the binaries there...) so I don't mind compiling, and I prefer the better control / understanding Homebrew gives me. And my hands never got as dirty as when trying to compile under Linux :-)

Cheers,

Rainer

#
Thank you for all of the helpful replies.  I think I'll go back to using the CRAN binary, and still link to an external BLAS.

I do have some follow-up questions:

1.  Section 10.5 of the R for Mac FAQ suggests that there is a libRblas.veclib.dylib in the Resources/lib directory.  I do not see that after installing the binary for R 3.1.2.  I can still link to the Apple vecLib (/System/Library/…Accelerate…/libBLAS.dylib -- it's a very long path), but there appears to be an inconsistency between the CRAN build and the FAQ.

2.  Simon mentioned Intel OpenMP runtime, and enabling R threading support.  Is this something that can be done at the user level (like pointing to a different BLAS), or is it something that needs to be built in to the binary?

3.  Just out of curiosity, what are the operations that slow down with AVX?  Someday, when I have some free time, I may want to check that out, mainly as a learning experience.
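As a quick sanity check on point 1, timing a BLAS-heavy operation before and after relinking shows whether the external BLAS is actually being picked up; a minimal sketch (matrix size is arbitrary):

```shell
# Time a matrix cross-product, which is dominated by BLAS (dsyrk/dgemm).
# Run once per BLAS configuration and compare the elapsed times.
Rscript -e 'set.seed(1); x <- matrix(rnorm(1e6), 1000); print(system.time(crossprod(x)))'
```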
_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
#
On Nov 22, 2014, at 1:27 PM, Braun, Michael <braunm at mail.smu.edu> wrote:
Thanks, I'll look into it, it should be there at least for the SL build.
Unfortunately not, because GOMP has both performance and stability issues on OS X, so the CRAN binaries are explicitly built with OpenMP support disabled to work around that. However, as I said, we hope to have iomp binaries soon (perhaps even as soon as next week).
I would have to dig out the benchmarks, but if I recall correctly there was a set of BLAS kernels that doubled the runtime with AVX enabled. There were other instances, too, but I can't recall the details. In a spare minute I can try to replicate the experiment.

Cheers,
Simon