
How to Speed up R on the G5

12 messages · Jake Bowers, Simon Urbanek, Michael Redmond +2 more

#
Hi All,

I've been receiving some friendly grief from a friend with a Linux
dual-Opteron system about the performance of his R package on the OS X G5
system.

He has suggested recompiling R-patched with a variety of different
compilers and compiler flags. He has also suggested just recompiling
his package with different flags and compilers, while leaving
R-patched as I have currently built it, using gcc 3.3 20030304 (Apple
Computer, Inc. build 1671) and g77 3.4.2 (from that wonderful site:
hpc.sf.net).

I have now successfully recompiled R using a few different
configurations. Each one builds and passes make check, except for
reg-tests-1.R, which has failed in all cases and also on my Debian
box. That suggests the problem lies with reg-tests-1.R in R-patched
and is not OS X dependent.

My first question is how to play with these different versions without
breaking my production version? That is, I don't want to have to
delete my currently working build of R-patched each time I want to run
a speed test.

My second question is whether there are ways other than using
--with-blas="-framework vecLib", to take advantage of what I thought
was the power of the G5 (or dual G5s in my case).

I'm sure this is a complete newbie type of question, and
I apologize in advance for my ignorance!

For those of you who are interested, here are some ways that I've
been trying to optimize R for the G5. I can't report speed tests yet
because of my inexperience with compiling things (as made clear by my
first question!).

FYI, I'm building versions in the most stripped-down way that I can
envision, since I mainly just want speed. I'm also running make
distclean between builds, and hand-editing tests/Makefile to delete
the reference to reg-tests-1.R after it fails. And I am using
R-patched updated via svn update yesterday.

Here is what I'm playing with:

1) One set of builds with standard compilers and flags
(--with-blas="-framework vecLib" --with-lapack)

2) One build like (1) but using the libgoto.dylib version of BLAS and
the vecLib stuff for LAPACK (it doesn't work with just
--with-blas="-L/usr/local/lib -lgoto" --with-lapack).
(http://www.cs.utexas.edu/users/kgoto/signup_first.html#For_OS_X)

./configure --with-blas="-L/usr/local/lib -lgoto" \
  --with-lapack="-framework vecLib" --without-aqua --with-x \
  --disable-R-shlib --disable-R-profiling --without-recommended-packages

3) Another set of builds with some compiler flags:
 C compiler:                /usr/bin/gcc  -g -O3 -mcpu=970 -mtune=970 -mpowerpc64 -mpowerpc-gpopt -force_cpusubtype_ALL
 C++ compiler:              g++  -g -O3 -mcpu=970 -mtune=970 -mpowerpc64 -mpowerpc-gpopt -force_cpusubtype_ALL
 Fortran compiler:          g77  -O3 -mcpu=970 -mtune=970 -mpowerpc64 -mpowerpc-gpopt -force_cpusubtype_ALL

4) Another like (3), but with the libgoto BLAS.

This leaves me with 4 builds to test. I figure I have to run "R CMD
INSTALL thepackage.tar.gz" in each build to test my friend's package. At
least that is what I think... I don't really know if there is a more
direct way.

*Other attempts at optimization which failed:

My friend also suggested using gcc-4.0 with CFLAGS and FFLAGS
including "-ftree-vectorize -maltivec", but the build would not
complete.

Another option was to use other compiler flags on the Apple provided gcc, like this:

 C compiler:  /usr/bin/gcc -g -O3 -funroll-loops -fstrict-aliasing
 -fsched-interblock -falign-loops=16 -falign-jumps=16 -falign-functions=16
 -falign-jumps-max-skip=15 -falign-loops-max-skip=15 -malign-natural
 -ffast-math -mpowerpc-gpopt -force_cpusubtype_ALL -fstrict-aliasing
 -mtune=G5 -mcpu=G5 -mpowerpc64

 C++ compiler:  g++ -g -O3 -mcpu=970 -mtune=970 -mpowerpc64
 -mpowerpc-gpopt -force_cpusubtype_ALL -funroll-loops -fstrict-aliasing
 -fsched-interblock -falign-loops=16 -falign-jumps=16 -falign-functions=16
 -falign-jumps-max-skip=15 -falign-loops-max-skip=15 -malign-natural
 -ffast-math

  Fortran compiler:  g77 -O3 -funroll-loops -fstrict-aliasing
  -fsched-interblock -falign-loops=16 -falign-jumps=16
  -falign-functions=16 -falign-jumps-max-skip=15 -falign-loops-max-skip=15
  -malign-natural -ffast-math -mpowerpc-gpopt -force_cpusubtype_ALL
  -fstrict-aliasing -mtune=G5 -mcpu=G5 -mpowerpc64

but, although this compiled OK, it failed make check on the first test, base-Ex.R, with:

  Warning in table(x) == tx0 : longer object length
  	is not a multiple of shorter object length
  Error in stopifnot(table(x) == tx0) : dim<- : dims [product 8] do not match the length of object [9]
  Execution halted

Finally, he suggested looking into the Absoft compilers. But I
figured I'd save my money and first see whether other folks have had
luck with those.

Thanks very much for any thoughts or help any of y'all might have!


Jake

Jake Bowers
Assistant Professor
Dept of Political Science
University of Michigan
jwbowers@umich.edu
http://www.umich.edu/~jwbowers/
#
Jake,
I am also very interested in this. We are running R through an iNquiry 
portal, and have found that (at least in an early test) the performance 
is not as good as on a 2.8 GHz P4 system (11 hours on the P4 vs. 13 hours 
on the Xserve), even though the P4 has less memory (1 GB on the P4 vs. 
2 GB on the Xserve nodes). I am ready to benchmark any improvements.

The application we have is R with Bioconductor doing MCMC, though I 
don't have many more specifics. All I know is that the test is 
"real-world".

We installed R-2.0.1 by compiling it from source using Fink. No 
modifications were made to the standard make/install configuration.

Thanks
Mike
---
Jake Bowers wrote:
#
On Mon, 7 Feb 2005, Jake Bowers wrote:
You can build R without installing it, as long as you don't want to use 
the GUI.  For example, I have the current development version of R in 
~/R-devel/R and version 2.0.0 installed in /.

Then use

  ~/path/to/this/version/R CMD INSTALL thepackage

to install the package.  Make sure you don't have an .Rprofile that would 
put all the packages in the same place.
I haven't found anything very impressive yet (this is on an iMac G5, which 
has a much lower bus speed).

I am planning to test the IBM (Absoft) compilers when I get time.  In the 
past, people have found that a good Fortran compiler can speed R up 
noticeably (e.g. on Solaris).


 	-thomas
#
Jake,
I will address just some issues you mentioned based on some tweaking of 
R, but since I don't have a G5 yet, some of it is just theoretical ;).
On Feb 7, 2005, at 10:10 AM, Jake Bowers wrote:
Just don't install it. You don't need to run "make install" to use R - 
just start it from the build directory.
As you mention yourself further down - using compiler flags you can 
increase the level of optimization.
This is because overly aggressive optimization results in 
non-IEEE-conforming floating-point behavior, and NA/NaN handling 
gets screwed up.
Look at the flags that "-fast" is equivalent to (see man gcc) and try 
them without "-ffast-math"; that should give you the fastest possible 
binary while avoiding the problem above (but I haven't tested it myself).
BTW: R compiles cleanly with the latest gcc 4.0 from Apple (build 4039) 
and gfortran from the HPC site; previous versions didn't quite work. If 
you have ADC seed access, try the latest build, since it's specifically 
optimized for auto-vectorization.
Cheers,
Simon
#
Thomas,
IBM makes a C compiler optimized for Mac OS X:

http://www-306.ibm.com/software/info/ecatalog/en_US/products/K107418L80422W37.html

But I did not see a Fortran compiler. Does anyone know whether the Absoft 
product is the same as the above, and whether it includes a compatible 
optimizing Fortran compiler? Or is there another Fortran compiler that 
is compatible with the above?

I am very interested in an optimized build of R with optional packages 
and Bioconductor for OS X on an Xserve.

Thanks
Mike Redmond
UW-Madison
---
Thomas Lumley wrote:
#
On Feb 7, 2005, at 10:47 AM, Thomas Lumley wrote:

            
Just enabling G5 optimizations without any other aggressive flags gave 
me a 30% boost on the code I was testing (on a dual G5; BTW, just in 
case someone didn't know: for computations, one R process uses only 
one processor). But I think vecLib should be pretty well optimized by 
Apple already, so I don't know of any much faster BLAS implementation 
for OS X ...

Cheers,
Simon
#
I found an answer to my own question on combined IBM C and Fortran 
compilers. IBM makes an XL Fortran compiler to match the XL C compiler, 
per:

http://www-306.ibm.com/software/awdtools/fortran/xlfortran/

This may be the same product as in the Absoft line, or it may be a 
little more specific and advanced for OS X and IBM systems.

There is also a trial version, so it might be possible to see whether 
there are benefits (and find all the glitches) with an XL compiler 
implementation of R.

Thanks
Mike
---
Thomas Lumley wrote:
#
On Mon, 7 Feb 2005, Michael Redmond wrote:

            
http://www-306.ibm.com/software/awdtools/fortran/xlfortran/features/macosx/xlf-mac.html
The IBM page on 'How to buy' links to

http://www.absoft.com/Products/Compilers/Fortran/Macintosh/XLF/xlf.html

It's not the same as the Absoft Pro Fortran.  I don't know which is 
better.  They also aren't explicit on whether the IBM Fortran is 
compatible with gcc --  it isn't on our university's central AIX system.

There is a free trial period, though...

 	-thomas
#
Michael,
On Feb 7, 2005, at 10:59 AM, Michael Redmond wrote:

            
http://www-306.ibm.com/software/awdtools/fortran/xlfortran/features/macosx/index.html

It's supposedly one of the best, but some time ago Jan tried to compile 
R using xlf without success. However, I don't know the current 
status...

Cheers,
Simon
#
The XL Fortran link from IBM goes directly to Absoft, which sells XL 
Fortran for $399 to government institutions. The 60-day trial will be a 
good way to test things out.
Thanks
Mike

----
Thomas Lumley wrote:
#
Just one general note -- complementing all the
knowledgeable and specific replies you already got:

This is a very general statement:

If you build (configure ; make ..) your own version of R,
please do run "make check" (at least; there are more checks you
can run).

If that gives errors (or warnings that look peculiar), do
investigate, and possibly ask for advice on R-devel (or a more
specialized list R-SIG-Mac).

Do *not* be happy if your newly compiled version of R runs
faster but doesn't pass "make check". 
	----------------------------------------------------
Rather, DO NOT USE SUCH A VERSION OF R for anything "real" !
	----------------------------------------------------
(unless you are very sure that the reasons for failing "make
 check" won't be a problem -- and do note that I think it's only
 rarely possible to be honestly sure here!)

Martin Maechler, ETH Zurich
(speaking for myself officially, but still as a member of the R-core team)
#
Simon,
What settings did you change in the compiler lines? I assume this was on 
gcc and g77?
Thanks
Mike Redmond
---
Simon Urbanek wrote: