ATLAS threaded 64 bit Opteron build for R: need -fPIC
On Fri, 10 Feb 2006, Amit Aronovitch wrote:
You set the reply address to Martin Maechler! That's antisocial.
Hi, Sorry for sending such a late reply, and for being a bit OT. I've been trying to compile 64-bit ATLAS for numpy (http://numeric.scipy.org/), and so far this thread is the most useful one I could google up - thanks! I encountered similar problems, and so far could not get a .a linkable to numpy (comparing to your post, it seems I might have forgotten to add -fPIC to F77FLAGS or MMFLAGS).
Yes, that _is_ in the R-admin manual. I guess you have not read that - it describes how to install R. You can get it in the R tarball from ftp://ftp.stat.math.ethz.ch/Software/R/R-devel.tar.bz2
Also, I'm having trouble with the ATLAS lapack. To get a usable lib, one has to merge it with a full lapack implementation (as described in the ATLAS errata). However, I'm using RHEL4, and their installed liblapack.a seems to have been compiled without -fPIC, so the merged library is unlinkable to numpy's .so. Is there a way to use Redhat's installed liblapack.so?
No, nor should you want to. If RHEL4 is like FC3/4 watch out, as RH have managed to get BLAS routines in liblapack and not libblas, and use incorrect patches to LAPACK 3.0. (Again, see the latest R-admin manual.)
A few questions about your compiler flags: 1) Is there a reason to compile with -O rather than -O3? (Did you try it and encounter some problem, or find no major performance difference?)
ATLAS chose that. Since the real work is done by hand-tuned assembler code it should not matter.
2) I see you use -mfpmath=387 - does this work better than sse2 (which seems to be the default)? How about the "sse,387" option - should I try that?
Depends on your ATLAS version. Again, ATLAS chose those.
As it happens, I have been trying to build ATLAS on my new dual Opteron
box this morning. The latest devel version (3.7.11) does not build, as at
some point it says it expects the GNU x86-32 assembler. If it did it
would use SSE3 and so be faster.
Both 3.6.0 and 3.7.11 fail because my machine is too fast, and I had to
increase the number of replications (1000) in make/Make.{mv,r1}tune and in
tune/blas/level1/*.c. Even then I do not entirely trust the results (and
the two versions report different L1 cache sizes ...).
I got pretty exasperated with this (it needed about ten builds to get one
that succeeded). Both ACML and the Goto BLAS work well out of the box on
Opterons, but do have licence issues. (Again, see the R-admin manual for
details.)
Martin Maechler wrote:
>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
on 26 Feb 2004 15:44:16 +0100 writes:
PD> Douglas Bates <bates at stat.wisc.edu> writes:
>> Have you tried configuring R with Goto's BLAS
>> http://www.cs.utexas.edu/users/kgoto/
>>
>> I haven't worked with Opteron or Athlon64 computers but I understand
>> that Goto's BLAS are very effective on those machines. Furthermore
>> Goto's BLAS are (only) available as .so libraries so you don't need to
>> mess with creating the .so version.
PD> I tried it, yes. Somewhat to my surprise, it seemed to be not quite as
PD> fast as the threaded ATLAS, but I wasn't very systematic about the
PD> benchmarking.
PD> (and the Goto items have license issues, which get in the way for
PD> binary distributions.)
Thanks a lot, Peter, Brian, Doug, for your feedback!
In the meantime, I have three running versions of R(-devel) on
the 64-bit Opteron
- "plain"
- linked against threaded GOTO
- linked against threaded (static) ATLAS (using -fPIC for compilation;
"large" Rlapack)
and I find that GOTO is faster than ATLAS
consistently (between ~ 5-20%) for several tests
(square matrices; %*% and solve).
ATLAS is still an order of magnitude faster than "plain" for
3000x3000 matrices.
Here are somewhat repeatable "ATLAS for R" build instructions:

1. get ATLAS source; unpack
2. make : use defaults and "express" installation
3. Before "make install ...", edit the Make.<ARCHITECTURE> file:
   add "-fPIC" in three places, namely F77FLAGS, CCFLAG0, and MMFLAGS,
   which in the case of the "threaded Opteron" architecture leads to
   the three new lines

       F77FLAGS = -fPIC -fomit-frame-pointer -O -m64
       CCFLAG0 = -fPIC -fomit-frame-pointer -O -mfpmath=387 -m64
       MMFLAGS = -fPIC -fomit-frame-pointer -O -mfpmath=387 -m64

   in the file Make.Linux_HAMMER64SSE2_2
4. make install arch=Linux_HAMMER64SSE2_2
5. Symlink the ATLAS libraries into /usr/local/lib:

       cd /usr/local/lib
       ln -s <ATLAS_build_dir>/lib/Linux_HAMMER64SSE2_2/lib* .

6. (needed for runtime!):
   Use environment variable LD_LIBRARY_PATH=/usr/local/lib

Note that I haven't built *.so (shared) libraries yet.
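
The hand edit in step 3 could probably be scripted. What follows is only a minimal sketch, not something taken from the ATLAS distribution: the helper name add_fpic is hypothetical, and it assumes that the Make.<ARCHITECTURE> file defines each variable on a single "NAME = value" line and that the local sed accepts -E (GNU and BSD sed both do). It prepends -fPIC to F77FLAGS, CCFLAG0 and MMFLAGS only, and leaves any line that already contains -fPIC alone.

```shell
#!/bin/sh
# Hypothetical helper (not part of ATLAS): prepend -fPIC to the three
# variables named in step 3 of the build instructions.
# Assumes one "NAME = value" definition per line in the given file.
add_fpic() {
    makefile=$1
    # Keep a backup copy so the change can be inspected or reverted.
    cp "$makefile" "$makefile.orig" || return 1
    # Lines that already mention -fPIC are skipped, so reruns are harmless.
    # \1 is the "NAME = " prefix (including any leading whitespace).
    sed -E '/-fPIC/!s/^([[:space:]]*(F77FLAGS|CCFLAG0|MMFLAGS)[[:space:]]*=[[:space:]]*)/\1-fPIC /' \
        "$makefile.orig" > "$makefile"
}
```

Under those assumptions it would be run from the ATLAS build directory as "add_fpic Make.Linux_HAMMER64SSE2_2" before step 4; comparing the result with the .orig backup is a sensible check, since the layout of that file may differ between ATLAS versions.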
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595