
ATLAS threaded 64 bit (Opteron) - need *.so?

9 messages · Peter Dalgaard, Brian Ripley, Douglas Bates +1 more

#
Using ATLAS with R is an old topic, covered fairly thoroughly in the "R
Administration" manual (and handled by R's "configure" script
collection).

Still, I am having trouble building R properly on our new dual-processor
AMD Opteron.
I used the current ATLAS 3.6.0, configured manually
(but with the "express" setup) to build a threaded ATLAS version, and
ATLAS' own sanity check,  "make ptsanity_test arch=Linux_HAMMER64SSE2_2",
ran successfully as well.

As is well known, this builds only static (*.a) versions of the
ATLAS libraries.  However, after an R site search for (something
like) "ATLAS shared", I found Peter Dalgaard's message
        http://finzi.psych.upenn.edu/R/Rhelp02/archive/7158.html
where PD confirmed it would work fine to link against the static
ATLAS libraries.
I didn't need (his suggestion of) using an explicit "-L..atlas_place...", 
since these libraries are symbolically linked into
/usr/local/lib/ which is searched by default.

Now, R's configure (R-devel of 2004-02-24) 
finds the ATLAS setup well behaved, reporting
^^^^^^^^^^^

and compilation (of course) goes fine till the crucial linking
stage :

gcc -shared   -o libRlapack.so dlapack0.lo dlapack1.lo dlapack2.lo dlapack3.lo cmplx.lo  -lf77blas -latlas -L/usr/lib64 -L/usr/lib/gcc-lib/x86_64-redhat-linux/3.2.3 -L/usr/lib/gcc-lib/x86_64-redhat-linux/3.2.3/../../../../lib64 -L/usr/lib/gcc-lib/x86_64-redhat-linux/3.2.3/../../.. -L/lib/../lib64 -L/usr/lib/../lib64 -lfrtbegin -lg2c -lm -lgcc_s
/usr/bin/ld: /usr/local/lib/libf77blas.a(xerbla.o): relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libf77blas.a: could not read symbols: Bad value
collect2: ld returned 1 exit status

and I'm stuck to some extent.
Note that the "recompile with -fPIC"  must relate to the
contents of ATLAS' libf77blas.a itself (or to "xerbla.o" more
concretely), since all of R's  dlapack[0-3].lo are of course
compiled with -fPIC.
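
One way to check that diagnosis is to look for absolute 32-bit relocations in the archive's members. The following is only a minimal sketch, assuming GNU binutils' readelf is available; the helper name has_abs_relocs is made up for illustration. Debug sections can contain such relocations even in PIC objects, so a hit is a hint rather than proof.

```shell
# has_abs_relocs: read a relocation listing on stdin and succeed if it
# contains R_X86_64_32 or R_X86_64_32S entries (absolute 32-bit relocations,
# which ld refuses to put into a shared object on x86_64).
# Hypothetical helper name; debug sections may also produce such entries.
has_abs_relocs() {
  grep -E -q 'R_X86_64_32S?([[:space:]]|$)'
}

# Usage (archive path taken from the error message above; needs readelf):
#   readelf -r --wide /usr/local/lib/libf77blas.a | has_abs_relocs \
#     && echo "archive contains absolute relocations (probably not -fPIC)"
```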

I tend to conclude that I do need shared versions of ATLAS'
libraries?  If yes, I think I've seen instructions on how to
build these.  Where?
If these are really needed, I think I should add them to the
corresponding "R administration manual" section, right?

Thanks in advance for your share of experience here.
Martin
#
Martin Maechler <maechler@stat.math.ethz.ch> writes:
I don't think we actually got them written down (thank you for
volunteering...) but it isn't terribly hard. There are just two basic
tricks:

- add -fPIC all over the place during configure.

- after building you'll get a bunch of .a files. These can be
  converted to .so using 

        ld -shared -o libfoo.so --whole-archive libfoo.a
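
The conversion above can be wrapped in a small helper. This is a sketch under assumptions: GNU ld, and archives that were already compiled with -fPIC. The function name archive_to_shared and the DRY_RUN variable are invented here. The trailing --no-whole-archive is an addition to the command above; it only switches the option off again after the archive.

```shell
# archive_to_shared LIB.a: build LIB.so next to LIB.a from all of its members.
# Hypothetical helper; set DRY_RUN=1 to print the ld command instead of running it.
archive_to_shared() {
  a="$1"
  so="${a%.a}.so"          # libfoo.a -> libfoo.so
  set -- ld -shared -o "$so" --whole-archive "$a" --no-whole-archive
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage, for the two ATLAS archives that appear on R's link line:
#   for a in libatlas.a libf77blas.a; do archive_to_shared "$a"; done
```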

One thing I haven't gotten around to yet is the ATLAS/LAPACK
integration mentioned on
http://math-atlas.sourceforge.net/errata.html#completelp
#
You do need to add -fPIC to the compile flags.  The same thing happens on 
Solaris 64-bit (under some compilers, anyway).  You don't need a shared 
library, but you do need relocatable code in the static libraries.

You can also try building without xerbla.

I think this is a route that many of us are about to take.  However, I 
would avoid ATLAS and use K. Goto's Opteron BLAS, which is easier to get 
to work (no xerbla) and has instructions in R-devel's R-admin.texi file.

Peter D has a dual Opteron and asked about it a while back, probably on
R-core.  (I have played a bit, but our Opteron cluster is a week or so
away now.)

Brian
On Thu, 26 Feb 2004, Martin Maechler wrote:

#
Peter Dalgaard <p.dalgaard@biostat.ku.dk> writes:
Have you tried configuring R with Goto's BLAS
                http://www.cs.utexas.edu/users/kgoto/

I haven't worked with Opteron or Athlon64 computers but I understand
that Goto's BLAS are very effective on those machines.  Furthermore
Goto's BLAS are (only) available as .so libraries so you don't need to
mess with creating the .so version.
#
Douglas Bates <bates@stat.wisc.edu> writes:
I tried it, yes. Somewhat to my surprise, it seemed to be not quite as
fast as the threaded ATLAS, but I wasn't very systematic about the
benchmarking.

(and the Goto items have license issues, which get in the way for
binary distributions.)
#
Peter Dalgaard <p.dalgaard@biostat.ku.dk> writes:
They have indicated that future releases may have an LGPL licence.
1 day later
#
PD> Douglas Bates <bates@stat.wisc.edu> writes:
    >> Have you tried configuring R with Goto's BLAS
    >> http://www.cs.utexas.edu/users/kgoto/
    >> 
    >> I haven't worked with Opteron or Athlon64 computers but I understand
    >> that Goto's BLAS are very effective on those machines.  Furthermore
    >> Goto's BLAS are (only) available as .so libraries so you don't need to
    >> mess with creating the .so version.

    PD> I tried it, yes. Somewhat to my surprise, it seemed to be not quite as
    PD> fast as the threaded ATLAS, but I wasn't very systematic about the
    PD> benchmarking.

    PD> (and the Goto items have license issues, which get in the way for
    PD> binary distributions.)

Thanks a lot, Peter, Brian, Doug, for your feedback!
In the meantime, I have three running versions of R(-devel) on
the 64-bit Opteron:
- "plain"
- linked against threaded GOTO
- linked against threaded (static) ATLAS  (using -fPIC for compilation;
  "large" Rlapack)
and I find that GOTO is consistently faster than ATLAS
(by roughly 5-20%) for several tests
(square matrices; %*% and solve).
ATLAS is still an order of magnitude faster than "plain" for
3000x3000 matrices.
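
A comparison of this kind could be scripted roughly as follows. This is a sketch, not the script used for the numbers above: the helper name write_blas_bench and the file name bench.R are invented, and it assumes an R installation that ships Rscript. It writes one small R script so that the same test can be run under each R build.

```shell
# write_blas_bench FILE: write an R script that times %*% and solve()
# on a random square matrix. Hypothetical helper name.
write_blas_bench() {
  cat > "$1" <<'EOF'
# Matrix size n comes from the command line (default 1000).
args <- commandArgs(trailingOnly = TRUE)
n <- if (length(args) >= 1) as.integer(args[1]) else 1000L
set.seed(1)
A <- matrix(rnorm(n * n), n, n)
cat("n =", n, "\n")
cat("%*%  :", system.time(A %*% A)[["elapsed"]], "s\n")
cat("solve:", system.time(solve(A))[["elapsed"]], "s\n")
EOF
}

# Usage (the build directories are placeholders):
#   write_blas_bench bench.R
#   <R_plain_build>/bin/Rscript bench.R 3000
#   <R_atlas_build>/bin/Rscript bench.R 3000
```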

Here are somewhat repeatable "ATLAS for R" build instructions:

 1. get ATLAS source; unpack
 2. make : use defaults and "express" installation
 3. Before "make install ...", edit the  Make.<ARCHITECTURE> file:
    add "-fPIC" in three places, namely  F77FLAGS, CCFLAG0, and MMFLAGS.
    In the case of the "threaded Opteron" architecture, this gives
    the three new lines
       F77FLAGS = -fPIC -fomit-frame-pointer -O -m64
       CCFLAG0 = -fPIC -fomit-frame-pointer -O -mfpmath=387 -m64
       MMFLAGS = -fPIC -fomit-frame-pointer -O -mfpmath=387 -m64
    in the file   Make.Linux_HAMMER64SSE2_2

 4. make install arch=Linux_HAMMER64SSE2_2

 5. Symlink the ATLAS libraries into /usr/local/lib:

    cd /usr/local/lib
    ln -s <ATLAS_build_dir>/lib/Linux_HAMMER64SSE2_2/lib* .

 6. (needed for runtime!):
    Use environment variable LD_LIBRARY_PATH=/usr/local/lib


Note that I haven't built *.so (shared) libraries yet.

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
#
Martin Maechler <maechler@stat.math.ethz.ch> writes:
Would you be willing to post a brief summary of comparative timings?

I have thought at times that it may be worthwhile collecting
comparative timings for different combinations of
                 processor/OS/memory size and speed/
on "typical" tasks in R.  As with any benchmark the results will be
artificial but they can be of some help when considering what hardware
to purchase.  Bioconductor users may find it particularly helpful to
be able to evaluate how much they will need to pay to be able to
analyze large data sets reasonably quickly.

One easily-obtained timing is at the end of
$RSRC/tests/Examples/base-Ex.Rout after 'make;make check'.
#
On 27 Feb 2004, Douglas Bates wrote:

That one is, I think, rather too artificial, as it contains few even
moderately large examples, and is dominated by a few atypical tasks.

I tend to use the sum of the MASS scripts as an informal timing: ch06.R is 
also a pretty good indicator.

I think you will find that BLAS differences are pretty small in real-life 
analyses, or at least I always have.