ATLAS threaded 64 bit Opteron build for R: need -fPIC
On 27 Feb 2004, Douglas Bates wrote:
Martin Maechler <maechler@stat.math.ethz.ch> writes:
"PD" == Peter Dalgaard <p.dalgaard@biostat.ku.dk>
on 26 Feb 2004 15:44:16 +0100 writes:
PD> Douglas Bates <bates@stat.wisc.edu> writes:
>> Have you tried configuring R with Goto's BLAS
>> http://www.cs.utexas.edu/users/kgoto/
>>
>> I haven't worked with Opteron or Athlon64 computers but I understand
>> that Goto's BLAS are very effective on those machines. Furthermore
>> Goto's BLAS are (only) available as .so libraries so you don't need to
>> mess with creating the .so version.
PD> I tried it, yes. Somewhat to my surprise, it seemed to be not quite as
PD> fast as the threaded ATLAS, but I wasn't very systematic about the
PD> benchmarking.
PD> (and the Goto items have license issues, which get in the way for
PD> binary distributions.)
Thanks a lot, Peter, Brian, Doug, for your feedback!
In the meantime, I have three running versions of R(-devel) on
the 64-bit Opteron
- "plain"
- linked against threaded GOTO
- linked against threaded (static) ATLAS (using -fPIC for compilation;
"large" Rlapack)
and I find that GOTO is faster than ATLAS
consistently (between ~ 5-20%) for several tests
(square matrices; %*% and solve).
ATLAS is still an order of magnitude faster than "plain" for
3000x3000 matrices.
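
For anyone wanting to repeat this kind of comparison, a minimal sketch of a timing script follows. It is not the script used for the numbers above; the helper name time_blas and the choice of a random Gaussian matrix are assumptions made for illustration. Running the same file under each R build (plain, GOTO, ATLAS) gives comparable elapsed times.

```r
# Hypothetical helper (not from the thread): time %*% and solve()
# on a random n x n matrix. n = 3000 is the size mentioned above;
# a smaller n gives a quick sanity check.
time_blas <- function(n = 3000, seed = 1) {
  set.seed(seed)                      # same matrix in every R build
  A <- matrix(rnorm(n * n), n, n)     # dense square test matrix
  c(matmul = unname(system.time(A %*% A)["elapsed"]),
    solve  = unname(system.time(solve(A))["elapsed"]))
}

timings <- time_blas(500)             # elapsed seconds, wall clock
print(timings)
```

With a threaded BLAS, the "elapsed" entry is the one to compare, since user CPU time adds up over all threads.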
Would you be willing to post a brief summary of comparative timings?
I have thought at times that it may be worthwhile collecting
comparative timings for different combinations of
processor/OS/memory size and speed/
on "typical" tasks in R. As with any benchmark the results will be
artificial but they can be of some help when considering what hardware
to purchase. Bioconductor users may find it particularly helpful to
be able to evaluate how much they will need to pay to be able to
analyze large data sets reasonably quickly.
One easily-obtained timing is at the end of
$RSRC/tests/Examples/base-Ex.Rout after 'make;make check'.
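
A small sketch of how that timing could be read back from within R is given here. It assumes that RSRC is set as an environment variable pointing at the tree where 'make check' was run; the helper name show_check_timing is hypothetical.

```r
# Hypothetical helper: return the last few lines of base-Ex.Rout,
# where the timing appears. Assumes RSRC is an environment variable
# naming the tree in which 'make; make check' was run.
show_check_timing <- function(rsrc = Sys.getenv("RSRC"), n = 3) {
  f <- file.path(rsrc, "tests", "Examples", "base-Ex.Rout")
  if (!file.exists(f)) return(character(0))   # checks not run yet
  tail(readLines(f), n)
}

print(show_check_timing())
```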
That one is I think rather too artificial, as it contains few even
moderately large examples, and is dominated by a few atypical tasks.
I tend to use the sum of the MASS scripts as an informal timing:
ch06.R is also a pretty good indicator. I think you will find that
BLAS differences are pretty small in real-life analyses, or at least
I always have.
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595