
daxpy performance with veclib

3 messages · Simon Urbanek, Michael Spiegel

#
Hi Mac Special Interest Group folks,

We've noticed some curious behavior of the vecLib BLAS implementation
during development of the OpenMx library.  The daxpy implementation
appears to be twice as slow in vecLib as in the reference
implementation.  Attached is a test kernel that has been run under
both implementations.  The kernel consists of repeated calls to daxpy
with vectors of varying size.  In the output files, the first column
is the dimension of the vector.  The second through fourth columns
report the runtime of the kernel in seconds, one column for each of
three identical trials per vector size.
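
For readers without access to the attachment, here is a rough sketch of
a timing kernel of the kind described above.  It is not the attached
omxTest.c.  The names daxpy_fn, loop_daxpy, time_daxpy and print_row are
hypothetical, and loop_daxpy is only a stand-in so the sketch compiles
without a BLAS library; a real test would pass a wrapper around the
library's daxpy instead.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical function-pointer type for any daxpy-like routine. */
typedef void (*daxpy_fn)(int n, double alpha, const double *x, int incx,
                         double *y, int incy);

/* Stand-in routine (assumes non-negative increments): y := alpha*x + y. */
void loop_daxpy(int n, double alpha, const double *x, int incx,
                double *y, int incy)
{
    for (int i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}

/* Time `reps` calls of f on vectors of length n.
 * Returns CPU seconds, or -1.0 if n <= 0 or allocation fails. */
double time_daxpy(daxpy_fn f, int n, int reps)
{
    if (n <= 0)
        return -1.0;
    double *x = malloc((size_t)n * sizeof *x);
    double *y = malloc((size_t)n * sizeof *y);
    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0;
    }
    for (int i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = 0.0;
    }
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        f(n, 1e-9, x, 1, y, 1);
    clock_t end = clock();
    volatile double sink = y[n - 1];  /* keep the result observable */
    (void)sink;
    free(x);
    free(y);
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Print one row: the dimension, then three identical trials in seconds. */
void print_row(daxpy_fn f, int n, int reps)
{
    printf("%d", n);
    for (int trial = 0; trial < 3; trial++)
        printf("\t%f", time_daxpy(f, n, reps));
    printf("\n");
}
```

Calling print_row(loop_daxpy, n, reps) for a range of n would produce
rows in the layout described above.  Note that clock() reports processor
time for the whole process, so a multithreaded BLAS would need a
wall-clock timer for a fair comparison.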

It may be more appropriate to send this information upstream to the
vecLib maintainers.  However, I thought it would be of interest to Mac R
folks, too.  For our own project, our workaround will be to write our
own basic implementation of daxpy and continue to link against the
vecLib BLAS library so we still get the speedup on dgemm and the other
functions.
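
As an illustration of the workaround, a basic daxpy can be as small as
the sketch below.  This is not the OpenMx code.  The name simple_daxpy
is hypothetical, and the handling of negative increments follows the
convention of the reference BLAS (start from the far end of the vector),
which is an assumption about what a drop-in replacement would need.

```c
#include <stddef.h>

/* Hypothetical drop-in: computes y := alpha*x + y over n elements,
 * reading x with stride incx and updating y with stride incy. */
void simple_daxpy(int n, double alpha, const double *x, int incx,
                  double *y, int incy)
{
    if (n <= 0 || alpha == 0.0)
        return;  /* nothing to do */

    if (incx == 1 && incy == 1) {
        /* Common case: contiguous vectors. */
        for (int i = 0; i < n; i++)
            y[i] += alpha * x[i];
        return;
    }

    /* General strides. A negative increment walks the vector backwards,
     * starting from offset (1 - n) * inc, as in the reference BLAS. */
    ptrdiff_t ix = (incx < 0) ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = (incy < 0) ? (ptrdiff_t)(1 - n) * incy : 0;
    for (int i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}
```

Because the routine has its own name, the rest of the program can keep
calling dgemm and the other functions from the linked BLAS library
unchanged.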

The benchmarks were executed on a Mac Pro with 2 Quad-Core Xeons @ 3
GHz (MacPro2,1) running OS X 10.5.8.  It was tested with R 2.12.0 and
the same behavior has been observed with R 2.10.1.

Thanks,
--Michael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: omxTest.c
Type: text/x-csrc
Size: 1022 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mac/attachments/20101020/fdde1502/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: daxpy.veclib.results
Type: application/octet-stream
Size: 677 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mac/attachments/20101020/fdde1502/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: daxpy.refblas.results
Type: application/octet-stream
Size: 659 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mac/attachments/20101020/fdde1502/attachment-0001.obj>
#
On Oct 20, 2010, at 2:18 PM, Michael Spiegel wrote:

            
Yes, that is a known problem (see R-SIG-Mac archives). vecLib fails to use multiple cores on Nehalem-based Mac Pros. If you use ATLAS directly it will work just fine - that's what we recommend on Mac Pros.

Cheers,
Simon
#
Ah, thanks for the heads up.  It turns out that the problem is
somewhat worse than you described.  Not only is vecLib single-threaded
for some of the BLAS functions (I've noticed that dgemm is
multithreaded), but the single-threaded vecLib implementation is
slower than the single-threaded reference implementation.  I'll follow
your suggestion and give ATLAS a try.  I had poor luck with GotoBLAS:
our performance test suite was slower using that implementation.

On Wed, Oct 20, 2010 at 4:29 PM, Simon Urbanek
<simon.urbanek at r-project.org> wrote: