Hi Mac Special Interest Group folks,

We've noticed some curious behavior of the vecLib BLAS implementation during development of the OpenMx library. The daxpy implementation in vecLib appears to be twice as slow as the reference implementation.

Attached is a test kernel that has been run under both implementations. The kernel consists of repeated calls to daxpy with vectors of varying size. In the output files, the first column is the dimension of the vector. The second through fourth columns report the runtime of the kernel in seconds, with three identical trials per vector size.

It may be more appropriate to send this information upstream to the vecLib maintainers. However, I thought it would be of interest to Mac R folks, too. For our own project, our workaround will be to write our own basic implementation of daxpy and continue to link against the vecLib BLAS library so we still get a speedup on dgemm and the other functions.

The benchmarks were executed on a Mac Pro with 2 Quad-Core Xeons @ 3 GHz (MacPro2,1) running OS X 10.5.8. They were run with R 2.12.0, and the same behavior has been observed with R 2.10.1.

Thanks,
--Michael

Attachments:
omxTest.c (text/x-csrc, 1022 bytes): <https://stat.ethz.ch/pipermail/r-sig-mac/attachments/20101020/fdde1502/attachment.bin>
daxpy.veclib.results (application/octet-stream, 677 bytes): <https://stat.ethz.ch/pipermail/r-sig-mac/attachments/20101020/fdde1502/attachment.obj>
daxpy.refblas.results (application/octet-stream, 659 bytes): <https://stat.ethz.ch/pipermail/r-sig-mac/attachments/20101020/fdde1502/attachment-0001.obj>
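For readers without access to the attachments, the kind of kernel and workaround described in the message can be sketched roughly as follows. This is not the attached omxTest.c and not the OpenMx code: the names `simple_daxpy`, `time_daxpy`, and `print_benchmark`, the scale factor, and the use of `clock()` are all assumptions made for illustration. The sketch times a hand-written daxpy; to compare BLAS libraries, the call inside the timing loop would be swapped for the library's own daxpy.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Plain-C daxpy: y := alpha * x + y, with BLAS-style strides.
   Hypothetical stand-in for a "basic implementation of daxpy". */
void simple_daxpy(int n, double alpha, const double *x, int incx,
                  double *y, int incy)
{
    if (n <= 0 || alpha == 0.0)
        return;                       /* nothing to do */

    /* With a negative stride, BLAS walks the vector from its far end. */
    ptrdiff_t ix = (incx < 0) ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = (incy < 0) ? (ptrdiff_t)(1 - n) * incy : 0;

    for (int i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}

/* Time `reps` daxpy calls on vectors of length n.
   Returns CPU seconds, or -1.0 if allocation fails.
   Note: clock() reports CPU time, which overstates elapsed time
   when the routine under test is multithreaded. */
double time_daxpy(int n, int reps)
{
    if (n <= 0)
        return 0.0;

    double *x = malloc((size_t)n * sizeof *x);
    double *y = malloc((size_t)n * sizeof *y);
    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0;
    }
    for (int i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = 0.0;
    }

    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        simple_daxpy(n, 1e-3, x, 1, y, 1);   /* assumed scale factor */
    clock_t stop = clock();

    /* Read the result so the compiler cannot discard the loop. */
    volatile double sink = y[n - 1];
    (void)sink;

    free(x);
    free(y);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* One row per vector size: dimension, then three trial times in seconds. */
void print_benchmark(FILE *out, const int *sizes, int nsizes, int reps)
{
    for (int s = 0; s < nsizes; s++) {
        fprintf(out, "%d", sizes[s]);
        for (int t = 0; t < 3; t++)
            fprintf(out, "\t%.6f", time_daxpy(sizes[s], reps));
        fputc('\n', out);
    }
}
```

Calling `print_benchmark(stdout, sizes, nsizes, reps)` from a `main` with an array of vector lengths produces rows in the layout described above: the dimension in the first column and three trial times in the next three.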
daxpy performance with veclib
3 messages · Simon Urbanek, Michael Spiegel
On Oct 20, 2010, at 2:18 PM, Michael Spiegel wrote:
Hi Mac Special Interest Group folks, We've noticed some curious behavior of the veclib BLAS implementation in the development of the OpenMx library. The daxpy implementation appears to be twice as slow in the veclib implementation as compared to the reference implementation.
Yes, that is a known problem (see R-SIG-Mac archives). vecLib fails to use multiple cores on Nehalem-based Mac Pros. If you use ATLAS directly it will work just fine - that's what we recommend on Mac Pros.

Cheers,
Simon
Ah, thanks for the heads up. It turns out that the problem is somewhat worse than you described. Not only is vecLib single-threaded for some of the BLAS functions (I've noticed that dgemm is multithreaded), but the vecLib single-threaded implementation is also worse than the single-threaded reference implementation.

I'll follow your suggestion and give ATLAS a try. I had poor luck with GotoBLAS: our performance test suite was slower using that implementation.

On Wed, Oct 20, 2010 at 4:29 PM, Simon Urbanek <simon.urbanek at r-project.org> wrote:
Yes, that is a known problem (see R-SIG-Mac archives). vecLib fails to use multiple cores on Nehalem-based Mac Pros. If you use ATLAS directly it will work just fine - that's what we recommend on Mac Pros. Cheers, Simon