Matrix issues when building R with znver3 architecture under GCC 11

Hi Kieran,
Hello,

I'm new to this list and subscribed because I've come across an issue
with building R from source for an AMD Zen architecture under GCC 11.
Please don't attack me for my Linux operating system choice: it is
Ubuntu 20.04 with Linux kernel 5.10.102.1-microsoft-standard-WSL2,
running under Windows 11 with WSLg. I built GCC 11 using GCC 8 (the
standard GCC under the Ubuntu 20.04 WSL release). WSL2/WSLg runs as a
hypervisor with ports to all system resources including display and GPU
(CUDA, etc.).

The reason why I am posting this email is that I am trying to compile R
using the AMD Zen3 platform architecture rather than x86/64, because it has
processor-specific optimizations that improve performance over the standard
x86/64 in benchmarks. The Zen3 architecture optimizations are not available
in earlier versions of GCC (actually, they have possibly been backported to
GCC10 now). Since Ubuntu 20.04 doesn't have GCC11, I compiled the GCC11
compiler using the native GCC8.

The GCC11 I have built can build R 4.1.3 with a standard x86-64
architecture and pass all tests with "make check-all".
I configured that with:
~/R/R-4.1.3/configure CC=gcc-11.2 CXX=g++-11.2 FC=gfortran-11.2 \
  CXXFLAGS="-O3 -march=x86-64" CFLAGS="-O3 -march=x86-64" \
  FFLAGS="-O3 -march=x86-64" --enable-memory-profiling --enable-R-shlib
and built with
make -j 32 -O
make check-all
## PASS.

So I can build R in my environment with GCC11.
In configure, I am using references to "gcc-11.2" "gfortran-11.2" and
"g++-11.2" because I compiled GCC11 compilers with these suffixes.

Now, I'm using a 32-thread (16-core) AMD Zen3 CPU (a 5950X), and want to
use it to its full potential. Zen3 optimizations are available as the
-march=znver3 option in GCC 11. The znver3 optimizations improve
performance in Phoronix Test Suite benchmarks (I'm not aware of anyone
who has compiled R with them). See:
https://www.phoronix.com/scan.php?page=article&item=amd-5950x-gcc11

However, the R 4.1.3 build (made with "make -j 32 -O"), configured with
-march=znver3, produces an R that fails "make check-all".

~/R/R-4.1.3/configure CC=gcc-11.2 CXX=g++-11.2 FC=gfortran-11.2 \
  CXXFLAGS="-O2 -march=znver3" CFLAGS="-O2 -march=znver3" \
  FFLAGS="-O2 -march=znver3" --enable-memory-profiling --enable-R-shlib
or
~/R/R-4.1.3/configure CC=gcc-11.2 CXX=g++-11.2 FC=gfortran-11.2 \
  CXXFLAGS="-O3 -march=znver3" CFLAGS="-O3 -march=znver3" \
  FFLAGS="-O3 -march=znver3" --enable-memory-profiling --enable-R-shlib

The failure is always in the factorizing.R tests of Matrix, where there
are a number of errors and one fatal error.
I have attached the output because I cannot really understand what is going
wrong, but the results returned from matrix calculations are obviously odd
with -march=znver3 in GCC 11. There is another backwards-compatible
architecture option, "znver2", and this gives EXACTLY the same result.

While there are other warnings and errors (many in assert.EQ()), the
factorizing.R script continues past them. The fatal error (at line 2662
in the attached factorizing.Rout.fail text file) is:

## problematic rank deficient rankMatrix() case -- only seen in large cases ??
Z. <- readRDS(system.file("external", "Z_NA_rnk.rds", package="Matrix"))
tools::assertWarning(rnkZ. <- rankMatrix(Z., method = "qr")) # gave errors
Error in assertCondition(expr, classes, .exprString = d.expr) :
   Failed to get warning in evaluating rnkZ. <- rankMatrix(Z., method  ...
Calls: <Anonymous> -> assertCondition
Execution halted

Can anybody shed light on what might be going on here? 'make check-all'
passes all the other checks. It is just factorizing.R in Matrix that fails
(other matrix tests run ok).
Sorry this is a bit long-winded, but I thought details might be important.

R gets used and tested most with the default optimizations, without use 
of model-specific instructions and with -O2 (GCC). It happens from time 
to time that people try other optimization options and run into 
problems. In principle, there are these cases (all seen before):

(1) the test in R package (or R) is wrong - it (unintentionally) expects 
behavior which has been observed in builds with default optimizations, 
but is not necessarily the only correct one; in case of numerical 
tolerances set empirically, they could simply be too tight

(2) the algorithm in R package or R has a bug - the result is really 
wrong and it is because the algorithm is (unintentionally) not portable 
enough, it (unintentionally) only works with default optimizations or 
lower; in case of numerical results, this can be because it expects more 
precision from the floating point computations than mandated by IEEE, or 
assumes behavior not mandated

(3) the optimization by design violates some properties the algorithm 
knowingly depends on; with numerical computations, this can be a sort of 
"fast" (and similarly named) mode which violates the IEEE floating 
point standard by design, with the aim of better performance; because 
the algorithm depends on IEEE semantics, and with poor luck, the results 
end up completely wrong

(4) there is a bug in the C or Fortran compiler (GCC, as we use GCC) that 
only shows up with the unusual optimizations; the compiler produces 
wrong code

So, when you run into a problem like this and want to get it fixed, 
the first thing is to identify which of the above cases it is and, for 
(1) and (2), whether the problem is in base R or in a package (and which 
concrete package). Different people maintain these things, and you would 
ideally narrow down the problem to a very small, isolated, reproducible 
example to support your claim about where the bug is. If you do this 
right, the problem can often get fixed very fast.

Such an example for (1) could be a few lines of standalone R code using 
Matrix that produce correct results which the test nevertheless rejects, 
with pointers to the actual check in the tests that is wrong and an 
explanation of why it is wrong.

For (2)-(4) it would be a minimal standalone C/Fortran example including 
only the critical function or part of the algorithm that is not correct, 
not portable or not compiled correctly, with results obtained with 
optimizations where it works and where it doesn't. The exception is an 
obvious, easy-to-explain bug in R (2), for which the example would not 
have to be standalone. With such a standalone C example, you could easily 
test the results with different optimizations and compilers; it is easier 
to analyze, and it makes it easier to produce a bug report for GCC. What 
makes it harder in this case is that it needs special hardware, but you 
could still try with the example and worry about that later (one option 
is running in an emulator, and again a standalone example really helps 
here). In principle, as it needs special hardware, the chances that 
someone else would do this work are smaller. Indeed, if it turns out to 
be (3), it is unlikely to get resolved, but at least it would be isolated 
(you would know what not to run).

As a user, if you run into a problem like this, you may not want to get 
it fixed but just to work around it somehow. First, be aware that this 
may be dangerous: one could get incorrect results from computations, 
though that may be acceptable in applications where results are verified 
externally. You could try disabling individual optimizations until the 
tests pass. You could try later versions of gcc-11 (even unreleased) or 
gcc-12. Still, a lot of this is easier with a small example, too. You 
could ignore the failing test. And it may not be worth it - it may be 
that you could get your speedups in a different, but more reliable way.

Using WSL2 on its own should not necessarily be a problem, and the way 
you built gcc should be ok from the description, but at some point it 
would be worth checking under Linux running natively - because even 
if these are numerical differences, they could in principle be caused by 
running on Windows (or in WSL2); at least in the past such differences 
were seen (related to (2) above). I would recommend checking on Linux 
natively once you have at least a standalone R example.

Best
Tomas
best regards,
Kieran
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
