Non identical numerical results from R code vs C/C++ code?
On Fri, Sep 10, 2010 at 11:46 AM, Renaud Gaujoux
<renaud at mancala.cbio.uct.ac.za> wrote:
Hi, suppose you have two versions of the same algorithm: one in pure R, the other one in C/C++ called via .Call(). Assuming there is no bug in the implementations (i.e. they both do the same thing), is there any well known reason why the C/C++ implementation could return numerical results non identical to the one obtained from the pure R code? (e.g. could it be rounding errors? please explain.) Has anybody had a similar experience? By not identical, I mean very small differences (< 2.4 e-14), but enough to have identical() returning FALSE. Maybe I should not bother, but I want to be sure where the differences come from, at least by mere curiosity. Briefly the R code perform multiple matrix product; the C code is an optimization of those specific products via custom for loops, where entries are not computed in the same order, etc... which improves both memory usage and speed. The result is theoretically the same.
I've had almost exactly this situation recently with an algorithm I first implemented in R and then in C. Guess what the problem was? Yes, a bug in the C code. At first I thought everything was okay because the returned values were close-ish, and I thought 'oh, rounding error', but I wasn't happy about that. So I dug a bit deeper. This is worth doing even if you are sure there no bugs in the C code (remember: there's always one more bug). First, construct a minimal dataset so you can demonstrate the problem with a manageable size of matrix. I went down to 7 data points. Then get the C to print out its inputs. Identical, and as expected? Good. Now debug intermediate calculations, printing or saving from C and checking they are the same as the intermediate calculations from R. If possible, try returning intermediate calculations in C and checking identical() with R intermediates. Eventually you will find out where the first diverge - and this is the important bit. It's no use just knowing the final answers come out different, it's likely your answer has a sensitive dependence on initial conditions. You need to track down when the first bits are different. Or it could be rounding error... Barry