Skip to content

Precision of function mean,bug?

4 messages · Morgan Morgan, Peter Dalgaard, brodie gaslam +1 more

#
Hello R-dev,

Yesterday, while I was testing the newly implemented function pmean in
package kit, I noticed a mismatch in the output of the below R expressions.

set.seed(123)
n=1e3L
idx=5
x=rnorm(n)
y=rnorm(n)
z=rnorm(n)
a=(x[idx]+y[idx]+z[idx])/3
b=mean(c(x[idx],y[idx],z[idx]))
a==b
# [1] FALSE

For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many
others the difference is small but still.
Is that expected or is it a bug?

Thank you
Best Regards
Morgan Jacob
#
Expected, see FAQ 7.31.

You just can't trust == on FP operations. Notice also
[1] FALSE
[1] TRUE

-pd

  
    
#
Additionally, since you're implementing a "mean" function you are testing
against R's mean, you might want to consider that R uses a two-pass
calculation[1] to reduce floating point precision error.

Best,

Brodie.

[1] https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/summary.c#L482
#
On Wed, May 20, 2020 at 11:10 AM brodie gaslam via R-devel
<r-devel at r-project.org> wrote:
This one is important.

FWIW, matrixStats::mean2() provides argument refine=TRUE/FALSE to
calculate mean with and without this two-pass calculation;
[1] FALSE
[1] 2.220446e-16
[1] TRUE
[1] 0
[1] TRUE
[1] 0
[1] FALSE
[1] 2.220446e-16

Not surprisingly, the two-pass higher-precision version (refine=TRUE)
takes roughly twice as long as the one-pass quick version
(refine=FALSE).

/Henrik