Hello R-dev, Yesterday, while I was testing the newly implemented function pmean in package kit, I noticed a mismatch in the output of the below R expressions. set.seed(123) n=1e3L idx=5 x=rnorm(n) y=rnorm(n) z=rnorm(n) a=(x[idx]+y[idx]+z[idx])/3 b=mean(c(x[idx],y[idx],z[idx])) a==b # [1] FALSE For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many others the difference is small but still. Is that expected or is it a bug? Thank you Best Regards Morgan Jacob
Precision of function mean,bug?
4 messages · Morgan Morgan, Peter Dalgaard, brodie gaslam +1 more
Expected, see FAQ 7.31. You just can't trust == on FP operations. Notice also
a2=(z[idx]+x[idx]+y[idx])/3 a2==a
[1] FALSE
a2==b
[1] TRUE -pd
On 20 May 2020, at 12:40 , Morgan Morgan <morgan.emailbox at gmail.com> wrote: Hello R-dev, Yesterday, while I was testing the newly implemented function pmean in package kit, I noticed a mismatch in the output of the below R expressions. set.seed(123) n=1e3L idx=5 x=rnorm(n) y=rnorm(n) z=rnorm(n) a=(x[idx]+y[idx]+z[idx])/3 b=mean(c(x[idx],y[idx],z[idx])) a==b # [1] FALSE For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many others the difference is small but still. Is that expected or is it a bug? Thank you Best Regards Morgan Jacob [[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
On Wednesday, May 20, 2020, 7:00:09 AM EDT, peter dalgaard <pdalgd at gmail.com> wrote: Expected, see FAQ 7.31. You just can't trust == on FP operations. Notice also
Additionally, since you're implementing a "mean" function you are testing against R's mean, you might want to consider that R uses a two-pass calculation[1] to reduce floating point precision error. Best, Brodie. [1] https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/summary.c#L482
a2=(z[idx]+x[idx]+y[idx])/3 a2==a
[1] FALSE
a2==b
[1] TRUE -pd
On 20 May 2020, at 12:40 , Morgan Morgan <morgan.emailbox at gmail.com> wrote: Hello R-dev, Yesterday, while I was testing the newly implemented function pmean in package kit, I noticed a mismatch in the output of the below R expressions. set.seed(123) n=1e3L idx=5 x=rnorm(n) y=rnorm(n) z=rnorm(n) a=(x[idx]+y[idx]+z[idx])/3 b=mean(c(x[idx],y[idx],z[idx])) a==b # [1] FALSE For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many others the difference is small but still. Is that expected or is it a bug?
On Wed, May 20, 2020 at 11:10 AM brodie gaslam via R-devel
<r-devel at r-project.org> wrote:
> On Wednesday, May 20, 2020, 7:00:09 AM EDT, peter dalgaard <pdalgd at gmail.com> wrote: Expected, see FAQ 7.31. You just can't trust == on FP operations. Notice also
Additionally, since you're implementing a "mean" function you are testing against R's mean, you might want to consider that R uses a two-pass calculation[1] to reduce floating point precision error.
This one is important. FWIW, matrixStats::mean2() provides argument refine=TRUE/FALSE to calculate mean with and without this two-pass calculation;
a <- c(x[idx],y[idx],z[idx]) / 3 b <- mean(c(x[idx],y[idx],z[idx])) b == a
[1] FALSE
b - a
[1] 2.220446e-16
c <- matrixStats::mean2(c(x[idx],y[idx],z[idx])) ## default to refine=TRUE b == c
[1] TRUE
b - c
[1] 0
d <- matrixStats::mean2(c(x[idx],y[idx],z[idx]), refine=FALSE) a == d
[1] TRUE
a - d
[1] 0
c == d
[1] FALSE
c - d
[1] 2.220446e-16 Not surprisingly, the two-pass higher-precision version (refine=TRUE) takes roughly twice as long as the one-pass quick version (refine=FALSE). /Henrik
a2=(z[idx]+x[idx]+y[idx])/3 a2==a
[1] FALSE
a2==b
[1] TRUE -pd
On 20 May 2020, at 12:40 , Morgan Morgan <morgan.emailbox at gmail.com> wrote: Hello R-dev, Yesterday, while I was testing the newly implemented function pmean in package kit, I noticed a mismatch in the output of the below R expressions. set.seed(123) n=1e3L idx=5 x=rnorm(n) y=rnorm(n) z=rnorm(n) a=(x[idx]+y[idx]+z[idx])/3 b=mean(c(x[idx],y[idx],z[idx])) a==b # [1] FALSE For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many others the difference is small but still. Is that expected or is it a bug?
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel