result of mean(v1, v2, v3) of three real number not the same as sum(v1, v2, v3)/3

14 messages · Ivan Krylov, David Stevens, Eric Berger +7 more

#
I have a very strange problem. I am getting different results from 
mean(mlagFZ1,mlagFZ2,mlagFZ3)
vs. 
 sum(mlagFZ1,mlagFZ2,mlagFZ3)/3
[1] -0.3326792
[1] -0.201942

R code:
print(mlagFZ1)
print(mlagFZ2)
print(mlagFZ3)
 sum(mlagFZ1,mlagFZ2,mlagFZ3)/3
mean(mlagFZ1,mlagFZ2,mlagFZ3)

output:
[1] -0.3326792
[1] -0.1890601
[1] -0.0840866
[1] -0.201942
[1] -0.3326792

Can someone tell me what I did wrong?
#
On Thu, 12 May 2022 19:31:51 +0000
"Sorkin, John" <jsorkin at som.umaryland.edu> wrote:

            
match.call(mean.default, quote(mean(mlagFZ1, mlagFZ2, mlagFZ3)))
# mean(x = mlagFZ1, trim = mlagFZ2, na.rm = mlagFZ3)

mean() takes a vector to compute a mean of and additional arguments,
unlike sum(), which takes ... (almost arbitrary arguments) and sums all
of them. Unfortunately, the "trim" argument is documented to accept any
number and the na.rm argument is silently reinterpreted as logical in
your case.
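
To make the matching concrete, here is a small sketch using the three values printed in the original post. The variable names are John's; nothing else is assumed. Spelling out the argument names that match.call() reports gives exactly the same result as the positional call, which is why only the first number comes back.

```r
mlagFZ1 <- -0.3326792
mlagFZ2 <- -0.1890601
mlagFZ3 <- -0.0840866

# Positional call: the second and third values land in 'trim' and 'na.rm'.
positional <- mean(mlagFZ1, mlagFZ2, mlagFZ3)

# The same call with the matched argument names spelled out.
named <- mean(x = mlagFZ1, trim = mlagFZ2, na.rm = mlagFZ3)

# Both are the mean of the single value held in 'x'.
print(positional)
print(identical(positional, named))
```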
#
John - for mean, the numbers to be operated on need to be 
in a vector (sum accepts them either way). This can be done with the 
'c(...)' function, which takes the arguments (your three numbers) and 
concatenates them into a vector.

 > mean(c(mlagFZ1,mlagFZ2,mlagFZ3))
[1] -0.201942

 > sum(c(mlagFZ1,mlagFZ2,mlagFZ3))/3
[1] -0.201942

What you have done is provide the mean function with three 
arguments. If you look at the documentation for 'mean', you have mlagFZ1 
as the 'x' argument, mlagFZ2 as the 'trim' argument, and mlagFZ3 as the 
'na.rm' argument, which makes no sense. The 'sum' function is different: 
it adds up everything passed to it, so it gives the intended total either way.

mean {base}    R Documentation
Arithmetic Mean
Description
Generic function for the (trimmed) arithmetic mean.

Usage
mean(x, ...)

## Default S3 method:
mean(x, trim = 0, na.rm = FALSE, ...)
Arguments
x
An R object. Currently there are methods for numeric/logical vectors and 
date, date-time and time interval objects. Complex vectors are allowed 
for trim = 0, only.

trim
the fraction (0 to 0.5) of observations to be trimmed from each end of x 
before the mean is computed. Values of trim outside that range are taken 
as the nearest endpoint.

na.rm
a logical value indicating whether NA values should be stripped before 
the computation proceeds.
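
As a rough illustration of the 'trim' wording quoted above, the sketch below uses a made-up vector (an assumption for the example, not data from this thread) and shows an in-range value, a value above 0.5, and a negative value.

```r
x <- c(1, 2, 3, 4, 100)

# trim = 0.25 drops floor(5 * 0.25) = 1 observation from each end.
trimmed <- mean(x, trim = 0.25)

# trim above 0.5 is treated as 0.5, which yields the median.
clamped_high <- mean(x, trim = 0.9)

# trim below 0 is treated as 0, i.e. the ordinary mean.
clamped_low <- mean(x, trim = -1)

print(c(trimmed = trimmed, clamped_high = clamped_high, clamped_low = clamped_low))
```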
On 5/12/2022 1:31 PM, Sorkin, John wrote:
#
Or ... to put it differently, you need to wrap the numbers in c( .., .., )
On Thu, May 12, 2022 at 10:42 PM Ivan Krylov <krylov.r00t at gmail.com> wrote:

            

  
  
#
Ivan,
Thank you for your quick reply. Unfortunately, I don't understand the concept you are trying to explain. Can you try again?
Thank you,
John
#
Hi John,

sum() takes an arbitrary "..." list as the initial argument, while mean() takes a vector "x" as the initial named argument.

Thus:
[1] -0.201942
[1] -0.201942
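
The calls that produced these two lines are not shown in the archived message. One pair of calls that gives this output with the values from the original post (an assumption, not necessarily what was run) is:

```r
mlagFZ1 <- -0.3326792
mlagFZ2 <- -0.1890601
mlagFZ3 <- -0.0840866

# sum() adds up everything passed through "..."
via_sum <- sum(mlagFZ1, mlagFZ2, mlagFZ3) / 3

# mean() needs the values combined into the single vector "x"
via_mean <- mean(c(mlagFZ1, mlagFZ2, mlagFZ3))

print(via_sum)
print(via_mean)
```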


Regards,

Marc Schwartz
On May 12, 2022 at 3:31:51 PM, Sorkin, John (jsorkin at som.umaryland.edu (mailto:jsorkin at som.umaryland.edu)) wrote:

            
#
you wrote
 mean(mlagFZ1,mlagFZ2,mlagFZ3)

you intended to write
 mean(c(mlagFZ1,mlagFZ2,mlagFZ3))
#
Eric Berger and Marc Schwartz and David K Stevens probably said it
better. I was trying to illustrate the way mean() takes its arguments
using the match.call function.

The sum() function can take individual numbers or vectors and
sum all their elements, so sum(c(1, 2, 3)) is the same as sum(1, 2, 3),
or even sum(c(1, 2), 3): they all do what you mean them to do.

The mean() function is different. It may accept many arguments, but
only the first of them is the vector of numbers you're interested in:
mean(c(1, 2, 3)) is the correct way to call it. Unfortunately, when you
give it more arguments and they aren't what mean() expects them to be
(the second one should be a number in [0; 0.5] and the third one should
be TRUE or FALSE, see help(mean) if you're curious), R doesn't warn you
or raise an error condition.

My use of match.call() was supposed to show that by calling mean(a, b,
c), I pass the number "b" as the "trim" argument to mean() and the
number "c" as the "na.rm" argument to mean(), which is not what was
intended here.
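
A short sketch of both points, using placeholder names a, b and c that are never evaluated: the three forms of sum() agree, and match.call() reports which formal argument each positional value is matched to.

```r
# All three ways of calling sum() give the same total.
totals <- c(sum(c(1, 2, 3)), sum(1, 2, 3), sum(c(1, 2), 3))
print(totals)

# match.call() only inspects the call; a, b and c need not exist.
matched <- match.call(mean.default, quote(mean(a, b, c)))
print(matched)

# The argument names that the positional values were matched to.
print(names(matched)[-1])
```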
#
There's actually another reason why mean(x) and sum(x)/length(x) may
differ, e.g.

x <- c(rnorm(1e6, sd=.Machine$double.eps), rnorm(1e6, sd=1))
mean(x) - sum(x)/length(x)
#> [1] 1.011781e-18

The mean() function calculates the sample mean using a two pass scan
through the data.  The first scan calculates the total sum and divides
by the number of (non-missing) values. In the second scan, this
average is refined by adding the residuals towards the first average.
This way numerical precision of mean(x) is higher than
sum(x)/length(x) when the spread of 'x' is large.  It also means
that the processing time of mean(x) is roughly twice that of
sum(x)/length(x).
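
The two scans described above can be sketched in plain R. The helper name two_pass_mean is made up for this illustration, and the code only mirrors the description; it is not R's internal implementation, so tiny differences from mean() in the last digits are possible.

```r
# Hypothetical helper, written only to illustrate the two scans.
two_pass_mean <- function(x) {
  n <- length(x)
  first <- sum(x) / n           # first scan: plain average
  first + sum(x - first) / n    # second scan: add the mean residual
}

set.seed(1)
x <- c(rnorm(1e6, sd = .Machine$double.eps), rnorm(1e6, sd = 1))

# Differences are expected to be at or near rounding error.
print(mean(x) - sum(x) / length(x))
print(mean(x) - two_pass_mean(x))
```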

/Henrik
On Thu, May 12, 2022 at 1:22 PM Ivan Krylov <krylov.r00t at gmail.com> wrote:
#
I thank Richard Heiberger, Marc Schwartz, Eric Berger, Ivan Krylov, and David Stevens for answering my question regarding the different results obtained from mean(v1,v2,v3) and sum(v1,v2,v3)/3.

I believe the explanations point out a possibly dangerous aspect of the sum vs. mean functions. Mean may be used improperly!
The help file for 
sum says:
. . . numeric or complex or logical vectors

The help for mean says:
x, An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects.

While the help file for mean explains that mean expects its parameter to be a vector, it is reasonable for an R user to assume that mean works on a series of scalars, i.e. v1,v2,v3, rather than requiring a vector, i.e. c(v1,v2,v3). 

Should the help file for mean be modified so that it checks to see if its parameter is a vector? If the parameter is not a vector, an error message might be generated. Without this, some unsuspecting user (who does not check and think about results) might blindly accept the output of mean and use it to build a bridge or skyscraper, which fails and results in the death of innocent victims. 

John
#
R is at heart used for processing vectors of data. I find the flexibility of sum to be more disturbing than the constraints of mean.
On May 12, 2022 1:55:00 PM PDT, "Sorkin, John" <jsorkin at som.umaryland.edu> wrote:

  
    
#
I believe you are still misunderstanding. Inline comments below.
On Thu, May 12, 2022 at 2:02 PM Sorkin, John <jsorkin at som.umaryland.edu> wrote:
Well, **any** function "may be used improperly"! It's up to the user
to study and follow the documentation.
You need to pay closer attention: mean() is a (S3) generic function
whose first argument, x, is an **object** that determines dispatch.
This object is not necessarily a vector -- that is,
inherits(x,"vector") need **not** be TRUE (and indeed is FALSE for
various methods; see methods('mean')).  If you do not understand
this, you need to read up on R's S3 generic function mechanism.
**That is false.** It is only true for the 'x' parameter for the
**default** method.
??  Based on what I noted above, this query seems misbegotten.
Moreover, of course, a help file doesn't check anything -- the
**function** would have to check its arguments (I assume that's what
you meant).

 -- Bert
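
A minimal sketch of the dispatch being described. The class name reading_log and its method are made up for the example; the Date case uses the method that ships with base R.

```r
# List the mean() methods visible in this session (output varies by setup).
print(methods("mean"))

# Dates dispatch to mean.Date rather than mean.default.
d <- as.Date(c("2022-05-12", "2022-05-14"))
print(mean(d))

# A hypothetical class with its own method; 'x' here is a list, not a vector
# of numbers, and dispatch is decided by its class attribute.
mean.reading_log <- function(x, ...) mean(x$values, ...)
log1 <- structure(list(values = c(1, 2, 3)), class = "reading_log")
print(mean(log1))
```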
#
I think, in addition to earlier replies, that there are effects in R based on just about everything being a vector. The number 1 is a vector of length 1 and is not really different than a longer one like c(1,2,3) except for having a length of 1 versus 3.
So sum(1:3, 3:1, 1) adds to 13 as it is seen as sum(1, 2, 3, 3, 2, 1, 1) in a sense.

Your example did not trip up an error but this does: mean(1:3, 7:9)
Error in mean.default(1:3, 7:9) : 'trim' must be numeric of length one

There is the hint, because the optional trim= argument can only work with a single scalar. I gave it a vector of c(7,8,9) and it barfed. Other R functionality often just supplies a warning when it gets a vector when it expected a scalar, albeit they look the same otherwise. I mean if(... vector) may now start considering it an error.
How would you rewrite mean to catch odd cases? I can understand it assuming a second argument to be for trim and a third argument for na.rm, but it does not check for it to be "TRUE" and accepts non-zero numbers as a version of na.rm=TRUE.
But anything with 4 or more arguments ought to be visibly WRONG but is apparently blindly accepted, as mean(1,2,3,4,5,6,7,8,9) returns the wrong answer of 1 with no complaint.
I am a fan of not using keyword optional arguments without a keyword, even if it is legal.
If your function insisted on being called like:
mean(num, trim=num, na.rm=TRUE)
then it could easily catch errors like the one we are discussing.
I note median() is also similarly flawed now. It has only two documented arguments (x and na.rm) and still does not catch getting 3 or more arguments. sd() fails on sd(3,3) with an NA but only because sd(3) is equally NA. But for more arguments like sd(3, 4, 5), it fails because of unused arguments.
So there are some inconsistencies that catch some errors and not others, but a complete "fix" may not work as long as R allows vectors of length 1 in all contexts.
Making optional arguments required to be spelled out would break much existing code too.
For now, just using c() in solutions works well.
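
One way such a keyword-only check could look is sketched below. The name strict_mean is hypothetical and this is not a proposal for base R; it relies on the fact that arguments placed after "..." in a function definition can only be matched by their full names.

```r
# Hypothetical wrapper: 'x' is the only positional argument; anything else
# must be named, and stray positional values are rejected.
strict_mean <- function(x, ..., trim = 0, na.rm = FALSE) {
  if (...length() > 0L) {
    stop("unexpected extra arguments; did you mean mean(c(...))?")
  }
  stopifnot(
    is.numeric(trim), length(trim) == 1L,
    is.logical(na.rm), length(na.rm) == 1L, !is.na(na.rm)
  )
  mean(x, trim = trim, na.rm = na.rm)
}

print(strict_mean(c(1, 2, 3)))
print(strict_mean(c(1, NA, 3), na.rm = TRUE))

# A call shaped like the one in the original post now fails loudly.
result <- tryCatch(strict_mean(1, 2, 3), error = function(e) conditionMessage(e))
print(result)
```

The cost of this design is the one already mentioned: existing code that passes trim or na.rm by position would stop working.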

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Good point, Bert. On my machine, methods("mean") returns 8 variations including for Dates.
But this being R, if you LIKE having a version of mean() that does it your way, go for it!

mymean <- function(...) mean(c(...))

The function defined above takes any number of arguments, makes a vector of them and passes it to mean(), but note it will not handle arguments about trim or na.rm.

mymean(1,2,3,4)

[1] 2.5

There are subtle errors if you try: using na.rm=TRUE just adds a term that evaluates to 1, and using trim=8 adds a term of 8, so you cannot mix and match just because you feel like it. Yes, there are ways to evaluate the arguments and package some into a c() and pass along others, but my point is what others keep saying. Don't assume a function can be used the way you want it to, but the way the manual page says it can be used. Sometimes, though, you can make your own function for personal use that does what you want. You don't even have to piggyback like my example. Your function can take any number of arguments, evaluate each one to see if it seems numeric and has length 1 and is not NA, calculate how many it got that are valid, add them together in a loop and divide by the total.
But you may not want to name it mean() unless ...
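
A sketch of the pitfall and of one way to pass the options along. The name mymean2 is made up here; because trim and na.rm come after "..." in its definition, they must be given by full name and are kept out of c().

```r
# The one-line wrapper from the message above.
mymean <- function(...) mean(c(...))

# Pitfall: the named argument is swallowed by c() and counted as a value.
swallowed <- mymean(1, 2, 3, na.rm = TRUE)
print(swallowed)

# Hypothetical variant that hands trim and na.rm to mean() as options.
mymean2 <- function(..., trim = 0, na.rm = FALSE) {
  mean(c(...), trim = trim, na.rm = na.rm)
}

print(mymean2(1, 2, 3, 4))
print(mymean2(1, NA, 3, na.rm = TRUE))
```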