dear r-devel,

This has probably been forever like this but is this satisfying?

    all.equal(c(1,NA,NA), c(1,NA,3))
    #> [1] "'is.NA' value mismatch: 1 in current 2 in target"

is.NA() doesn't exist (is.na() does), and is.na() is never 1 or 2.

In this example it's obvious that we're counting missing values; in a general situation I believe it isn't (we might understand it as the position of the first NA, for instance).

I would expect something like "amount of missing values mismatch: 1 in current 2 in target"

Thanks,

Antoine
confusing all.equal output
6 messages · Antoine Fabri, avi.e.gross at gmail.com, Peter Dalgaard +1 more
1 day later
Yes... Also, of course, the sentence after the colon does not describe the cause of the mismatch, e.g.

    all.equal(c(1,NA,NA), c(NA,NA,3))
    [1] "'is.NA' value mismatch: 2 in current 2 in target"

could be confusing. Perhaps "is.na() mismatch (2 positions)", with the count calculated as sum(is.na(current) != is.na(target)) instead? Or you could give both off-diagonal elements of the confusion matrix: "target-only: 1, current-only: 1". But actually, the whole current/target terminology is somewhat unclear.

-pd
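Both counts suggested above can be tried at the console. This is only a sketch: `na_mismatch_summary()` is a hypothetical helper name, not a base R function, and the message wording is illustrative rather than a proposed patch.

```r
# Hypothetical helper (not part of base R): summarise where the NA
# patterns of two equal-length vectors differ.
na_mismatch_summary <- function(target, current) {
  stopifnot(length(target) == length(current))
  na_t <- is.na(target)
  na_c <- is.na(current)
  list(
    positions    = sum(na_t != na_c),  # positions where exactly one side is NA
    target_only  = sum(na_t & !na_c),  # NA in target but not in current
    current_only = sum(!na_t & na_c)   # NA in current but not in target
  )
}

s <- na_mismatch_summary(c(1, NA, NA), c(NA, NA, 3))
sprintf("is.na() mismatch (%d positions)", s$positions)
sprintf("target-only: %d, current-only: %d", s$target_only, s$current_only)
```

For this pair the two vectors each contain two NAs, so the existing message reports "2 in current 2 in target", while the position-wise count and the two off-diagonal counts show that the NAs sit in different places.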
On 1 Mar 2023, at 13:53, Antoine Fabri <antoine.fabri at gmail.com> wrote:
[...]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Good points. I don't mind the terminology since target and current are the names of the arguments. As the function is already designed to stop at the first failing check we might not need to enumerate or count the mismatches, instead we could have "`NA` found in `target` but not in `current` at position <FIRST_MISMATCH>"
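A minimal sketch of what such a message could look like, assuming equal-length vectors. `first_na_mismatch()` is a hypothetical name used only for illustration; it is not how all.equal() is implemented.

```r
# Hypothetical helper (not part of base R): report the first position
# where the NA patterns differ, or TRUE if they agree everywhere.
first_na_mismatch <- function(target, current) {
  stopifnot(length(target) == length(current))
  na_t <- is.na(target)
  na_c <- is.na(current)
  pos <- which(na_t != na_c)
  if (length(pos) == 0L) return(TRUE)
  i <- pos[1L]
  if (na_t[i]) {
    sprintf("`NA` found in `target` but not in `current` at position %d", i)
  } else {
    sprintf("`NA` found in `current` but not in `target` at position %d", i)
  }
}

first_na_mismatch(c(1, NA, NA), c(1, NA, 3))
first_na_mismatch(c(1, NA, NA), c(NA, NA, 3))
```

The first call reports position 3 (NA in `target` only); the second reports position 1 (NA in `current` only), since only the first mismatch is mentioned.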
I think if you step back, you can ask what the purpose of an error message is and who designs it.

Is the message for the developer or others on their team, or something an end-user knowing nothing about R will see?

This reminds me a bit of legal mumbo jumbo that turns off many who read it, as it keeps talking about the party of the first part or the plaintiff, as compared to somewhat straighter talk.

The scenario is that you are comparing two things. Their names are not things like "target" or "current", so even other programmers not involved in your code will pause and wonder.

One view is to use phrases like first and second arguments/lists/whatever. You might talk about the one on the left (but using LHS is a bit opaque) versus the one on the right.

But sometimes it can be too verbose. Sometimes the error message is being generated not where everything is clear.

So ideally you could say:

    WARNING Danger Will Robinson. Comparing two things for equality. Result finds mismatches. There were NA found on the (left or right) that were not matched on the other side. Number of such found: 2

If you had a Systems Engineer write detailed requirements that included something a bit better than the example, and the programmer was able to supply the data using the words and guidelines, it might fit some needs but maybe not satisfy other programmers. But there are human factors people whose job it is to help choose among alternatives, and although they may not choose well, letting a programmer come up with whatever they feel like is generally worse.

Yes, in their microcosm centered on a dozen lines of code, "current" and "target" may have meaning. But are they the intended user of the product?

-----Original Message-----
From: R-devel <r-devel-bounces at r-project.org> On Behalf Of Antoine Fabri
Sent: Thursday, March 2, 2023 12:23 PM
To: peter dalgaard <pdalgd at gmail.com>
Cc: R-devel <r-devel at r-project.org>
Subject: Re: [Rd] confusing all.equal output

[...]
I believe the wording goes back to Martin Maechler many moons ago (AFAICT towards the end of the last millennium.) We might leave it to him to change it? - Peter D.
On 2 Mar 2023, at 19:30, avi.e.gross at gmail.com wrote:
[...]
peter dalgaard on Thu, 2 Mar 2023 19:47:59 +0100 writes:
> I believe the wording goes back to Martin Maechler many
> moons ago (AFAICT towards the end of the last millennium.)
> We might leave it to him to change it?
> - Peter D.
Thank you, Peter.
Yes, this is *very* old. I could claim that R users seem to get
more and more confused over time, because nobody had ever
complained for a quarter of a century .. (;-) ;-)
I know I had been inspired by the all.equal() implementation of
S-PLUS version 3.x (x = 4, IIRC) at the time, but then I also think
that I have to take the "full blame" on this :
Trying to think like myself "yesterday, when I was young ..",
I guess the argumentation for using is.NA was what I
considered helpful to the non experienced S / R user at the time:
Everybody has seen 'NA' before (and they see it in their objects
in this case) but only somewhat more experienced useRs would
know about is.na(). .. and it may be that at the time I found it
"slick" to combine the "NA" and "is.na" into "is.NA" ...
About the other wording and how the mismatches should be counted, I
have no recollection.
But indeed, already in 1999, i.e., before R 1.0.0 existed,
that part of the code was
    out <- is.na(target)
    if(any(out != is.na(current)))
        return(paste("`is.NA' value mismatches:", sum(is.na(current)),
                     "in current,", sum(out), " in target"))
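To see what those two numbers are, the snippet can be wrapped in a function and run on the examples from this thread. This is only a sketch: `old_na_check()` is a hypothetical wrapper name, and the trailing `TRUE` for the no-mismatch case is an assumption added so the function returns something on its own.

```r
# The 1999 check wrapped in a hypothetical function so it can be run
# standalone; the message text is copied from the snippet above.
old_na_check <- function(target, current) {
  out <- is.na(target)
  if (any(out != is.na(current)))
    return(paste("`is.NA' value mismatches:", sum(is.na(current)),
                 "in current,", sum(out), " in target"))
  TRUE  # assumption: signal "no NA mismatch" when the patterns agree
}

old_na_check(c(1, NA, NA), c(1, NA, 3))   # counts: 1 NA in current, 2 in target
old_na_check(c(1, NA, NA), c(NA, NA, 3))  # counts: 2 NAs on both sides
```

The numbers are the totals of NAs on each side, `sum(is.na(current))` and `sum(is.na(target))`, not the positions or the number of positions that differ, which is why the second call reports equal counts even though the check failed.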
- - -
Ok, now I need to work to commit a (completely orthogonal) change to
all.equal.numeric() which had been lying around with me for
about a year at least... so I can start looking at your proposed
changes ...
Martin