All,
Consider the code below
options(digits=2)
x <- 1:1000
quantile(x, .975)
The value returned is 975 (the 97.5th percentile), but the name has been shortened to "98%" due to the digits option. Is this intended? I would have expected the name to also be "97.5%" here. Alternatively, the returned value might be 980 in order to match the name of "98%".
Best,
Ed
quantile() names
8 messages · Gabriel Becker, Avi Gross, Ed Merkle +2 more
Hi Edgar,
I certainly don't think quantile(x, .975) should return 980, as that is a completely wrong answer.
I do agree that the name is a bit off-putting. I'm not sure how deep in the machinery you'd have to go to get digits to have no effect on the names (I don't have time to dig in right this second).
On the other hand, though, if we're going to make the names not respect digits at all, what do we do when someone does quantile(x, 1/3)? That'd be a bad time had by all without digits coming to the rescue, I think.
Best,
~G
On Mon, Dec 14, 2020 at 11:55 AM Merkle, Edgar C. <merklee at missouri.edu> wrote:
> All,
> Consider the code below
> options(digits=2)
> x <- 1:1000
> quantile(x, .975)
> The value returned is 975 (the 97.5th percentile), but the name has been
> shortened to "98%" due to the digits option. Is this intended? I would have
> expected the name to also be "97.5%" here. Alternatively, the returned
> value might be 980 in order to match the name of "98%".
> Best,
> Ed
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
The "value" is *not* 975.
It's 975.025.
The results that you're observing are merely a byproduct of formatting.
Maybe you should try:
quantile (x, .975, type=4)
Which, perhaps, using default options, produces the result you're expecting?
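To check that only the printed form is affected, the value can be formatted with an explicit number of digits. A minimal sketch in base R; the names q7 and q4 are chosen here for illustration only:

```r
x <- 1:1000

# Drop the name so that only the numeric value is inspected.
q7 <- quantile(x, 0.975, names = FALSE)            # default type = 7
q4 <- quantile(x, 0.975, type = 4, names = FALSE)  # the type suggested above

# format() with an explicit 'digits' does not consult options(digits = ...).
format(q7, digits = 7)
format(q4, digits = 7)
```

For this x, the default type = 7 interpolates to 975.025, while type = 4 lands on 975.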
Question: is the part that Ed Merkle is asking about the change in the
expected NAME associated with the output?
He changed a sort of global parameter affecting how many digits he wants any
compliant function to display. So when he asked for a named vector, the
chosen name was based on his request and limited when possible to two
digits.
x <- 1:1000
temp <- quantile(x, .975)
If you examine temp, you will see it is a vector containing (as it happens)
a single numeric item (a double) that prints as 975 under digits=2. But
the name associated with it is a character string with a "%" appended, as shown
below:
str(temp)
Named num 975
- attr(*, "names")= chr "98%"
If you do not want a name attached to the vector, add an argument:
quantile(x, .975, names=FALSE)
If you want the name to be longer or different, you can do that after.
names(temp)
[1] "98%"
So change it yourself:
temp
98%
975
names(temp) <- paste(round(temp, 3), "%", sep="")
temp
975.025%
975
The above is for illustration with tabs inserted to show what is in the
output. You probably do not need a name for your purposes and if you ask for
multiple quantiles you might need to adjust the above.
Of course if you wanted another non-default "type" of calculation, what Abby
offered may also apply.
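For several quantiles at once, the names can be built from the probabilities rather than from the values. A minimal sketch, assuming the caller is happy with up to 7 significant digits in the labels; the variable names probs and qs are illustrative:

```r
x <- 1:1000
probs <- c(0.025, 0.5, 0.975)

# Compute the quantiles without the automatically generated names ...
qs <- quantile(x, probs, names = FALSE)

# ... and label them by probability. formatC() is given an explicit 'digits',
# so the labels do not depend on options(digits = ...).
names(qs) <- paste0(formatC(100 * probs, format = "fg", width = 1, digits = 7), "%")

qs
```

Unlike the one-line rename above, this labels each element with its percentage, not with the quantile value itself.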
Avi,
On Mon, 2020-12-14 at 18:00 -0500, Avi Gross wrote:
> Question: is the part that Ed Merkle is asking about the change in the
> expected NAME associated with the output?
You are right: the question is about the name changing to "98%", when the returned object is the 97.5th percentile.
It is indeed easy to set names=FALSE here. But there can still be a problem when the user sets options(digits=2), then a package calls quantile(x, .975) and expects an object that has a name of "97.5%".
I think the easiest solution is to tell the user not to set options(digits=2), but it also seems like the "98%" name is not the best result. But Gabriel is correct that we would still need to consider how to handle something like quantile(x, 1/3). Maybe it is not a big enough issue to warrant changing anything.
Ed
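One way package code could guard against a user's global setting is to fix the option locally for the duration of the call. A minimal sketch, assuming a hypothetical helper named upper_975 (not taken from any package):

```r
# Hypothetical helper for package code: returns the 97.5th percentile with a
# name that does not depend on the caller's options(digits = ...).
upper_975 <- function(x) {
  old <- options(digits = 7)   # temporarily use the default setting
  on.exit(options(old))        # restore the caller's setting on exit
  quantile(x, 0.975)
}

options(digits = 2)            # the user's global setting from this thread
q <- upper_975(1:1000)
names(q)
options(digits = 7)            # reset for the rest of the session
```

The simpler alternative is to call quantile(x, 0.975, names = FALSE) and never rely on the name at all.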
Gabriel Becker
on Mon, 14 Dec 2020 13:23:00 -0800 writes:
> Hi Edgar, I certainly don't think quantile(x, .975) should
> return 980, as that is a completely wrong answer.
> I do agree that it seems like the name is a bit
> offputting. I'm not sure how deep in the machinery you'd
> have to go to get digits to no effect on the names (I
> don't have time to dig in right this second).
> On the other hand, though, if we're going to make the
> names not respect digits entirely, what do we do when
> someone does quantile(x, 1/3)? That'd be a bad time had by
> all without digits coming to the rescue, i think.
> Best, ~G
and now we read more replies on this topic without anyone looking at
the pure R source code which is pretty simple and easy.
Instead, people do experiments and take time to muse about their findings..
Honestly, I'm disappointed: I've always thought that if you
*write* on R-devel, you should be able to figure out a few
things yourself before that..
It's not rocket science to see/know that you need to quickly look at
the quantile.default() method function and then to note
that it's format_perc(.) which is used to create the names.
Almost surely, I've been a bit involved in creating parts of
this and probably am responsible for the current default
behavior.
....
....(sounds of digging) ...
....
....
....
....
....
....
--> Yes:
------------------------------------------------------------------------
r837 | maechler | 1998-03-05 12:20:37 +0100 (Thu, 05 Mar 1998) | 2 lines
Changed paths:
M /trunk/src/library/base/R/quantile
M /trunk/src/library/base/man/quantile.Rd
fixed names(.) construction
------------------------------------------------------------------------
With this diff (my 'svn-diffB -c837 quantile') :
Index: quantile
===================================================================
21c21,23
< names(qs) <- paste(round(100 * probs), "%", sep = "")
---
> names(qs) <- paste(formatC(100 * probs, format= "fg", wid=1, dig= max(2,.Options$digits)), "%", sep = "")
-----------------------------------------------------------------
so this was before this was modularized into the format_perc()
utility and quite a while before R 1.0.0 ....
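The effect of that diff can be reproduced in isolation. A minimal sketch; here 'digits' is passed as an explicit argument for illustration (an assumption, since the original line read .Options$digits), and the names old_names and new_names are hypothetical:

```r
probs <- c(0.5, 0.975)

# Construction before r837: round to whole percentages.
old_names <- paste(round(100 * probs), "%", sep = "")

# Construction after r837, with 'digits' made an explicit argument here
# (illustrative assumption; the original consulted .Options$digits).
new_names <- function(probs, digits) {
  paste(formatC(100 * probs, format = "fg", width = 1, digits = max(2, digits)),
        "%", sep = "")
}

old_names
new_names(probs, digits = 7)   # the default setting of the digits option
new_names(probs, digits = 2)   # the setting used in the original post
```

With 7 digits the name for 0.975 keeps its decimal; with 2 digits it collapses to the same "98%" that the pre-1998 rounding produced.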
Now, 22.8 years later, I do think that indeed it was not
necessarily the best idea to make the names() construction depend on the
'digits' option entirely and just protect it by using at least 2 digits.
What I think is better is to
1) provide an optional argument 'digits = 7'
back compatible w/ default getOption("digits")
2) when used, check that it is at least '1'
But then some scripts / examples of some people *will* change
..., e.g., because they preferred to have a global setting of digits=5
so I'm guessing it may make more people unhappy than other
people happy if we change this now, after close to 23 years .. ??
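The two-point proposal above could look roughly like this from the caller's side. A sketch only, written as a hypothetical wrapper named quantile_d around the existing function; it does not reproduce any actual change to quantile():

```r
# Hypothetical wrapper illustrating the proposal: an optional 'digits'
# argument with default 7, checked to be at least 1.
quantile_d <- function(x, probs = seq(0, 1, 0.25), digits = 7, ...) {
  stopifnot(is.numeric(digits), length(digits) == 1L, digits >= 1)
  qs <- quantile(x, probs, names = FALSE, ...)
  names(qs) <- paste0(formatC(100 * probs, format = "fg", width = 1,
                              digits = digits), "%")
  qs
}

x <- 1:1000
quantile_d(x, 1/3)               # name limited to 7 significant digits
quantile_d(x, 1/3, digits = 3)   # shorter name on request
quantile_d(x, 0.975)             # name independent of options(digits = ...)
```

Because the default is a fixed 7 and no longer read from the global option, scripts that set options(digits = 5) to shorten names would see different names, which is the compatibility concern raised above.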
Martin
--
Martin Maechler
ETH Zurich and R Core team
CITED TEXT CONTAINS EXCERPTS ONLY
> and now we read more replies on this topic without anyone looking at the pure R source code which is pretty simple and easy. Instead, people do experiments and take time to muse about their findings.. Honestly, I'm disappointed: I've always thought that if you *write* on R-devel, you should be able to figure out a few things yourself before that..
That's a bit unfair.
Some of us have written packages, containing functions for computing
quantile names:
probhat::ntile.names (,100)
> 1) provide an optional argument 'digits = 7'
> back compatible w/ default getOption("digits")
I'm not sure I've got this right. Are you suggesting that, by default, names should have 7 digits?
> so I'm guessing it may make more people unhappy than other people happy if we change this now, after close to 23 years .. ??
I would probably be in the less enthusiastic group.
I take the view that quantile naming is mainly a convenience, for summary-style output. And on that basis, I would say the current behaviour is about right. Anyone looking for high precision should probably compute their own quantile names.
Also, expanding on an earlier point: the value was 975.025, so a label of "97.5%" could still cause problems. Increasing the precision doesn't necessarily fix this sort of problem; rather, it increases the complexity of the output beyond what "97.5%" of users would ever want...
B.
Sorry, I need to change my last post. I looked at this a bit more, and realized that increasing the (max) number of (name) digits is only relevant in some cases. For people computing quartiles and deciles, this shouldn't make any difference. Therefore, it should still be convenient for the purposes of summary-style output.