Summary shows wrong maximum
Bert--

Well, in an attempt to be pithy, I think I lost my message. The comment was directed not at you specifically, but at the idea that, given four print positions, one would ever want to print zeroes instead of data without an explicit warning.

I quite agree with your comments on precision. However, if more than those two or three digits are *printed*, I think they should be as accurate as possible, or accompanied in each place by a written disclaimer.

Let's say that the mean of the data is not zero, but that the precision is well within the range of floating point. Then, information is being thrown away for no clear reason. What makes it "nasty" in my opinion is that the information *appears* to be there. (Maybe this is a problem in semiotics.) So while I don't think "1.01e3" is more correct than "1010", it does not appear to be conveying information that has been stripped from the result.

Is the following really how we want R to work?
a <- c(19001., 19002., 19003., 19006.)
summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  19000   19000   19000   19000   19000   19010

Respectfully,
--Mike
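For readers following along, here is a minimal sketch of where the printed values above could come from. It assumes the display rounds each statistic to four significant digits, which reproduces the numbers shown; the variable names `exact` and `rounded` are illustrative and not part of R itself, and current R versions may format `summary()` output differently.

```r
a <- c(19001, 19002, 19003, 19006)

# Unrounded six-number summary, computed directly from the data.
q <- quantile(a, names = FALSE)          # min, Q1, median, Q3, max
exact <- c(q[1:3], mean(a), q[4:5])
names(exact) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")

# Assumption: the display keeps four significant digits per value.
rounded <- signif(exact, 4)

print(exact)    # full-precision values, e.g. Max. is 19006
print(rounded)  # Max. becomes 19010, everything else 19000
```

Calling `max(a)` or `quantile(a)` directly sidesteps the rounding entirely, which is one way to check a suspicious summary line.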
Bert Gunter <gunter.berton at gene.com> wrote:
Mike:

I offered no opinion -- and really didn't have any -- about the worthiness of any of the comments that were made. I just liked Brian's little quotable aside.

But since you bait me a bit ... In general, I believe that showing the 2-3 most "important" -- **not significant** -- digits **and no more** is desirable. By "most important" I mean the leftmost digits which are changing in the data (there are some caveats in the presence of extreme outliers). Printing more digits merely obscures the eye/brain's ability to perceive the patterns of change in the data, the presumed intent of displaying it (not of storing it, of course). Displaying excessive digits to demonstrate (usually falsely) one's precision is evil. Clarity of communication is the standard we should aspire to.

These views have been more eloquently expressed by A.S.C. Ehrenberg and Howard Wainer among others...

-- Bert

Bert Gunter
Nonclinical Statistics
7-7374

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Mike Prager
Sent: Wednesday, December 06, 2006 11:46 AM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] Summary shows wrong maximum

I don't know about candidacy, and I'm not going to argue about "correctness," but it seems to me that the only valid reasons to limit precision of printing in a statistics program are (1) to save space and (2) to allow for machine limitations. This is neither. To chop off information and replace it with zeroes is just plain nasty.

Bert Gunter <gunter.berton at gene.com> wrote:
Folks:

Is

"So this is at best a matter of opinion, and credentials do matter for opinions." -- Brian Ripley

an R fortunes candidate?

-- Bert Gunter

On Tue, 5 Dec 2006, Oliver Czoske wrote:
On Mon, 4 Dec 2006, Uwe Ligges wrote:
Sebastian Spaeth wrote:
Hi all, I have a list with a numerical column "cum_hardreuses". By coincidence I discovered this:
max(libs[,"cum_hardreuses"])
[1] 1793
summary(libs[,"cum_hardreuses"])
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 2 4 36 14 1790
(note the max value of 1790) Ouch this is bad! Anything I can do to remedy this? Known bug?
No, it's a feature! See ?summary: printing is done up to 3 significant digits by default.
Unfortunately, '1790' is printed with *four* significant digits, not three. The correct representation with three significant digits would have to employ scientific notation, 1.79e3.
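The distinction drawn above between rounding a value and displaying it can be checked with a short sketch. It uses only base R's `signif()`, `format()`, and `formatC()`; the variable names are illustrative, and this is not a description of how `summary()` is implemented internally.

```r
x <- 1793

# Rounding to three significant digits changes the stored number.
three_sig <- signif(x, 3)

# In fixed notation the rounded value shows a trailing zero that a
# reader cannot tell apart from a measured digit.
fixed <- format(three_sig)

# Scientific notation with two decimals shows exactly three digits.
sci <- formatC(x, digits = 2, format = "e")

print(three_sig)
print(fixed)
print(sci)
```

The fixed form carries a placeholder zero, while the scientific form makes the number of retained digits explicit, which is the point of the 1.79e3 suggestion.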
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.