Summary shows wrong maximum

10 messages · Uwe Ligges, Gavin Simpson, Sebastian Spaeth +4 more

#
Hi all,
I have a list with a numerical column "cum_hardreuses". By chance I
discovered this:
[1] 1793
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       2       4      36      14    1790

(Note the max value of 1790.) Ouch, this is bad! Is there anything I can do
to remedy this? Is it a known bug?

This is version 1.16 (3198) of R for Mac OS X.

Regards,
Sebastian Spaeth
#
Sebastian Spaeth wrote:
No, it's a feature! See ?summary: printing is done up to 3 significant 
digits by default. If you want it more precise, for example use:

summary(libs[,"cum_hardreuses"], digits=10)

Uwe Ligges
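
The effect can be checked without summary() at all. The sketch below uses signif() on the one value known from this thread, 1793. It assumes the rounding behaves like signif(), which matches the output quoted above but may not hold for every R version.

```r
# Round the maximum from the original post to a given number of
# significant digits. Only the value 1793 comes from the thread.
x_max <- 1793

rounded3  <- signif(x_max, 3)    # 1790: the last digit is replaced by a zero
rounded4  <- signif(x_max, 4)    # 1793: four digits are enough here
rounded10 <- signif(x_max, 10)   # 1793: extra digits change nothing

print(c(rounded3, rounded4, rounded10))
```

The underlying data are untouched either way; max() on the original column still returns 1793.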
#
On Mon, 2006-12-04 at 12:04 +0100, Sebastian Spaeth wrote:
Did you read ?summary, which has:

     ## Default S3 method:
     summary(object, ..., digits = max(3, getOption("digits")-3))

so this is a rounding issue of the *printed* representation of the
summary. Just change digits to be a larger number:
[1] 2.434443
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-2.21100 -0.65450  0.03793  0.06919  0.84650  2.43400
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
-2.21106232 -0.65451716  0.03793040  0.06919486  0.84652269  2.43444263
[1] 2434442
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-2211000  -654500    37930    69190   846500  2434000
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
-2211063.00  -654517.50    37930.00    69194.38   846522.00  2434442.00

HTH

G
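
A small sketch of where the default comes from, assuming options(digits = 7), which is R's factory setting. The two maxima are the values quoted in the output above; signif() stands in for whatever rounding summary() applies, so treat this as an illustration rather than the actual implementation.

```r
# Evaluate the default from ?summary under R's factory setting digits = 7.
old <- options(digits = 7)
default_digits <- max(3, getOption("digits") - 3)   # 4

# Apply that many significant digits to the two maxima shown above.
max_small <- signif(2.434443, default_digits)   # 2.434
max_large <- signif(2434442, default_digits)    # 2434000

# Ten significant digits keep the large value intact.
max_large_full <- signif(2434442, 10)           # 2434442

options(old)   # restore the caller's settings
print(c(default_digits, max_small, max_large, max_large_full))
```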
#
Uwe Ligges wrote:
Thanks for the info. Good to know. I didn't think that it would round
digits before the decimal point, though.

Grateful,
Sebastian
1 day later
#
On Mon, 4 Dec 2006, Uwe Ligges wrote:
Unfortunately, '1790' is printed with *four* significant digits, not 
three. The correct representation with three significant digits would have 
to employ scientific notation, 1.79e3.
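
For comparison, a short sketch of how base R can print 1793 in the scientific form mentioned here. format() and formatC() are standard functions; the choice of three significant digits is taken from the discussion.

```r
# Three significant digits in scientific notation.
sci_format  <- format(1793, digits = 3, scientific = TRUE)   # "1.79e+03"
sci_formatC <- formatC(1793, digits = 2, format = "e")       # "1.79e+03"

# The fixed-notation form that started the thread.
fixed <- format(signif(1793, 3))                             # "1790"

print(c(sci_format, sci_formatC, fixed))
```

Note that formatC() counts digits after the decimal point in "e" format, so digits = 2 yields three significant digits.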
#
'Unfortunately' you give no credentials for your ex cathedra 
pronouncement.  E.g.

http://en.wikipedia.org/wiki/Significant_digits

says

The situation regarding trailing zero digits that fall to the left of the 
decimal place in a number with no digits provided that fall to the right 
of the decimal place is less clear, but these are typically not considered 
significant unless the decimal point is placed at the end of the number to 
indicate otherwise (e.g., "2000." versus "2000"). To make things more 
clear, trailing zeros are only recognized as significant figures if the 
number they are a part of has a decimal point. For example, 450 only has 
two sig figs, but 450. has three.

which directly contradicts you.  So this is at best a matter of opinion, 
and credentials do matter for opinions.
On Tue, 5 Dec 2006, Oliver Czoske wrote:
#
Folks:

Is 

"So this is at best a matter of opinion, 
and credentials do matter for opinions."

-- Brian Ripley

an R fortunes candidate?

-- Bert Gunter
On Tue, 5 Dec 2006, Oliver Czoske wrote:
#
I don't know about candidacy, and I'm not going to argue about
"correctness," but it seems to me that the only valid reasons to
limit precision of printing in a statistics program are (1) to
save space and (2) to allow for machine limitations. This is
neither. To chop off information and replace it with zeroes is
just plain nasty.
Bert Gunter <gunter.berton at gene.com> wrote:
#
Mike:

I offered no opinion -- and really didn't have any -- about the worthiness
of any of the comments that were made. I just liked Brian's little quotable
aside.

But since you bait me a bit ...

In general, I believe that showing the 2-3 most "important" -- **not
significant** -- digits **and no more** is desirable. By "most important" I
mean the leftmost digits which are changing in the data (there are some
caveats in the presence of extreme outliers). Printing more digits merely
obfuscates the ability of the eye/brain to perceive the patterns of change
in the data, the presumed intent of displaying it (not of storing it, of
course). Displaying excessive digits to demonstrate (usually falsely) one's
precision is evil. Clarity of communications is the standard we should
aspire to.

These views have been more eloquently expressed by A.S.C. Ehrenberg and
Howard Wainer among others...

-- Bert


Bert Gunter
Nonclinical Statistics
7-7374
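
One possible reading of "the leftmost digits which are changing in the data" is sketched below. The helper digits_for_variation() and its keep argument are hypothetical, invented for this illustration; they are not part of R and not something proposed in the thread. The data are made up.

```r
# Hypothetical helper: how many significant digits are needed so that
# `keep` digits, counted from the leading digit of the data's spread,
# survive rounding with signif().
digits_for_variation <- function(x, keep = 3) {
  spread <- diff(range(x))
  if (spread == 0) return(keep)          # nothing varies; fall back to keep
  lead <- floor(log10(max(abs(x))))      # position of the leading digit
  vary <- floor(log10(spread))           # position of the first varying digit
  (lead - vary) + keep
}

# Made-up data that differ only in the units digit.
x <- c(19001, 19002, 19003, 19004, 19006)

needed <- digits_for_variation(x, keep = 1)   # 5
print(signif(range(x), needed))               # minimum and maximum stay distinct
```

With keep = 1 the helper asks for five significant digits here, just enough to tell the minimum from the maximum; it ignores the outlier caveats mentioned above.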

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Mike Prager
Sent: Wednesday, December 06, 2006 11:46 AM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] Summary shows wrong maximum

I don't know about candidacy, and I'm not going to argue about
"correctness," but it seems to me that the only valid reasons to
limit precision of printing in a statistics program are (1) to
save space and (2) to allow for machine limitations. This is
neither. To chop off information and replace it with zeroes is
just plain nasty.
Bert Gunter <gunter.berton at gene.com> wrote:
#
Bert--

Well, in an attempt to be pithy, I think I lost my message.

The comment was directed not at you specifically, but at the
idea that, given four print positions, one would ever want to
print zeroes instead of data without an explicit warning.

I quite agree with your comments on precision.  However, if more
than those two or three digits are *printed*, I think they
should be as accurate as possible, or accompanied in each place
by a written disclaimer.

Let's say that the mean of the data is not zero, but that the
precision is well within the range of floating point.  Then,
information is being thrown away for no clear reason.  What
makes it "nasty" in my opinion is that the information *appears*
to be there.  (Maybe this is a problem in semiotics.)  So while
I don't think "1.01e3" is more correct than "1010", it does not
appear to be conveying information that has been stripped from
the result.

Is the following really how we want R to work?
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  19000   19000   19000   19000   19000   19010

Respectfully,
--Mike
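
Output of this shape is easy to produce. The sketch below uses made-up data and rounds the six summary statistics with signif() to four significant digits, assuming that is how the values shown were obtained; it does not call summary() itself, whose printing may differ between R versions.

```r
# Made-up data: five values that differ only in the last digit.
x <- c(19001, 19002, 19003, 19004, 19006)

six_numbers <- c(
  "Min."    = min(x),
  "1st Qu." = unname(quantile(x, 0.25)),
  "Median"  = median(x),
  "Mean"    = mean(x),
  "3rd Qu." = unname(quantile(x, 0.75)),
  "Max."    = max(x)
)

# Four significant digits wipe out every difference except in the maximum.
rounded <- signif(six_numbers, 4)
print(rounded)
```

All six unrounded statistics lie between 19001 and 19006, yet after rounding five of them read 19000 and the maximum reads 19010.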
Bert Gunter <gunter.berton at gene.com> wrote: