digits in summary.default - R-devel

Thu, Sep 14, 2006 2:14 AM

Dear all,

the default number of significant digits in summary.default is

digits = max(3, getOption("digits") -  3)

on my platform this evaluates to 4. The point is that if you have,
say, integer data of magnitude greater than 10^3, the command summary
will produce heavily rounded results.
A simple example follows:

> x <- c(123456,234567,345678)
> x
[1] 123456 234567 345678

> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 123500  179000  234600  234600  290100  345700

# quite different from

> quantile(x)
      0%      25%      50%      75%     100%
123456.0 179011.5 234567.0 290122.5 345678.0
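
The rounding in the example above can be reproduced with signif(), which makes the effect visible regardless of how a given R version prints summaries. This is only a sketch: the value 4 is the default computed above under the assumption that getOption("digits") is 7, and the names full and rounded are illustrative.

```r
# Reproduce the rounding with signif(), independent of the R version in use.
x <- c(123456, 234567, 345678)

# The default quoted above: max(3, getOption("digits") - 3), assuming digits = 7.
digits <- max(3, 7 - 3)

# Five-number summary plus the mean, in the order summary() reports them.
qq <- quantile(x, names = FALSE)
full <- c(qq[1:3], mean(x), qq[4:5])
names(full) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")

# Four significant digits gives the heavily rounded values from the post.
rounded <- signif(full, digits)
print(rounded)

# Seven significant digits keeps every value in this example intact.
print(signif(full, 7))
```

Since digits is an argument of summary.default, passing a larger value explicitly, for example summary(x, digits = 7), is the obvious workaround when the default is too coarse.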

Is it possible to adapt the number of significant digits to the
magnitude of the data?
The first thing that comes to mind is
digits = nchar(trunc(max(x)))

If it is not possible then I think it would be nice to mention the
issue in the documentation.

Thanks for the attention,

Simone

> R.version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          3.1
year           2006
month          06
day            01
svn rev        38247
language       R
version.string Version 2.3.1 (2006-06-01)

[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

______________________________________________________

Simone Giannerini
Dipartimento di Scienze Statistiche "Paolo Fortunati"
Universita' di Bologna
Via delle belle arti 41 - 40126  Bologna,  ITALY
Tel: +39 051 2098262  Fax: +39 051 232153

Martin Maechler

Fri, Sep 15, 2006 12:52 AM

    Simone> Dear all, the number of significant digits in
    Simone> summary default is

    Simone> digits = max(3, getOption("digits") - 3)

    Simone> on my platform this results to be 4. The point is
    Simone> that if you have, say, integer data of magnitude
    Simone> greater than 10^3 the command summary will produce
    Simone> heavily rounded results.

    Simone>   A simple example follow:

    >> x <- c(123456,234567,345678)

    >> x
    Simone> [1] 123456 234567 345678

    >> summary(x)
    Simone>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    Simone>  123500  179000  234600  234600  290100  345700

    Simone> # quite different from

    >> quantile(x)
    Simone>       0%      25%      50%      75%     100%
    Simone> 123456.0 179011.5 234567.0 290122.5 345678.0

            
Yes, this is a very, very old topic, and it has come up frequently on
the R lists.
The reason for this default has been compatibility with S
and in particular Splus-3.4 (1996) which used to be a partial
role model for R in its infancy.

However, I now see that Insightful also must have decided that
the old S setting was not satisfactory and that one can and
should do better.

    Simone> Is it possible to adapt the number of significant
    Simone> digits to the magnitude of the data?  The first
    Simone> thing that comes into my mind is 

    Simone>      digits = nchar(trunc(max(x))) #

that's a first step of one thing to consider, yes,
but does need quite a bit of fixup before it's usable.
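
To illustrate the kind of fixup meant here, the sketch below shows a few inputs where nchar(trunc(max(x))) gives an unhelpful answer, followed by one possible repair. The helpers naive_digits and adaptive_digits are hypothetical names made up for this illustration; they are neither the S-plus implementation mentioned below nor anything that exists in R.

```r
# The suggestion from the thread, wrapped in a (hypothetical) helper.
naive_digits <- function(x) nchar(trunc(max(x)))

x <- c(123456, 234567, 345678)
naive_digits(x)                  # 6: fine for data like this

# Cases where it miscounts:
naive_digits(c(-7, -5))          # 2: the minus sign is counted as a digit
naive_digits(c(0.001, 0.002))    # 1: fewer digits than the old default
naive_digits(c(-5000000, 2))     # 1: the large negative value is ignored

# One possible fixup (an assumption, not an official algorithm): count the
# integer digits of the largest finite absolute value, and never go below
# the existing default.
adaptive_digits <- function(x, min_digits = max(3, getOption("digits") - 3)) {
  ax <- abs(x[is.finite(x)])     # drop NA, NaN and infinite values
  ax <- ax[ax >= 1]              # values below 1 have no integer digits
  if (length(ax) == 0) return(min_digits)
  int_digits <- floor(log10(max(ax))) + 1
  max(min_digits, int_digits)
}

adaptive_digits(x, min_digits = 4)                  # 6
adaptive_digits(c(0.001, 0.002), min_digits = 4)    # 4
adaptive_digits(c(-5000000, 2), min_digits = 4)     # 7
```

Even this version only protects the integer part; whether fractional digits should also be kept is a separate design decision.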

Since I've now seen the code of summary.default in S-plus 6.2,
I'm not in a good position to propose a code change here ---
unless Insightful ``donates'' their 3 lines of implementation to
R  {which I think would be quite fair given the flurry of
    things they've recently ported into S-plus 8.x}

    Simone> If it is not possible then I think it would be nice
    Simone> to mention the issue in the documentation.

The issue is mentioned, but perhaps too tersely.

I agree that I'd also want to change this behavior.
It's definitely too late for R 2.4.0, since, although this may
seem like a small change,
it can have quite a large effect on the output of many R scripts.

    Simone> Thanks for the attention,

    Simone> Simone

    >> R.version
    ..............
    (does not really matter - here for once)

Martin Maechler, ETH Zurich

Karl Ove Hufthammer

Fri, Sep 15, 2006 2:21 AM

Martin Maechler wrote:

It's also possible to be a bit smarter in specific cases. See for example
the LaTeX table functions for regression summaries in the Dmisc package[1],
which use the magnitude of the standard errors to determine the number of
digits shown for estimates (so that the number of digits varies for each
row/estimate).

[1] Not on CRAN. See http://www.menne-biomed.de/download/download.html
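
The idea can be sketched as follows. This is not the Dmisc code, which I have not reproduced here; round_by_se and its se_digits argument are hypothetical, and the rule used (show each estimate to the decimal place of the second significant digit of its standard error) is just one reasonable assumption about what "using the magnitude of the standard errors" might mean.

```r
# Hypothetical sketch (not the Dmisc implementation): format each estimate
# with as many decimal places as are needed to show `se_digits` significant
# digits of its own standard error.
round_by_se <- function(est, se, se_digits = 2) {
  # floor(log10(se)) is the position of the leading digit of each se.
  places <- pmax(0, se_digits - 1 - floor(log10(se)))
  mapply(function(e, p) formatC(e, format = "f", digits = p), est, places)
}

est <- c(12.34567, 0.0123456, 1234.567)
se  <- c(0.5,      0.0021,    37)

# Each row gets its own precision: "12.35", "0.0123", "1235".
print(round_by_se(est, se))
```

A precise estimate keeps its decimals, while one with a large standard error is shown without digits that the data cannot support.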

Karl Ove Hufthammer
E-mail and Jabber: karl at huftis.org