I have tried to get signif, round and format to display numbers like these consistently in a table, using e.g. signif(x,digits=3)
17.01
18.15
I want
17.0
18.2
Not
17
18.2
Why is the last digit stripped off in the case when it is zero! Is this a "feature" of R or did I miss something?
---------------------------------------------
Henrik Andersson
Netherlands Institute of Ecology - Centre for Estuarine and Marine Ecology
P.O. Box 140
4400 AC Yerseke
Phone: +31 113 577473
h.andersson at nioo.knaw.nl
http://www.nioo.knaw.nl/ppages/handersson
Formatting numbers with a limited amount of digits consistently
9 messages · Henrik Andersson, Duncan Murdoch, Gabor Grothendieck +1 more
Henrik Andersson wrote:
I have tried to get signif, round and format to display numbers like these consistently in a table, using e.g. signif(x,digits=3)
17.01
18.15
I want
17.0
18.2
Not
17
18.2
Why is the last digit stripped off in the case when it is zero!
signif() changes the value; you don't want that, you want to affect how a number is displayed. Use format() or formatC() instead, for example
> x <- c(17.01, 18.15)
> format(x, digits=3)
[1] "17.0" "18.1"
> noquote(format(x, digits=3))
[1] 17.0 18.1
Is this a "feature" of R or did I miss something?
I'd say both.
Duncan Murdoch
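To make the distinction concrete, here is a minimal sketch (an illustration added for readers, not code from the posts) contrasting signif(), which returns a number, with formatC() and sprintf(), which return strings with a fixed number of decimal places. Only base R functions are used; the variable names are arbitrary.

```r
x <- c(17.01, 18.15)

# signif() returns a number; a number has no notion of a trailing zero
rounded <- signif(x[1], digits = 3)
is.numeric(rounded)   # TRUE
rounded == 17         # TRUE: 17.0 and 17 are the same value

# formatC() and sprintf() return strings, so the zero can be kept
fixed_c <- formatC(x, format = "f", digits = 1)
fixed_s <- sprintf("%.1f", x)
print(noquote(fixed_c))
```

Note that format = "f" counts decimal places, not significant digits, so this only matches the 3-significant-digit request when all values have the same magnitude.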
On 5/30/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
signif() changes the value; you don't want that, you want to affect how a number is displayed. Use format() or formatC() instead, for example
> x <- c(17.01, 18.15)
> format(x, digits=3)
[1] "17.0" "18.1"
> noquote(format(x, digits=3))
[1] 17.0 18.1
That works in the above context but I don't think it works generally:
R> f <- head(faithful)
R> f
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
R> format(f, digits = 3)
eruptions waiting
1 3.60 79
2 1.80 54
3 3.33 74
4 2.28 62
5 4.53 85
6 2.88 55
R> # this works in this case
R> noquote(prettyNum(round(f,1), nsmall = 1))
eruptions waiting
[1,] 3.6 79.0
[2,] 1.8 54.0
[3,] 3.3 74.0
[4,] 2.3 62.0
[5,] 4.5 85.0
[6,] 2.9 55.0
Even that does not work in the desired way (which presumably
is not to use exponent format) if some of the numbers are
large enough, like 1e6: they are then displayed in
e notation rather than in ordinary notation.
R> f[1,1] <- 1e6 + 0.11
R> noquote(prettyNum(round(f,1), nsmall = 1))
eruptions waiting
[1,] 1.0e+06 79.0
[2,] 1.8e+00 54.0
[3,] 3.3e+00 74.0
[4,] 2.3e+00 62.0
[5,] 4.5e+00 85.0
[6,] 2.9e+00 55.0
I have struggled with this myself. I can usually come up with
something for a specific instance, but I find it a pain that a simple
thing like formatting a table exactly as I want takes so much effort.
Maybe someone else has figured this out.
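One possible way around the exponent notation, offered as a sketch rather than a general solution: format() accepts scientific = FALSE to suppress e notation and nsmall to request a minimum number of decimals. The vector big below is made up for illustration.

```r
# Made-up values mixing one large number with small ones
big <- c(1e6 + 0.11, 1.8, 3.333, 2.283)

# Round first, then ask format() for fixed notation and at least one decimal
out <- format(round(big, 1), nsmall = 1, scientific = FALSE)
print(noquote(out))

# Check the two properties we asked for
no_exponent <- !any(grepl("e", out, fixed = TRUE))
has_decimal <- all(grepl(".", out, fixed = TRUE))
```

Because nsmall is only a minimum, this does not by itself cap the number of decimals; the round() call is what limits them here.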
Gabor Grothendieck wrote:
Even that does not work in the desired way (which presumably
is not to use exponent format) if some of the numbers are
large enough, like 1e6: they are then displayed in
e notation rather than in ordinary notation.
formatC with format="f" seems to work for me, though it assumes you're
specifying decimal places rather than significant digits. It also wants
a vector of numbers as input, not a dataframe. So the following gives
pretty flexible control over what a table will look like:
> data.frame(eruptions = formatC(f$eruptions, digits=2, format='f'),
+ waiting = formatC(f$waiting, digits=1, format='f'))
eruptions waiting
1 1000000.11 79.0
2 1.80 54.0
3 3.33 74.0
4 2.28 62.0
5 4.53 85.0
6 2.88 55.0
I have struggled with this myself. I can usually come up with something for a specific instance, but I find it a pain that a simple thing like formatting a table exactly as I want takes so much effort. Maybe someone else has figured this out.
I think that formatting tables properly requires some thought, and R is no good at thinking. You can easily recognize a badly formatted table, but it's very hard to write down rules that work in general circumstances. It's also a matter of taste, so if I managed to write a function that matched my taste, you would find you wanted to make changes. It's sort of like expecting plot(x, y) to always come up with the best possible plot of y versus x. It's just not a reasonable expectation. It's better to provide tools (like abline() for plots or formatC() for tables) that allow you to tailor a plot or table to your particular needs.
Duncan Murdoch
On 5/30/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
formatC with format="f" seems to work for me, though it assumes you're specifying decimal places rather than significant digits. It also wants a vector of numbers as input, not a dataframe. So the following gives pretty flexible control over what a table will look like:
> data.frame(eruptions = formatC(f$eruptions, digits=2, format='f'),
+ waiting = formatC(f$waiting, digits=1, format='f'))
eruptions waiting
1 1000000.11 79.0
2 1.80 54.0
3 3.33 74.0
4 2.28 62.0
5 4.53 85.0
6 2.88 55.0
Thanks. That seems to be the idiom I was missing. One thing that would be nice would be if formatC could handle data frames.
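As a rough sketch of what "formatC for data frames" might look like, one could apply formatC() column by column. The helper name format_df and its digits argument are hypothetical, invented here for illustration; nothing like it is claimed to exist in base R.

```r
# Hypothetical helper (not part of base R): apply formatC() to each
# numeric column, with one decimal-places setting per column
format_df <- function(df, digits = 1) {
  digits <- rep_len(digits, ncol(df))
  out <- df
  for (j in seq_along(df)) {
    if (is.numeric(df[[j]])) {
      out[[j]] <- formatC(df[[j]], format = "f", digits = digits[j])
    }
  }
  out
}

f <- head(faithful)
res <- format_df(f, digits = c(2, 1))
print(res)
```

The result is a data frame of character columns, so it is suitable for display only; keep the original numeric data frame for computation.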
On Mon, 2005-05-30 at 23:53 -0400, Gabor Grothendieck wrote:
Thanks. That seems to be the idiom I was missing. One thing that would be nice would be if formatC could handle data frames.
Guys, perhaps I am missing something here, but there seems to be some
confusion as to how the numbers are stored internally, versus how the
output is displayed and the meaning of "significant digits", which is
what I believe Henrik's original query was about.
By default, R's printed output uses the settings from options("digits")
and options("scipen") to define output based upon the number of
significant digits, which is of course not the same as the number of
decimal places. Hence the variation in the output that Henrik gets and
why the trailing zero is dropped.
The use of signif() does not help here because it is still based upon
the number of significant digits, where the trailing zero still gets
dropped.
The approaches above are "inexact" when it comes to creating formatted
output for a table with a consistent number of decimal places to align
columns of numbers.
format() is still problematic here because it too uses the number of
significant digits, defaulting to options("digits").
Using formatC() or sprintf() in conjunction with cat() is usually the
best way to gain control over how numeric output is formatted,
especially in a nicely aligned table. This is what I use in
CrossTable(), where I want decimal-aligned columns for numbers in the
tabular output, along with fixed-width columns for textual output (i.e.
labels, etc.).
Briefly, along the lines of Gabor's example on the output using the
faithful dataset above, one could use something like:
f <- head(faithful)
noquote(apply(f, 2, function(x) formatC(x, format = "f", digits = 1)))
eruptions waiting
1 3.6 79.0
2 1.8 54.0
3 3.3 74.0
4 2.3 62.0
5 4.5 85.0
6 2.9 55.0
which only affects how the data is printed, not the data itself. It can work fine for a 2D object that has all numeric columns. Note however that the numeric columns are left-aligned, not right-aligned, as in the default print method, since the output of the above function is a character matrix, rather than a data.frame with numeric columns. Hence, note:
f
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
Thus, for greater control, one should use sprintf() and cat():
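# Header line: column names right-justified in 15-character fields
out.lines <- sprintf("%15s %15s\n", colnames(f)[1], colnames(f)[2])
for (i in 1:nrow(f))
{
  # One formatted line per row, with one decimal place in each column
  out.lines <- c(out.lines,
                 sprintf("%14.1f %14.1f\n", f[i, 1], f[i, 2]))
}
# cat() separates elements with a space by default, so every line after
# the first is printed with one leading space
cat(out.lines)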
eruptions waiting
3.6 79.0
1.8 54.0
3.3 74.0
2.3 62.0
4.5 85.0
2.9 55.0
In the above case, one can specify the column widths for the column
labels and the row values. Of course, the above could be extended to
become a generic function for data frames with multiple data types, with
arguments enabling the specification of column widths, number of decimal
places, etc. One might even want a different number of decimal places
for each column, depending upon the nature of the columns of the
object to be printed, so vectors could be used for these arguments.
I'll leave that for further exercise.
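For readers who want a starting point for that exercise, here is one possible sketch. The function name print_fixed and its digits and width arguments are hypothetical, and it only handles data frames whose columns are all numeric; it is not the approach used in CrossTable().

```r
# Hypothetical helper, not part of base R or any package
print_fixed <- function(df, digits = 1, width = 12) {
  stopifnot(all(vapply(df, is.numeric, logical(1))))
  digits <- rep_len(as.integer(digits), ncol(df))
  width <- as.integer(width)

  # Header: column names right-justified in fields of the same width
  header <- paste(sprintf(paste0("%", width, "s"), colnames(df)),
                  collapse = " ")

  # Body: one sprintf() format per column, e.g. "%12.1f"
  cols <- lapply(seq_along(df), function(j) {
    sprintf(paste0("%", width, ".", digits[j], "f"), as.numeric(df[[j]]))
  })
  body <- do.call(paste, cols)

  lines <- c(header, body)
  cat(lines, sep = "\n")   # sep = "\n" avoids cat()'s default space separator
  invisible(lines)
}

f <- head(faithful)
print_fixed(f, digits = c(2, 1), width = 10)
```

Passing a vector for digits gives each column its own number of decimal places, and a single width keeps headers and values right-aligned in the same fields.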
Final note to Henrik: Note that the IEEE 754 rounding standard as
implemented in R results in:
round(18.15, 1)
[1] 18.1
formatC(18.15, format = "f", digits = 1)
[1] "18.1"
sprintf("%5.1f", 18.15)
[1] " 18.1"
This is because the rounding method implemented is the "go to the even digit" approach. Thus, you don't get 18.2. See ?round for more information.
HTH,
Marc Schwartz
Marc Schwartz wrote:
Final note to Henrik: Note that the IEEE 754 rounding standard as implemented in R results in:
round(18.15, 1)
[1] 18.1
formatC(18.15, format = "f", digits = 1)
[1] "18.1"
sprintf("%5.1f", 18.15)
[1] " 18.1"
This is because the rounding method implemented is the "go to the even digit" approach. Thus, you don't get 18.2. See ?round for more information.
I don't think "go to the even digit" is being applied here: ".1" is not an even digit. I suspect what's going on in this example is that 18.15 is not being represented exactly; it's stored internally as something slightly less than that value, so it rounds down. You'd see the "go to the even digit" rule applied when rounding 17.5 or 18.5, which can be represented exactly, being fractions with a power of 2 in the denominator:
> round(18.5, 0)
[1] 18
> round(17.5, 0)
[1] 18
(This is very gratifying. Usually when I try to predict the exact behaviour of round() or signif() I end up having to rewrite my prediction afterwards. But this time I got it right. Honest!)
Duncan Murdoch
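The two effects can be checked side by side with a few lines of base R. This is a small sketch added for illustration; the variable name stored is arbitrary.

```r
# Print more digits than usual to see the value actually stored for 18.15
stored <- sprintf("%.20f", 18.15)
print(stored)

# The stored value lies just below the tie, so one-decimal output rounds down
one_decimal <- sprintf("%.1f", 18.15)
print(one_decimal)

# 17.5 and 18.5 are exact in binary, so the tie rule applies to them
round(17.5)   # 18
round(18.5)   # 18
```

So the 18.1 result comes from the binary representation of 18.15, while round-half-to-even only shows up for values that are exact ties.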
On 5/31/05, Marc Schwartz <MSchwartz at mn.rr.com> wrote:
format() is still problematic here because it too uses the number of
significant digits, defaulting to options("digits").
Good point. It would be nice if format had an argument that allowed one to specify the number of digits after the decimal place. I think this would reduce frustrations in quickly formatting data frames.
On Tue, 2005-05-31 at 11:11 -0400, Duncan Murdoch wrote:
I don't think "go to the even digit" is being applied here: ".1" is not an even digit. I suspect what's going on in this example is that 18.15 is not being represented exactly; it's stored internally as something slightly less than that value, so it rounds down. You'd see the "go to the even digit" rule applied when rounding 17.5 or 18.5, which can be represented exactly, being fractions with a power of 2 in the denominator:
> round(18.5, 0)
[1] 18
> round(17.5, 0)
[1] 18
(This is very gratifying. Usually when I try to predict the exact behaviour of round() or signif() I end up having to rewrite my prediction afterwards. But this time I got it right. Honest!)
Duncan Murdoch
Duncan,
Just got back from a day long meeting. You are indeed correct on the rounding here. If you look at how 18.15 appears when printed with more significant digits:
print(18.15, 20)
[1] 18.149999999999998579
That's what I get for trying to deal with floating point representation issues first thing after a three day weekend... ;-)
Thanks for the correction.
Marc