hist(..., log="y")
Log histograms are of particular interest when dealing with heavy-tailed data/distributions. It is not just a matter of using a log scale on the y axis, though, because the baseline of the histogram is at zero and the log of zero is minus infinity. I have implemented a version of a log histogram in the function logHist, in my package DistributionUtils, which may be of interest if anyone seriously wishes to add functionality to the base hist function.

David Scott
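A minimal base-R sketch of the baseline issue described above. This is not the logHist implementation from DistributionUtils; the baseline of 0.5 and all variable names are assumptions chosen for illustration only.

```r
# Sketch: histogram with a log-scaled count axis, using base graphics only.
set.seed(1)
x <- rexp(1000, 1)
h <- hist(x, plot = FALSE)          # compute the bins without plotting

# Assumption: bars start at half a count, because log(0) is -Inf and
# a bar cannot start at zero on a log axis.
baseline <- 0.5
keep <- h$counts > 0                # empty bins cannot be drawn on a log axis

left  <- h$breaks[-length(h$breaks)]
right <- h$breaks[-1]

# Set up an empty plot with a log y axis, then draw the bars by hand.
plot(1, 1, type = "n", log = "y",
     xlim = range(h$breaks), ylim = c(baseline, max(h$counts)),
     xlab = "x", ylab = "Frequency (log scale)",
     main = "Histogram of x, log count axis")
rect(left[keep], baseline, right[keep], h$counts[keep], col = "grey")
```

The choice of baseline changes how tall the low-count bars appear, so any value used here is a presentation decision rather than a property of the data.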
On 7/08/2023 8:54 pm, Martin Maechler wrote:
Ott Toomet on Sat, 5 Aug 2023 23:49:38 -0700 writes:
Sorry if this topic has been discussed earlier.
Currently, hist(..., log="y") fails with
hist(rexp(1000, 1), log="y")
Warning messages:
1: In plot.window(xlim, ylim, "", ...) :
  nonfinite axis=2 limits [GScale(-inf,2.59218,..); log=TRUE] -- corrected now
2: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
  "log" is not a graphical parameter
3: In axis(1, ...) : "log" is not a graphical parameter
4: In axis(2, at = yt, ...) : "log" is not a graphical parameter
The same applies for log="x"
[...........]
This applies for the current svn version of R, and also a few recent published versions. This is unfortunate for two reasons:
* the error message is not quite correct--"log" is a graphical parameter, but "hist" does not support it.
No, not if you use R's (or S's before that) definition:
graphical parameters := {the possible argument of par()}
log is *not* among these.
* for various kinds of data it is worthwhile to make histograms in log scale. "hist" is a very nice and convenient function and support for log scale would be handy here.
Yes, possibly (see below). Note that the above are not errors, but warnings, and there *is* some support, e.g.,
set.seed(1); range(x <- rlnorm(1111))
[1] 0.04938796 45.16293285
hx <- hist(x, log="x", xlim=c(0.049, 47))
Warning messages:
1: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
  "log" is not a graphical parameter
2: In axis(1, ...) : "log" is not a graphical parameter
3: In axis(2, at = yt, ...) : "log" is not a graphical parameter
str(hx)
List of 6
 $ breaks  : num [1:11] 0 5 10 15 20 25 30 35 40 45 ...
 $ counts  : int [1:10] 1041 58 10 0 1 0 0 0 0 1
 $ density : num [1:10] 0.1874 0.01044 0.0018 0 0.00018 ...
 $ mids    : num [1:10] 2.5 7.5 12.5 17.5 22.5 27.5 32.5 37.5 42.5 47.5
 $ xname   : chr "x"
 $ equidist: logi TRUE
 - attr(*, "class")= chr "histogram"

where we see that it *does* plot ... but crucially not the very first bin, which holds over 90% (viz. 1041) of the counts, because its left break is 0 and log(0) == -Inf.
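One way to keep that first bin visible is to bin on the log scale, so that no break sits at zero. The following is only a sketch under that assumption, not a proposed change to hist(); the variable names are illustrative.

```r
# Sketch: histogram of log10(x), with the x axis relabelled in original units.
set.seed(1)
x <- rlnorm(1111)

# Equal-width bins in log10(x): all breaks are finite, none falls at x = 0.
hl <- hist(log10(x), plot = FALSE)

plot(hl, axes = FALSE, xlab = "x (log scale)",
     main = "Histogram of x, binned on the log scale")
axis(2)
ticks <- pretty(range(hl$breaks))
axis(1, at = ticks, labels = signif(10^ticks, 2))   # label ticks as x, not log10(x)
```

Note that the bars now show counts per log-width bin, so their heights are not comparable with those of a histogram binned on the original scale.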
I also played a little with the code, and it seems to be very easy to implement. I am happy to make a patch if the team thinks it is worth pursuing.
Cheers, Ott
Yeah... and that is the important question.
Most statisticians know that a histogram is a pretty bad
density estimator (notably if the natural density has an
infinite support) compared to simple kernel density estimates,
e.g. those by density().
Hence, I'd argue that if you expect enough sophistication from
your "viewer"s to understand a log-scale histogram, you
should use a density with log="x" and/or "y", and I have
successfully done so several times: It *does* work
{particularly nicely if you use my sfsmisc::eaxis() for the log axis/es}.
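A minimal sketch of the density-on-a-log-axis idea, using base graphics only (sfsmisc::eaxis() is not used here). Restricting the estimate to the observed range via from/to is an assumption made so that every grid point is positive and can be placed on a log x axis.

```r
# Sketch: kernel density estimate drawn with a log-scaled x axis.
set.seed(1)
x <- rlnorm(1111)

# Assumption: evaluate the estimate only on [min(x), max(x)], so that
# all evaluation points are > 0 and can be shown with log = "x".
d <- density(x, from = min(x), to = max(x))

plot(d, log = "x", main = "density(x) on a log x axis")
rug(x)                               # mark the observations along the axis
```

Using log = "xy" instead would also put the density on a log scale; points where the estimate is numerically zero cannot be drawn there, so some warnings are possible in that case.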
But you (and others) may have more good arguments why hist()
should work with log="x" and/or log="y"...
Also, if your patch is relatively small, its usefulness may
outweigh the added complexity (and its long-term maintenance !).
Martin
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
_________________________________________________________________
David Scott
Department of Statistics
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Email: d.scott at auckland.ac.nz