Using R-2.12.1 and latticeExtra-0.6-14, I would like to understand why a rootogram displaying samples from the Poisson distribution looks like I expected it, whereas a rootogram using the normal distribution does not: library(latticeExtra) rootogram(~rpois(1000, lambda = 50), dfun = function(x) dpois(x, lambda = 50)) rootogram(~rnorm(1000), dfun = function(x) dnorm(x,mean(x),sd(x))) I probably can't attach figures here. Thus a textual description of what I get may suffice: With increasing sample size, the rootogram using random samples from the Poisson distribution shows decreasing differences (bars are quickly approaching the zero line), whereas the displayed differences for random samples of the normal distribution are always large. The differences even increase with sample size, i.e, the hanging bars tend to vanish for very large samples.
rootogram for normal distributions
6 messages · Hugo Mildenberger, Hadley Wickham, Achim Zeileis +1 more
On Sun, 16 Jan 2011, Hugo Mildenberger wrote:
Using R-2.12.1 and latticeExtra-0.6-14, I would like to understand why a rootogram displaying samples from the Poisson distribution looks like I expected it, whereas a rootogram using the normal distribution does not: library(latticeExtra) rootogram(~rpois(1000, lambda = 50), dfun = function(x) dpois(x, lambda = 50)) rootogram(~rnorm(1000), dfun = function(x) dnorm(x,mean(x),sd(x))) I probably can't attach figures here. Thus a textual description of what I get may suffice: With increasing sample size, the rootogram using random samples from the Poisson distribution shows decreasing differences (bars are quickly approaching the zero line), whereas the displayed differences for random samples of the normal distribution are always large. The differences even increase with sample size, i.e, the hanging bars tend to vanish for very large samples.
The normal distribution is a continuous distribution, i.e., the frequency for each observed value will essentially be 1/n and not converge to the density function. Hence, you would need to look at histogram or smoothed densities. Rootograms, on the other hand, are intended for discrete distributions. Z
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
The normal distribution is a continuous distribution, i.e., the frequency for each observed value will essentially be 1/n and not converge to the density function. Hence, you would need to look at histogram or smoothed densities. Rootograms, on the other hand, are intended for discrete distributions.
I don't think that's true - rootograms are useful for both continuous and discrete distributions. See (e.g.) p 314 at http://www.edwardtufte.com/tufte/tukey, where Tukey himself uses a rootogram with a normal distribution. Hadley
Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
On Sun, 16 Jan 2011, Hadley Wickham wrote:
The normal distribution is a continuous distribution, i.e., the frequency for each observed value will essentially be 1/n and not converge to the density function. Hence, you would need to look at histogram or smoothed densities. Rootograms, on the other hand, are intended for discrete distributions.
I don't think that's true - rootograms are useful for both continuous and discrete distributions. See (e.g.) p 314 at http://www.edwardtufte.com/tufte/tukey, where Tukey himself uses a rootogram with a normal distribution.
OK, let me rephrase: Rootograms as implemented in rootogram() are intended for discrete distributions. At least that's my reading. But maybe I've missed a trick that you can point us to. Z
Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thank you very much for your qualified answers, and also for the link to the Tukey paper. I appreciate Tukey's writings very much. Looking at the lattice code (below), a possible implementation might involve binning, not so? I see a problematic part here: xx <- sort(unique(x)) Unique certainly works well with Poisson distributed data, but is essentially a no-op when confronted with continous floating-point numbers. Best Hugo
prepanel.rootogram
function (x, y = table(x), dfun = NULL, transformation = sqrt,
hang = TRUE, ...)
{
plot.line <- trellis.par.get("plot.line")
stopifnot(is.function(dfun))
yy <- transformation(y/sum(y))
xx <- sort(unique(x))
dotArgs <- list(...)
dfunArgs <- names(formals(dfun))
if (!("..." %in% dfunArgs))
dotArgs <- dotArgs[dfunArgs[-1]]
dd <- transformation(do.call(dfun, c(list(xx), dotArgs)))
list(xlim = range(xx), ylim = if (hang) range(dd, dd - yy,
0) else range(dd, yy, 0), dx = diff(xx), dy = diff(dd))
}
panel.rootogram
function (x, y = table(x), dfun = NULL, col = plot.line$col,
lty = plot.line$lty, lwd = plot.line$lwd, alpha = plot.line$alpha,
transformation = sqrt, hang = TRUE, ...)
{
plot.line <- trellis.par.get("plot.line")
ref.line <- trellis.par.get("reference.line")
stopifnot(is.function(dfun))
yy <- transformation(y/sum(y))
xx <- sort(unique(x))
dotArgs <- list(...)
dfunArgs <- names(formals(dfun))
if (!("..." %in% dfunArgs))
dotArgs <- dotArgs[dfunArgs[-1]]
dd <- transformation(do.call(dfun, c(list(xx), dotArgs)))
panel.abline(h = 0, col = ref.line$col, lty = ref.line$lty,
lwd = ref.line$lwd, alpha = ref.line$alpha)
panel.segments(xx, if (hang)
dd
else 0, xx, if (hang)
(dd - yy)
else yy, col = col, lty = lty, lwd = lwd, alpha = alpha,
...)
panel.lines(xx, dd)
}
<environment: namespace:latticeExtra>
On Sunday 16 January 2011 15:59:58 Achim Zeileis wrote:
On Sun, 16 Jan 2011, Hadley Wickham wrote:
The normal distribution is a continuous distribution, i.e., the frequency for each observed value will essentially be 1/n and not converge to the density function. Hence, you would need to look at histogram or smoothed densities. Rootograms, on the other hand, are intended for discrete distributions.
I don't think that's true - rootograms are useful for both continuous and discrete distributions. See (e.g.) p 314 at http://www.edwardtufte.com/tufte/tukey, where Tukey himself uses a rootogram with a normal distribution.
OK, let me rephrase: Rootograms as implemented in rootogram() are intended for discrete distributions. At least that's my reading. But maybe I've missed a trick that you can point us to. Z
Hadley
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On Sun, Jan 16, 2011 at 11:58 AM, Hugo Mildenberger
<Hugo.Mildenberger at web.de> wrote:
Thank you very much for your qualified answers, and also for the link to the Tukey paper. I appreciate Tukey's writings very much.
Yes, thanks to Hadley for the nice reference, I hadn't seen it before.
Looking at the lattice code (below), a possible implementation might involve ?binning, not so? I see a problematic part here: ? xx <- sort(unique(x)) Unique certainly works well with Poisson distributed data, but is essentially a no-op when confronted with continous floating-point numbers.
True, but as Achim said, rootogram() is intended to work with data
arising from discrete distributions, not continuous ones. I see now
that this is not as explicit as it could be in the help page (although
"frequency distribution" gives a hint), which I will try to improve.
I don't think automatic handling of continuous distributions is simple
(because it is not clear how you would specify the reference
distribution). However, a little preliminary work will get you close
with the current implementation:
xnorm <- rnorm(1000)
## 'discretize' by binning and replacing data by bin midpoints
h <- hist(xnorm, plot = FALSE) # add arguments for more control
xdisc <- with(h, rep(mids, counts))
## Option 1: Assume bin probabilities proportional to dnorm()
norm.factor <- sum(dnorm(h$mids, mean(xnorm), sd(xnorm)))
rootogram(~ xdisc,
dfun = function(x) {
dnorm(x, mean(xnorm), sd(xnorm)) / norm.factor
})
## Option 2: Compute probabilities explicitly using pnorm()
## pdisc <- diff(pnorm(h$breaks)) ## or estimated:
pdisc <- diff(pnorm(h$breaks, mean = mean(xnorm), sd = sd(xnorm)))
pdisc <- pdisc / sum(pdisc)
rootogram(~ xdisc,
dfun = function(x) {
f <- factor(x, levels = h$mids)
pdisc[f]
})
-Deepayan