Difficulty with qqline in logarithmic context
[Brian Ripley]
Is there a good reason to use qqnorm in a single-log context?
Yes. Googling around reveals this is not so uncommon.
Should one not rather use
qqnorm(log(freq))
qqline(log(freq))
In the display produced by "qqnorm", the y-axis would then show "log(value)" labels, while the user (me!) expects "value" labels.
since you are (I guess) looking at log-normality of freq?
Once again, I was merely toying with "qqplot". I found it intriguing that, while I shuffled messages around between folders for a good while, the distribution of log(number of messages) per folder appeared vaguely normal, as I do not readily see a reasonable justification for this.
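The two displays being contrasted here, qqnorm(log(freq)) and qqnorm(freq, log='y'), should carry the same points and differ only in how the y-axis is labelled. A minimal sketch of that check, assuming a made-up sample "x" in place of the "freq" data from this thread:

```r
# Hypothetical sample standing in for `freq` (not the data from this thread).
set.seed(1)
x <- rlnorm(50, meanlog = 4, sdlog = 1.5)

# Ask qqnorm for the plotting coordinates only, without drawing anything.
raw    <- qqnorm(x, plot.it = FALSE)
logged <- qqnorm(log10(x), plot.it = FALSE)

# The abscissae (theoretical normal quantiles) are the same in both cases,
# because log10 is monotone and so preserves the ranks of the data.
same_x <- isTRUE(all.equal(raw$x, logged$x))

# The ordinates differ only by the log10 transform, which is the mapping
# a log-scaled y-axis applies at drawing time.
same_y <- isTRUE(all.equal(log10(raw$y), logged$y))
```

Both flags come out TRUE for data without ties, so the choice between the two calls is about axis labels ("value" versus "log(value)"), not about the shape of the point cloud.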
Another way to look at that is
qqplot(qlnorm(ppoints(length(freq))), freq, log="xy")
the same plot, different scales.
Interesting, thanks for teaching me about "ppoints". Yet I remain happier with the abscissa scale produced by "qqnorm". Besides, how would one use "qqline" with the above?
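One possible answer, offered only as a sketch: under log-normality, log(freq) is linear in qnorm(p) = log(qlnorm(p)), so a quartile line can be computed on the log scale and drawn with "lines" in data coordinates. The helper names "lnorm_quartile_line" and "draw_loglog_qq" are hypothetical, and the sample "x" is made up rather than taken from this thread:

```r
# Hypothetical helper: intercept and slope of the line through the sample
# quartiles of log(x) against the matching standard normal quantiles.
lnorm_quartile_line <- function(x, probs = c(0.25, 0.75)) {
  stopifnot(all(x > 0), length(probs) == 2)
  ly <- quantile(log(x), probs, names = FALSE)
  lx <- qnorm(probs)                 # equals log(qlnorm(probs))
  slope <- diff(ly) / diff(lx)
  c(intercept = ly[1] - slope * lx[1], slope = slope)
}

# Hypothetical helper: the log-log QQ plot from the message above,
# with the quartile line added on top.
draw_loglog_qq <- function(x, ...) {
  co <- lnorm_quartile_line(x)
  qqplot(qlnorm(ppoints(length(x))), x, log = "xy")
  ends <- qlnorm(c(0.001, 0.999))    # two abscissae spanning the plot
  # On log-log axes the segment between two points is drawn straight.
  lines(ends, exp(co[["intercept"]] + co[["slope"]] * log(ends)), ...)
  invisible(co)
}

# Made-up sample standing in for `freq`.
set.seed(2)
x <- rlnorm(200, meanlog = 3, sdlog = 1.2)
co <- lnorm_quartile_line(x)
if (interactive()) draw_loglog_qq(x, col = "red")
```

The intercept and slope estimate the meanlog and sdlog of the sample, in the same rough way that the "qqline" quartile line estimates mean and standard deviation in the normal case.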
(I believe a QQ plot should always have comparable scales on the two axes.)
While comparable scales are somewhat simpler to compare, this is not necessarily what is most adequate for the user. The proof is that, while quantiles are being compared here, the scales do not show quantiles, but units meaningful to the user. One might want to compare variables scaled very differently, maybe because of different units from the same distribution, or from different but similar distributions using different scales and shifted to different means. Or even, why not, if this is what is meaningful for users, a log scale.
The point is that qqline is tied to normality, not to log-normality.
As it stands, yes. As a convenience, it could be extended (probably easily) to log-normality. "qqnorm" already does something sensible in log context, so a user might expect "qqline" to do equally well.
The real point might be that "qqline" is tied to "abline" a bit too blindly. What is the meaning of the intercept and slope of a straight line on a graphic in log context? First, the intercept might not even exist. Second, the "abline" interpretation depends on the clipping, and possibly on the extrema of the pretty breakpoints chosen for the scales, which makes it hard to predict in ordinary use. There ought to be some reason for the log-aware code in "abline", yet I did not find documentation for it.
The wisest course for "abline", in my very humble opinion, would be for it to complain if ever called in log context. Then "qqline" would indirectly complain through "abline", if "qqline" is not modified to do something more proper. Moreover, if it is definitely out of the question that "qqline" ever be meaningfully called in log context, then the same goes for "qqnorm", which should then complain as well. Currently, "qqline" misbehaves, in that it silently produces a meaningless result, while it could either diagnose that the result is meaningless or produce a meaningful one.
[Remainder of the reply top-quoted, as usual on r-help.]
On Wed, 1 Feb 2006, François Pinard wrote:
Hi, R friends. I had some difficulty with the following code:
qqnorm(freq, log='y')
qqline(freq)
as the line drawn was seemingly random. The exact data I used appears below. After wandering a bit within the source code for "abline", I figured out I should rather write:
qqnorm(freq, log='y')
par(ylog=FALSE)
qqline(log10(freq))
par(ylog=TRUE)
I'm proposing that this little stunt rather be hidden and
automatically effected within "qqline" proper, whenever par('ylog') is
TRUE. I thought about providing a patch, as "qqline" is so small. Yet
it would be more noise than useful, as I'm not familiar with the "datax"
argument usage, which should probably be addressed as well.
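To make the proposal concrete, here is a rough sketch of what a log-aware variant might look like. The function "qqline_logy" is hypothetical and not part of R; it assumes the plot was made with qqnorm(y, log='y'), ignores "datax" entirely, and sidesteps "abline" by computing the quartile line on the log10 scale and drawing it with "lines" in data coordinates:

```r
# Hypothetical helper, not part of R: quartile line for a normal QQ plot
# drawn with a log10-scaled y-axis, i.e. after qqnorm(y, log = "y").
# The `datax` case is deliberately not handled.
qqline_logy <- function(y, probs = c(0.25, 0.75), plot.it = TRUE, ...) {
  stopifnot(all(y > 0), length(probs) == 2)
  ly <- quantile(log10(y), probs, names = FALSE)  # sample quartiles, log10 scale
  x  <- qnorm(probs)                              # matching normal quantiles
  slope <- diff(ly) / diff(x)
  int   <- ly[1] - slope * x[1]
  if (plot.it) {
    # The x-axis is linear here, so par("usr")[1:2] holds its limits directly.
    xr <- par("usr")[1:2]
    # lines() takes data coordinates, so map the fitted log10 values back.
    lines(xr, 10^(int + slope * xr), ...)
  }
  invisible(c(intercept = int, slope = slope))
}

# Usage on made-up data (not the `freq` vector given below).
y <- c(1, 10, 100, 1000, 10000)
coefs <- qqline_logy(y, plot.it = FALSE)
if (interactive()) {
  qqnorm(y, log = "y")
  qqline_logy(y, col = "red")
}
```

The returned intercept and slope are expressed on the log10 scale, so they describe the same line that qqline(log10(y)) would compute for a plot of the logged data.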
Here is the data, in case useful:
freq <- as.integer(c(33, 79, 21, 436, 58, 18, 1106, 498, 1567, 393, 2, 104, 50, 67, 113, 76, 327, 331, 196, 145, 86, 59, 12, 215, 293, 154, 500, 314, 246, 587, 85, 23, 323, 3, 13, 576, 29, 37, 24, 21, 1230, 137, 13, 93, 3, 101, 72, 218, 59, 17, 2, 8, 86, 143, 150, 22, 19, 234, 119, 157, 4, 255, 146, 126, 76, 15, 271, 170, 4, 6, 16, 3048, 2175, 3350, 5017, 5706, 1610, 665, 322, 1, 16, 47, 51, 168, 94, 66, 154, 99, 11, 547, 953, 1, 1071, 80, 184, 168, 52, 187, 103, 187, 361, 46, 85, 135, 597, 121, 283, 26, 12, 20, 169, 9, 79, 15, 114, 75, 30, 111, 556, 173, 32, 99, 438, 2, 2, 1, 117, 5, 3, 51, 8, 41, 12, 23, 2, 13, 5, 1, 9, 4, 1, 7, 15, 5, 48, 16, 112, 6, 1, 39, 60, 5, 23, 5, 19, 1, 8, 32, 4, 13, 1, 14, 71, 5, 1, 35, 30, 100, 389, 22, 8, 1, 192, 40, 6, 3, 17, 2, 14, 71, 14, 1, 5, 4, 32, 21, 18, 13, 2, 2, 45, 342, 46, 144, 18, 131, 188, 112, 37, 85, 90, 8, 195, 173, 5, 53, 96, 37, 16, 16, 281, 64, 50, 92, 336, 31, 744, 4, 134, 74, 1, 227, 6, 48, 418, 64, 66, 59, 20, 45, 20, 370, 148, 22, 7, 30, 601, 29, 82, 113, 938, 252, 65, 137, 72, 22, 98, 12, 152, 212, 13, 8, 35, 3, 77))
Yet this really is the value of "courriel$freq" after "data(courriel)", with a file ".../R/data/courriel.R" here, holding:
courriel <- read.table(pipe('grep -c \'^From \' ../courriel/*'),
                       sep=':', as.is=TRUE, row.names=1,
                       col.names=c('fichier', 'freq'))
My goal, which is nothing serious, was merely to toy with the number of messages per folder, for folders massaged out of R archives.
Version: platform = i686-pc-linux-gnu arch = i686 os = linux-gnu system = i686, linux-gnu status = major = 2 minor = 2.1 year = 2005 month = 12 day = 20 svn rev = 36812 language = R
Locale: LC_CTYPE=fr_CA.UTF-8;LC_NUMERIC=C;LC_TIME=fr_CA.UTF-8;LC_COLLATE=fr_CA.UTF-8;LC_MONETARY=fr_CA.UTF-8;LC_MESSAGES=fr_CA.UTF-8;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C
Search Path: .GlobalEnv, package:methods, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, fp.etc, Autoloads, package:base
-- François Pinard http://pinard.progiciels-bpi.ca
-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
François Pinard http://pinard.progiciels-bpi.ca