I've been looking fairly carefully at the Q-Q plots produced by
plot.lm() and am having difficulty understanding why plot.lm()
is doing what it's doing, specifically scaling the standardized
residuals by the prior weights. Can anyone explain this to me ... ?
Multiplying by the weights seems to give the wrong plot, at least for
binomial models; at the very least, it means that the $y$ (observed
quantiles) and $x$ (expected quantiles) scales are different from each
other even when the model is exactly correct. (If the weights are all
equal, the scaling doesn't change the linearity of the Q-Q plot, but it
still seems confusing if one is explicitly comparing expected to
observed quantiles ...)
The help page says
    The 'S-L', the Q-Q, and the Residual-Leverage plot, use
    _standardized_ residuals which have identical variance (under the
    hypothesis). They are given as R[i] / (s * sqrt(1 - h.ii)) where
    h.ii are the diagonal entries of the hat matrix
An example (in R markdown):
```{r}
set.seed(101)
n <- 1000 ## number of observations
N <- 100 ## binomial sample size
d <- data.frame(r = rbinom(n, size = N, prob = 0.5),
                num = N)
## fit with proportion/weights syntax
g1 <- glm(r/N~1,family=binomial,weights=num,data=d)
```
The scale of the standardized residuals looks appropriate:
```{r}
range(rstandard(g1))
```
Showing the results of `plot.lm()`, `qqnorm()` on standardized
residuals, and the enhanced Q-Q plot from the `mgcv` package:
```{r fig.width=10,fig.height=5}
par(mfrow=c(1,3),las=1,bty="l")
plot(g1,which=2,main="plot.lm()")
qqnorm(rstandard(g1),main="qqnorm()")
mgcv::qq.gam(g1,pch=1,main="mgcv::qq.gam()")
```
`rstandard()` uses
```{r eval=FALSE}
res <- res/sqrt(summary(model)$dispersion * (1 - infl$hat))
```
`plot.lm()` uses
```{r eval=FALSE}
r <- residuals(x)
## ... some lines skipped ...
r.w <- if (is.null(w)) {
r
} else sqrt(w) * r
rs <- dropInf(r.w/(s * sqrt(1 - hii)), hii)
```
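To see the difference concretely, here is a sketch that reproduces both scalings by hand for the fitted model `g1` above (using the standard `weights()`, `residuals()`, and `lm.influence()` accessors; for a binomial model the dispersion is fixed at 1):

```{r}
## compare the two scalings for g1 (all prior weights = N = 100)
w   <- weights(g1)                            # prior weights
r   <- residuals(g1)                          # deviance residuals
hii <- lm.influence(g1, do.coef = FALSE)$hat  # hat-matrix diagonal
s   <- sqrt(summary(g1)$dispersion)           # = 1 for binomial
## plot.lm()-style, with the extra sqrt(w) factor:
range(sqrt(w) * r / (s * sqrt(1 - hii)))
## rstandard()-style, without it:
range(r / (s * sqrt(1 - hii)))
```

If `plot.lm()` behaves as quoted, the first range should be roughly sqrt(N) = 10 times the second, since the deviance residuals of a weighted binomial fit already incorporate the prior weights.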
> I've been looking fairly carefully at the Q-Q plots produced by
> plot.lm() and am having difficulty understanding why plot.lm()
> is doing what it's doing, specifically scaling the standardized
> residuals by the prior weights. Can anyone explain this to me ... ?
Because with an ideal choice of prior weights the scaled residuals are expected to be IID Normal (under the normality assumption for a linear model). Without the scaling they are not identically distributed, so a Q-Q plot would be meaningless.
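For a weighted *linear* model this is easy to check (a sketch with made-up weights, not from the original example: raw residuals have variance proportional to 1/w_i, so multiplying by sqrt(w_i) equalizes the variances):

```{r}
set.seed(1)
w  <- rep(c(1, 25), each = 50)         # unequal prior weights
y  <- rnorm(100, sd = 1 / sqrt(w))     # var(y_i) = sigma^2 / w_i
fm <- lm(y ~ 1, weights = w)
## raw residuals are heteroscedastic across weight groups ...
tapply(residuals(fm), w, sd)
## ... but sqrt(w) * r has roughly equal spread in both groups
tapply(sqrt(w) * residuals(fm), w, sd)
```

So for a weighted lm the sqrt(w) factor is exactly what makes the residuals comparable on one Q-Q plot; the question above is whether the same factor is appropriate for a glm whose deviance residuals already account for the weights.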
S Ellison