hatvalues?
Dear Kevin, If you do the same regression as in the text then you'll get the same hat-values; the regression is the one on the top of p. 244:
mod <- lm(repwt ~ weight*sex, data=Davis) max(hatvalues(mod))
[1] 0.7141856 As to making sense of the computations:
X <- model.matrix(mod) head(X)
(Intercept) weight sexM weight:sexM 1 1 77 1 77 2 1 58 0 0 3 1 53 0 0 4 1 68 1 68 5 1 59 0 0 6 1 76 1 76
H <- X %*% solve( t(X) %*% X ) %*% t(X) h <- diag(H) max(h)
[1] 0.7141856 I hope this helps, John
-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On
Behalf Of rkevinburton at charter.net Sent: March-05-09 11:40 AM To: r-help at stat.math.ethz.ch Subject: [R] hatvalues? I am struiggling a bit with this function 'hatvalues'. I would like a
little
more undrestanding than taking the black-box and using the values. I
looked
at the Fortran source and it is quite opaque to me. So I am asking for
some
help in understanding the theory. First, I take the simplest case of a
single
variant. For this I turn o John Fox's book, "Applied Regression Analysis
and
Generalized Linear Models, p 245 and generate this 'R' code:
library(car) attach(Davis)
# remove the NA's
narepwt <- repwt[!is.na(repwt)] meanrw <- mean(narepwt) drw <- narepwt - meanrw ssrw <- sum(drw * drw) h <- 1/length(narepwt) + (drw * drw)/ssrw h
This gives me a array of values the largest of which is
order(h, decreasing=TRUE)
[1] 21 52 17 93 30 62 158 113 175 131 182 29 106 125 123 146 91
99
So the largest "hatvalue" is
h[21]
[1] 0.1041207 Which doesn't match the 0.714 value that is reported in the book but I
will
probably take that up with the author later.
Then I use more of 'R' and I get
fit <- lm(weight ~ repwt)
hr <- hatvalues(fit)
hr[21]
21
0.1041207
So this matches which is reasusing. My question is this, given the QR
transformation and the residuals derived from that transformation what is
a
simple matrix formula for the hatvalues?
residuals = y - Hy = y(I - H) or H = -(residuals/y - I)
fit <- lm(weight ~ repwt) h <- -(residuals(fit)/weight[as.numeric(names(residuals(fit)))] -
diag(1,length(residuals(fit)), length(residuals(fit)))) This generates a matrix but I cannot see any coerrelation between this
"hat-
matrix" and the return from "hatvalues". Comments? Thank you. Kevin
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.