I am struggling a bit with the function 'hatvalues'. I would like a little more understanding than treating it as a black box and using the values. I looked at the Fortran source and it is quite opaque to me, so I am asking for some help in understanding the theory. First, I take the simplest case of a single explanatory variable. For this I turn to John Fox's book, "Applied Regression Analysis and Generalized Linear Models", p. 245, and generate this 'R' code:
library(car)
attach(Davis)
# remove the NA's
narepwt <- repwt[!is.na(repwt)]
meanrw <- mean(narepwt)
drw <- narepwt - meanrw
ssrw <- sum(drw * drw)
h <- 1/length(narepwt) + (drw * drw)/ssrw
h
This gives me an array of values, the largest of which is
order(h, decreasing=TRUE)
[1] 21 52 17 93 30 62 158 113 175 131 182 29 106 125 123 146 91 99
So the largest "hatvalue" is
h[21]
[1] 0.1041207
This doesn't match the 0.714 value that is reported in the book, but I will probably take that up with the author later.
Then I use more of 'R' and I get
fit <- lm(weight ~ repwt)
hr <- hatvalues(fit)
hr[21]
21
0.1041207
So this matches, which is reassuring. My question is this: given the QR transformation and the residuals derived from that transformation, what is a simple matrix formula for the hatvalues?
residuals = y - Hy = y(I - H) or H = -(residuals/y - I)
fit <- lm(weight ~ repwt)
h <- -(residuals(fit)/weight[as.numeric(names(residuals(fit)))] - diag(1, length(residuals(fit)), length(residuals(fit))))
This generates a matrix, but I cannot see any correlation between this "hat-matrix" and the values returned by "hatvalues". Comments?

Thank you.

Kevin
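
For reference, the usual textbook definition of the hat matrix is H = X (X'X)^{-1} X', where X is the design matrix, and the hat values are its diagonal. With a thin QR decomposition X = QR this reduces to H = Q Q', so each hat value is the sum of squares of the corresponding row of Q. The following is only a minimal sketch of that relationship, not necessarily how 'hatvalues' computes it internally. It assumes the built-in 'cars' data as a stand-in for Davis so that it runs without extra packages; the names X, H, Q and h_qr are chosen here for illustration.

```r
# Assumption: built-in 'cars' data used in place of Davis (no packages needed).
fit <- lm(dist ~ speed, data = cars)
X <- model.matrix(fit)   # n x 2 design matrix: intercept column and speed
y <- cars$dist

# Hat matrix from its definition: H = X (X'X)^{-1} X'
H <- X %*% solve(crossprod(X)) %*% t(X)

# Same diagonal via the thin QR decomposition: X = QR implies H = Q Q',
# so the i-th hat value is the sum of squares of row i of Q.
Q <- qr.Q(qr(X))
h_qr <- rowSums(Q^2)

# Each comparison is expected to be TRUE (up to floating-point tolerance).
all.equal(unname(diag(H)), unname(hatvalues(fit)))
all.equal(h_qr, unname(hatvalues(fit)))

# Residuals are (I - H) y, with H multiplying y from the left.
all.equal(as.vector((diag(nrow(X)) - H) %*% y), unname(residuals(fit)))
```

One reason the elementwise attempt cannot recover H: residuals/weight divides element by element, and a single residual vector supplies only n equations for the n x n entries of H, so H has to be built from the design matrix rather than from y.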