problem with zero-weighted observations in predict.lm?
On Thu, 29 Jul 2010, Peter Dalgaard wrote:
Peter Dalgaard wrote:
William Dunlap wrote:
In modelling functions some people like to use a weight of 0 to drop an observation instead of using a subset value of FALSE. E.g., weights=c(0,1,1,...) instead of subset=c(FALSE, TRUE, TRUE, ...) to drop the first observation. lm() and summary.lm() appear to treat these in the same way, decrementing the number of degrees of freedom for each dropped observation. However, predict.lm() does not treat them the same. It doesn't seem to diminish the df to account for the 0-weighted observations. E.g., the last printout from the following script is as follows, where predw is the prediction from the fit that used 0-weights and preds is from using FALSE's in the subset argument. Is this difference proper?
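The script referred to above is not reproduced in this message. A minimal sketch of the kind of comparison described, using made-up data, might look like the following. The names predw and preds come from the question; d, w, fitw, fits and newd are assumptions introduced only for illustration.

```r
# Made-up data: illustrative only, not the data from the original script.
set.seed(42)
d <- data.frame(x = 1:10)
d$y <- 2 + 0.5 * d$x + rnorm(10)

# Drop the first observation in two different ways.
w <- c(0, rep(1, 9))
fitw <- lm(y ~ x, data = d, weights = w)   # zero weight on observation 1
fits <- lm(y ~ x, data = d, subset = -1)   # observation 1 excluded via subset

# Predict at two new x values, asking for standard errors and intervals.
newd <- data.frame(x = c(2.5, 7.5))
predw <- predict(fitw, newdata = newd, se.fit = TRUE, interval = "confidence")
preds <- predict(fits, newdata = newd, se.fit = TRUE, interval = "confidence")

# The coefficients and residual df of the two fits agree ...
all.equal(coef(fitw), coef(fits))
c(fitw = df.residual(fitw), fits = df.residual(fits))

# ... so the df reported by predict() would be expected to agree as well.
c(predw = predw$df, preds = preds$df)
```

If predict.lm() accounts for the zero weight, the two df entries in the last line agree; the discrepancy asked about would show up as a difference there and in the interval widths.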
Nice catch. The issue is that the subset fit and the zero-weighted fit are not completely the same. Notice that the residuals vector has different length in the two analyses. With a simplified setup:
length(lm(y~1,weights=w)$residuals)
[1] 10
length(lm(y~1,subset=-1)$residuals)
[1] 9
w
[1] 0 1 1 1 1 1 1 1 1 1
This in turn is what confuses predict.lm, because it gets n and the residual df from length(object$residuals). summary.lm() uses NROW(Qr$qr), and I suppose that predict.lm should follow suit.
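The three candidate counts can be put side by side with a small sketch. The data are made up, and the names fit_w, fit_s and counts are assumptions for this illustration only.

```r
# Made-up data: ten observations, the first given zero weight.
set.seed(1)
y <- rnorm(10)
w <- c(0, rep(1, 9))

fit_w <- lm(y ~ 1, weights = w)   # zero-weighted fit
fit_s <- lm(y ~ 1, subset = -1)   # subset fit

# Compare the residual vector length, the number of rows in the QR
# decomposition, and the stored residual degrees of freedom.
counts <- rbind(
  residuals   = c(weights = length(fit_w$residuals),
                  subset  = length(fit_s$residuals)),
  qr_rows     = c(NROW(fit_w$qr$qr), NROW(fit_s$qr$qr)),
  df_residual = c(fit_w$df.residual, fit_s$df.residual)
)
counts
```

Only the residuals row differs between the two fits (10 versus 9); the QR row count and the stored residual df are the same for both.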
...and then when I went to fix it, I found that the actual line in the sources (stats/R/lm.R) reads
 27442 ripley n <- length(object$residuals) # NROW(object$qr$qr)
so it's been like that since December 2003. I wonder if Brian remembers what the point was? (27442 was the restructuring into the stats package, so it might not actually be Brian's code).
At least that wasn't the point of change: the code was the same in R 1.8.1 (pre-split). I think you will find that 'n' is used in several ways in predict.lm, and since NA-handling was introduced in R 1.8.0 they may differ in value. So the safest route seems to be to change just 'n' in
 df <- n - p
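To illustrate what changing just that 'n' affects, here is a rough sketch of the two candidate df values and the residual scale each one implies. It is not the predict.lm source; the data are made up and the names df_len, df_qr, sigma_len and sigma_qr are assumptions for this example.

```r
# Made-up data: ten observations, the first given zero weight.
set.seed(1)
y <- rnorm(10)
w <- c(0, rep(1, 9))
fit <- lm(y ~ 1, weights = w)

p   <- fit$rank
rss <- sum(w * residuals(fit)^2)   # the zero-weighted residual contributes nothing

# df from the length of the residual vector (counts the zero-weighted case)
df_len <- length(residuals(fit)) - p
# df from the number of rows actually used in the QR decomposition
df_qr  <- NROW(fit$qr$qr) - p

sigma_len <- sqrt(rss / df_len)
sigma_qr  <- sqrt(rss / df_qr)

c(df_len = df_len, df_qr = df_qr)
c(sigma_len = sigma_len, sigma_qr = sigma_qr, summary = summary(fit)$sigma)
```

Only the df based on the QR rows reproduces the residual standard error that summary.lm() reports; the df based on the residual length is one too large here and gives a smaller scale.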
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595