Simple lm/regression question
On Feb 6, 2012, at 10:57 , Achim Zeileis wrote:
On Mon, 6 Feb 2012, James Annan wrote:

The summary() shows under "Residuals" the contributions to the objective function, i.e. sqrt(w) * (y - x'b) in the notation above. However, by using the residuals() extractor function you can get the unweighted residuals:

residuals(lm(y ~ x, weights = c(.01, .01, .01, .01)))
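To see the distinction concretely, here is a small sketch; the x and y below are invented, since the thread's actual data are not shown in this excerpt:

```r
## Invented data -- the thread's actual x and y are not shown here
x <- 1:4
y <- c(2, 8, 6, 20)
fit <- lm(y ~ x, weights = c(.01, .01, .01, .01))

residuals(fit)           # unweighted residuals, y - x'b
weighted.residuals(fit)  # sqrt(w) * (y - x'b), what summary() tabulates

## With constant weights of .01, the two differ exactly by sqrt(.01) = 0.1:
all.equal(weighted.residuals(fit), sqrt(.01) * residuals(fit))
```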
The uncertainties on the parameter estimates, however, have *not* changed, which seems very odd to me.
lm() interprets the weights as precision weights, not as case weights. Thus, the scaling in the variances is done by the number of (non-zero) weights, not by the sum of weights.
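A sketch of that point, again with made-up data: rescaling all the weights by a common factor leaves both the estimates and their standard errors untouched, because the overall scale is absorbed into the estimated dispersion.

```r
## Invented data; only the *relative* weights matter to lm()
x <- 1:4
y <- c(2, 8, 6, 20)
f1 <- lm(y ~ x, weights = rep(1,   4))
f2 <- lm(y ~ x, weights = rep(.01, 4))

## Same estimates, same standard errors, same t and p values:
coef(summary(f1))
coef(summary(f2))
all.equal(coef(summary(f1)), coef(summary(f2)))
```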
The behaviour of IDL is rather different and intuitive to me:
IDL> vec=linfit(x,y,sigma=sig,measure_errors=[1,1,1,1])
IDL> print,vec,sig
-5.00000 5.00000
1.22474 0.447214
IDL> vec=linfit(x,y,sigma=sig,measure_errors=[10,10,10,10])
IDL> print,vec,sig
-5.00000 5.00000
12.2474 4.47214
Here the parameter uncertainties scale directly with the stated measurement errors: multiplying measure_errors by 10 multiplies sig by 10.
Actually, I think the issue is slightly different: IDL assumes that the errors _are_ something (notice that setting measure_errors to 1 is not equivalent to omitting them), whereas R assumes that they are only _proportional_ to the inverse weights, and proportionality to c(.01, .01, .01, .01) is no different from proportionality to c(1, 1, 1, 1)... There are a couple of ways to avoid the use of the estimated multiplicative dispersion parameter in R: one is to extract cov.unscaled from the summary, another is to use summary.glm with dispersion = 1. I'm not quite sure how they interact with weights, though (and I don't have the time to check just now).
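A sketch of those two routes, with invented data once more (and, per the caveat above, how they interact with weights is exactly the question left open):

```r
## Invented data again
x <- 1:4
y <- c(2, 8, 6, 20)
w <- c(.01, .01, .01, .01)
fit <- lm(y ~ x, weights = w)

## Route 1: cov.unscaled is (X'WX)^{-1}, the covariance with the
## estimated dispersion sigma^2 left out (as if sigma^2 were exactly 1)
su <- summary(fit)
sqrt(diag(su$cov.unscaled))

## Route 2: refit as a Gaussian glm and fix the dispersion at 1
gfit <- glm(y ~ x, weights = w, family = gaussian)
summary(gfit, dispersion = 1)
```

For what it's worth, both routes do respond to the absolute size of the weights: with weights of .01 the entries of sqrt(diag(cov.unscaled)) are 10 times those of an unweighted fit, the same factor-of-10 behaviour as the IDL runs above.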
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com