
glm(weights) and standard errors

Weighting can be confusing: there are three standard forms of weighting that you need to be careful not to mix up, and I suspect that imputation weights are really a fourth kind.

First, there is case (replication) vs. precision weighting. A weight of 10 means one of

- I have 10 observations identical to this one
- This observation has variance sigma^2/10, as if it were the average of 10 observations.

There are also sampling weights:

- For each observation like this, I have 10 similar observations in the population (and I want to estimate a population parameter like the national average income or the percentage of votes at a hypothetical general election). 
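The case-vs-precision distinction can be made concrete numerically. Below is a minimal sketch (in Python/numpy rather than R, but the least-squares arithmetic is the same): a precision-weighted fit with weight 10 on one row gives the same point estimates as literally replicating that row 10 times, but different standard errors, because the two interpretations imply different residual df. The data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 1 + 2*x + noise, n = 8 rows.
n = 8
x = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

w = np.ones(n)
w[0] = 10.0  # "the first row stands in for 10 observations"

def wls(X, y, w):
    """Precision-weighted least squares:
    beta = (X'WX)^{-1} X'Wy,  sigma^2 = sum(w*r^2)/(n - p)."""
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    r = y - X @ beta
    nrow, p = X.shape
    sigma2 = np.sum(w * r**2) / (nrow - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(XtWX)))
    return beta, se

# Precision-weight reading: fit the 8 rows with weights.
beta_w, se_w = wls(X, y, w)

# Case-weight reading: replicate row 0 ten times, fit unweighted.
reps = w.astype(int)
X_rep = np.repeat(X, reps, axis=0)
y_rep = np.repeat(y, reps)
beta_c, se_c = wls(X_rep, y_rep, np.ones(len(y_rep)))

print(np.allclose(beta_w, beta_c))  # True: identical point estimates
print(np.allclose(se_w, se_c))      # False: the residual df differ (6 vs 15)
```

The estimates agree because X'WX and X'Wy are identical under both readings; only the residual df (and hence sigma^2 and the standard errors) change.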

What R does in lm/glm is precision weights. Notice that when the variance is estimated from the data, the weights are really only relative: if all observations are weighted equally (all 10, say), you get a tenfold increase in the estimated sigma^2 and a tenfold decrease in the unscaled variance-covariance matrix, so the net result is that the standard errors are the same (but they won't be if the weights are unequal).
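That cancellation is easy to verify; a short numpy sketch (the post is about R's lm/glm, but the algebra is language-agnostic): multiplying every weight by 10 scales sigma^2 up tenfold and (X'WX)^{-1} down tenfold, so the standard errors are unchanged, while genuinely unequal weights do change them.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 20
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 0.5 - 1.5 * x + rng.normal(size=n)

def wls_se(X, y, w):
    # Precision-weighted LS, sigma^2 estimated from the data.
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    r = y - X @ beta
    sigma2 = np.sum(w * r**2) / (X.shape[0] - X.shape[1])
    return np.sqrt(sigma2 * np.diag(np.linalg.inv(XtWX)))

w = np.ones(n)
se_1 = wls_se(X, y, w)        # all weights 1
se_10 = wls_se(X, y, 10 * w)  # all weights 10

# Common rescaling cancels: sigma^2 up 10x, (X'WX)^{-1} down 10x.
print(np.allclose(se_1, se_10))  # True

# Unequal weights genuinely change the standard errors.
w_uneq = np.linspace(1, 5, n)
se_uneq = wls_se(X, y, w_uneq)
print(np.allclose(se_1, se_uneq))  # False
```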

The three weighting schemes share the same formula for the estimates, but differ both in the estimated variance and df, and in the formula for the standard errors. 

Sampling weights are the domain of the survey package, but I don't think it does replication weights (someone called Thomas may chime in and educate me otherwise). I'm not quite sure, but I think you can get from a precision-weighted analysis to a case-weighted one just by adjusting the error df (changing the residual df to df + sum(w) - n, and rescaling sigma^2 proportionally).
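That df adjustment can be checked numerically. A sketch (numpy again, with made-up integer case counts), reading "sigma^2 proportionally" as re-estimating sigma^2 with the new residual df, which works out to sum(w) - p: the adjusted standard errors match what an unweighted fit on the literally replicated data gives.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 6
x = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(size=n)
w = np.array([10.0, 1, 1, 3, 1, 1])  # integer case counts, sum(w) = 17

XtWX = X.T @ (w[:, None] * X)
beta = np.linalg.solve(XtWX, X.T @ (w * y))
r = y - X @ beta
rss = np.sum(w * r**2)
p = X.shape[1]

# Precision-weight reading: residual df = n - p = 4.
se_prec = np.sqrt(rss / (n - p) * np.diag(np.linalg.inv(XtWX)))

# Case-weight reading: residual df = (n - p) + sum(w) - n = sum(w) - p = 15.
se_case = np.sqrt(rss / (np.sum(w) - p) * np.diag(np.linalg.inv(XtWX)))

# Cross-check: replicate the rows and fit unweighted.
X_rep = np.repeat(X, w.astype(int), axis=0)
y_rep = np.repeat(y, w.astype(int))
beta_rep = np.linalg.lstsq(X_rep, y_rep, rcond=None)[0]
r_rep = y_rep - X_rep @ beta_rep
sigma2_rep = np.sum(r_rep**2) / (len(y_rep) - p)
se_rep = np.sqrt(sigma2_rep * np.diag(np.linalg.inv(X_rep.T @ X_rep)))

print(np.allclose(se_case, se_rep))  # True: the df adjustment recovers the case-weighted SEs
```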

Imputation weights look like the opposite of case weights: you give 10 observations when in fact you have only one. An educated guess would be that you could do something similar to what you do for case weights -- in this case sum(w) will be much less than n, so you will decrease the residual df rather than increase it. I get this nagging feeling that it might still not be quite right, though -- in the cases where the imputations actually differ, do we get the extra variability of the variance right? Or maybe we don't need to care. There is a literature on the subject....
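To illustrate the guess (and only the guess -- this is a hypothetical scheme, not a validated multiple-imputation variance estimator, and as noted it may miss between-imputation variability): suppose one incomplete case is replaced by m = 10 imputed copies, each with weight 1/m. Then sum(w) equals the original number of cases, well below the row count n, and the adjusted residual df sum(w) - p is smaller than the naive n - p, inflating the standard errors.

```python
import numpy as np

rng = np.random.default_rng(3)

# 9 complete cases plus one case imputed m = 10 times,
# each imputed copy carrying weight 1/m (hypothetical scheme).
m = 10
x_obs = np.arange(9, dtype=float)
y_obs = 1.0 + x_obs + rng.normal(size=9)
x_mis = 9.0
y_imp = 1.0 + x_mis + rng.normal(size=m)  # m imputed draws for the missing y

x = np.concatenate([x_obs, np.full(m, x_mis)])
y = np.concatenate([y_obs, y_imp])
w = np.concatenate([np.ones(9), np.full(m, 1.0 / m)])

X = np.column_stack([np.ones_like(x), x])
XtWX = X.T @ (w[:, None] * X)
beta = np.linalg.solve(XtWX, X.T @ (w * y))
r = y - X @ beta
rss = np.sum(w * r**2)
n_rows, p = X.shape            # 19 rows, 2 parameters
df_naive = n_rows - p          # 17: pretends each imputed copy is data
df_adj = np.sum(w) - p         # 8: effective sample size 10, minus p

se_naive = np.sqrt(rss / df_naive * np.diag(np.linalg.inv(XtWX)))
se_adj = np.sqrt(rss / df_adj * np.diag(np.linalg.inv(XtWX)))
print(np.all(se_adj > se_naive))  # True: the smaller df inflates the SEs
```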
On May 25, 2012, at 09:21, ilai wrote: