Survey weights. Suggestions?
Perhaps one situation where survey weights could plausibly be used is where propensity scores are used as inverse probability weights to "create balance" when estimating a treatment effect in non-equivalent groups.
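To make that concrete, here is a minimal sketch of inverse probability weighting on simulated data. Everything in it is an assumption for illustration: the data-generating process, the true effect of 2.0, and the function names are hypothetical and do not come from any package. The propensity score is simply the treated share within each level of a single binary confounder, which is far cruder than a fitted propensity model.

```python
import random

def simulate(n, seed=0):
    """Simulate (x, t, y) rows: a binary confounder x raises both the
    chance of treatment and the outcome (hypothetical set-up)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if x == 1 else 0.2              # assumed true propensity
        t = 1 if rng.random() < p_treat else 0
        y = 2.0 * t + 3.0 * x + rng.gauss(0.0, 1.0)   # assumed true effect: 2.0
        rows.append((x, t, y))
    return rows

def estimate_propensity(rows):
    """Estimate P(T = 1 | x) as the treated share within each level of x."""
    counts = {}
    for x, t, _ in rows:
        n, k = counts.get(x, (0, 0))
        counts[x] = (n + 1, k + t)
    return {x: k / n for x, (n, k) in counts.items()}

def weighted_mean(pairs):
    """Weighted mean of (weight, value) pairs."""
    total_w = sum(w for w, _ in pairs)
    return sum(w * y for w, y in pairs) / total_w

def ipw_effect(rows):
    """Difference in inverse-probability-weighted outcome means."""
    ps = estimate_propensity(rows)
    treated = [(1.0 / ps[x], y) for x, t, y in rows if t == 1]
    control = [(1.0 / (1.0 - ps[x]), y) for x, t, y in rows if t == 0]
    return weighted_mean(treated) - weighted_mean(control)

def naive_effect(rows):
    """Unweighted difference in means, confounded by x."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

rows = simulate(20000, seed=1)
naive = naive_effect(rows)
ipw = ipw_effect(rows)
print("naive difference in means:", round(naive, 2))
print("IPW estimate:", round(ipw, 2))
```

In this set-up the unweighted comparison overstates the effect because the treated group contains more x = 1 cases, while the weighted comparison should land near the assumed value of 2.0. Note that these are estimated weights used to balance groups, which is a different job from the design weights discussed below.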
On 17/10/2014 16:38, Paul Johnson wrote:
I'm still resisting the idea that we should incorporate survey weights in regression analysis at all, and now it is suggested to me that a mixed model with state & city random effects needs to incorporate information about survey weights. Could I hear your opinions?

In the past, I've always answered people who ask for survey weights with this quotation:

Murray Aitkin, Brian Francis, John Hinde, and Ross Darnell, Statistical Modelling in R (Oxford 2009), p. 112

"One point which often causes confusion is the use of 'sample weights' in regression. Survey studies sometimes substantially over-sample small strata or sub-populations to provide sample sizes similar to those from (under-sampled) large sub-populations. A 'sample weight' is often provided for each observation in the sample data set to allow the re-aggregation of the final model to provide population predictions. The sample weight is the reciprocal of the probability of inclusion in the sample of an observation from each sub-population. The sample weight will be high for the large sub-populations, and low for the small sub-populations. These weights can be used formally to define a weighted or pseudo likelihood for the sample weight w_i for y_i, the weighted likelihood is

[formula]

Then the weighted MLEs from the score equation satisfy

[formula]

If theta is the population mean and the model for Y is N(mu, sigma-squared), the weighted MLE is

[formula].

This correctly weights for disproportionate sampling. However, it is an important point that these sample weights should *not* be used as formal weights in a regression analysis: the observations should be equally weighted (i.e., unweighted) in the analysis, and the model should always include the stratifying factor, together with its interactions with other variables in the model.... "

It appears to me that this is correct. I like the argument. It fits with my understanding of

DuMouchel, W. H., & Duncan, G. J. (1983). Using Sample Survey Weights in Multiple Regression Analyses of Stratified Samples. Journal of the American Statistical Association, 78(383), 535. doi:10.2307/2288115.

Summary: If you have a model specified correctly, you don't need sampling weights. If the usage of weights leads to a different answer, your model is probably wrong to start with. They make a specification test out of the difference between the weighted and unweighted estimates.

And then there's the all-time classic comment "Survey weighting is a mess." (Gelman, A. (2007). Struggles with Survey Weighting and Regression Modeling. Statistical Science, 22(2), 153-164. doi:10.1214/088342306000000691. http://www.stat.columbia.edu/~gelman/research/published/STS226.pdf)

I've read Thomas Lumley's book on using the R survey package, and I understand how I could estimate some GLM with survey weights. But I don't understand why I'd want to do that. And I can't bring myself to believe that weights can correct for non-response in panel studies either.

I need to read something at the middle level, between a graduate math-stats book on sampling theory and a manual for SPSS users that tells them which buttons to push. Can you point me at some discussion of where survey weights feed into a random effects framework, or good reasons why we need to use survey weights at all?

Please note, I'm not reluctant about weights as an approach to heteroskedasticity (WLS); I understand that part. I believe I understand the role of the weights argument in lmer as currently presented.

pj