Survey weights. Suggestions?

2 messages · Paul Johnson, Robert Long

Fri, Oct 17, 2014 8:38 AM

I'm still resisting the idea that we should incorporate survey weights
in regression analysis at all, and now it is suggested to me that a
mixed model with state & city random effects needs to incorporate
information about survey weights.  Could I hear your opinions?

In the past, I've always answered people who ask for survey weights
with this quotation:

Murray Aitkin, Brian Francis, John Hinde, and Ross Darnell,
Statistical Modelling in R (Oxford 2009), p. 112

"One point which often causes confusion is the use of 'sample weights'
in regression. Survey studies sometimes substantially over-sample
small strata or sub-populations to provide sample sizes similar to
those from (under-sampled) large sub-populations.  A 'sample weight'
is often provided for each observation in the sample data set to allow
the re-aggregation of the final model to provide population
predictions. The sample weight is the reciprocal of the probability of
inclusion in the sample of an observation from each sub-population.
The sample weight will be high for the large sub-populations, and low
for the small sub-populations.

These weights can be used formally to define a weighted or pseudo
likelihood for the sample weight w_i for y_i, the weighted likelihood
is
[formula]
Then the weighted MLEs from the score equation satisfy
[formula]
If theta is the population mean and the model for Y is N(mu,
sigma-squared), the weighted MLE is [formula]. This correctly weights
for disproportionate sampling.

However, it is an important point that these sample weights should
*not* be used as formal weights in a regression analysis: the
observations should be equally weighted (i.e., unweighted) in the
analysis, and the model should always include the stratifying factor,
together with its interactions with other variables in the model....
"
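
For anyone without the book at hand: the three formulas elided above
are presumably the standard pseudo-likelihood expressions. What
follows is my reconstruction under that assumption, not a
transcription from Aitkin et al. The density notation f(y | theta) and
the sample size n are my own labels.

```latex
\begin{align}
  % weighted (pseudo) likelihood: each observation's contribution is
  % raised to the power of its sample weight
  L_w(\theta) &= \prod_{i=1}^{n} f(y_i \mid \theta)^{w_i} \\
  % weighted score equation solved by the weighted MLE
  \sum_{i=1}^{n} w_i \,
    \frac{\partial \log f(y_i \mid \theta)}{\partial \theta} &= 0 \\
  % normal model, theta = mu: the solution is the weighted sample mean
  \hat{\mu} &= \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}
\end{align}
```

The last line follows from the second because, for N(mu,
sigma-squared), the derivative of log f with respect to mu is
(y_i - mu) / sigma-squared.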

It appears to me that is correct. I like the argument. It fits with my
understanding of

DuMouchel, W. H., & Duncan, G. J. (1983). Using Sample Survey Weights
in Multiple Regression Analyses of Stratified Samples. Journal of the
American Statistical Association, 78(383), 535. doi:10.2307/2288115.
Summary: If you have a model specified correctly, you don't need
sampling weights. If the usage of weights leads to a different answer,
your model is probably wrong to start with. They make a specification
test out of the difference.
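
To make that summary concrete, here is a minimal sketch in plain
Python. It is not DuMouchel and Duncan's formal test, only the
comparison it is built on. The two strata, the weights of 1 and 9, and
the helper names (wls_line, pooled_slopes) are all made up for
illustration.

```python
def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


# Hypothetical design: 4 observations per stratum. Stratum A is a small
# sub-population (weight 1 per observation); stratum B is a large one
# that was under-sampled (weight 9 per observation).
x_a = [0.0, 1.0, 2.0, 3.0]
x_b = [0.0, 1.0, 2.0, 3.0]
w_a, w_b = 1.0, 9.0


def pooled_slopes(slope_a, slope_b):
    """Fit one pooled line, unweighted and weighted; return both slopes."""
    x = x_a + x_b
    y = [slope_a * xi for xi in x_a] + [slope_b * xi for xi in x_b]
    w = [w_a] * len(x_a) + [w_b] * len(x_b)
    _, b_unweighted = wls_line(x, y, [1.0] * len(x))
    _, b_weighted = wls_line(x, y, w)
    return b_unweighted, b_weighted


# Case 1: one common slope, so the pooled model is correctly specified.
same_unw, same_w = pooled_slopes(2.0, 2.0)

# Case 2: slopes differ by stratum, so the pooled model is misspecified.
diff_unw, diff_w = pooled_slopes(1.0, 3.0)

# Aitkin-style alternative for case 2: fit each stratum unweighted
# (stratum plus its interaction with x), then re-aggregate by weight.
# Equal per-stratum sample sizes let the per-observation weights stand
# in for population shares here.
_, b_a = wls_line(x_a, [1.0 * xi for xi in x_a], [1.0] * len(x_a))
_, b_b = wls_line(x_b, [3.0 * xi for xi in x_b], [1.0] * len(x_b))
pop_slope = (w_a * b_a + w_b * b_b) / (w_a + w_b)

print("correct model:     ", same_unw, same_w)
print("misspecified model:", diff_unw, diff_w)
print("re-aggregated:     ", pop_slope)
```

In this toy layout the weighted and unweighted slopes agree (both 2)
when the pooled model is right, and disagree (2 versus 2.8) when the
slope varies by stratum. Modelling the stratum and re-aggregating
recovers 2.8 without weighting any regression.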

And then there's the all time classic comment "Survey weighting is a
mess." (Gelman, A. (2007). Struggles with Survey Weighting and
Regression Modeling. Statistical Science, 22(2), 153-164.
doi:10.1214/088342306000000691.
http://www.stat.columbia.edu/~gelman/research/published/STS226.pdf)

I've read Thomas Lumley's book on using the R survey package, and I
understand how I could estimate some GLM with survey weights.  But I
don't understand why I'd want to do that.  And I can't bring myself to
believe that weights can correct for non-response in panel studies
either.

I need to read something at the middle level, between a graduate
math-stats book on sampling theory, and a manual for SPSS users that
tells them which buttons to push.  Can you point me at some discussion
of where survey weights feed into a random effects framework, or good
reasons why we need to use survey weights at all?

Please note, I'm not reluctant about weights as an approach to
heteroskedasticity (WLS); I understand that part.  I believe I
understand the role of the weights argument in lmer as currently
presented.

pj

Paul E. Johnson
Professor, Political Science      Acting Director
1541 Lilac Lane, Room 504      Center for Research Methods
University of Kansas                 University of Kansas
http://pj.freefaculty.org               http://quant.ku.edu

Robert Long

Fri, Oct 17, 2014 10:18 AM

Perhaps one situation where survey weights could plausibly be used is 
where propensity scores are used as inverse probability weights to 
"create balance" when estimating a treatment effect in non-equivalent 
groups.
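
A minimal sketch of that idea in plain Python, assuming the propensity
scores are known exactly (in practice they would be estimated). The
data, the confounder z, and the outcome rule are all invented for
illustration; the true treatment effect is set to 2.

```python
# Hypothetical data: a binary confounder z drives both treatment uptake
# and the outcome. e is the known propensity P(treated | z).
units = []  # each entry is (z, treated, e)
for z, e, n_treated, n_control in [(0, 0.25, 1, 3), (1, 0.75, 3, 1)]:
    units += [(z, 1, e)] * n_treated + [(z, 0, e)] * n_control


def outcome(z, treated):
    """Made-up outcome rule: treatment adds 2, the confounder adds 4."""
    return 2.0 * treated + 4.0 * z


def weighted_mean(pairs):
    """Mean of the values in (value, weight) pairs."""
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


# Naive comparison: unweighted difference in group means.
treated_raw = [(outcome(z, t), 1.0) for z, t, e in units if t == 1]
control_raw = [(outcome(z, t), 1.0) for z, t, e in units if t == 0]
naive = weighted_mean(treated_raw) - weighted_mean(control_raw)

# Inverse probability weighting: treated units get weight 1/e, control
# units get 1/(1 - e), which balances z across the two groups.
treated_ipw = [(outcome(z, t), 1.0 / e) for z, t, e in units if t == 1]
control_ipw = [(outcome(z, t), 1.0 / (1.0 - e)) for z, t, e in units if t == 0]
ipw = weighted_mean(treated_ipw) - weighted_mean(control_ipw)

print("naive difference:", naive)
print("IPW difference:  ", ipw)
```

With these numbers the naive difference comes out at 4, because
treated units mostly have z = 1, while the weighted difference
recovers the effect of 2 that was built into the data.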
