large survey data set
On Thu, 27 Jun 2002, Andrew Perrin wrote:
> The lm function (for linear modelling, aka linear regression) includes
> case weights with a simple syntax:
>
>   foo <- lm(dependent ~ indep + indep + ..., data = <data object>, weights = <weight variable>)
Yes, but that isn't what he means by weights...

The standard regression weights are variance weights: a weight of 2 denotes an observation with half the variance of a weight of 1. In survey sampling (and in related missing data and causal inference models) you need probability weights: a weight of 2 means an observation had half the chance of being sampled.

You get the same regression coefficients (more or less) but quite different standard errors. The `model-robust' sandwich variance estimators give about the right standard errors (as long as the sampling fraction is small). These are built into the survival models, but not in most other software. They are pretty easy to calculate, but with a 20% sample they probably aren't going to work well.

	-thomas
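
For anyone who wants to see the difference in standard errors, here is a rough sketch in base R of the sandwich calculation mentioned above. The data are simulated and every name in it (n, x, y, w, fit, bread, meat) is made up for illustration. It computes the plain heteroscedasticity-robust (HC0) form and assumes independent sampling with a small sampling fraction, so it ignores strata, clusters and any finite population correction.

```r
# Simulated data; all object names are illustrative, not from the original post.
set.seed(1)
n <- 500
x <- rnorm(n)
w <- runif(n, 1, 5)            # hypothetical probability weights (1 / sampling probability, up to scale)
y <- 1 + 2 * x + rnorm(n)

# lm() interprets 'weights' as variance (precision) weights
fit <- lm(y ~ x, weights = w)

# Model-based standard errors, as reported by summary(fit)
se_model <- sqrt(diag(vcov(fit)))

# Sandwich estimator: (X'WX)^-1  X'W diag(e^2) W X  (X'WX)^-1
X <- model.matrix(fit)
e <- residuals(fit)                   # unweighted residuals y - X b
bread <- solve(crossprod(X, w * X))   # (X'WX)^-1
meat  <- crossprod(X * (w * e))       # sum of w_i^2 e_i^2 x_i x_i'
se_robust <- sqrt(diag(bread %*% meat %*% bread))

# Same coefficients, two different sets of standard errors
print(cbind(estimate = coef(fit), se_model, se_robust))
```

The coefficients are whatever lm() gives; only the variance estimate changes. With a large sampling fraction such as the 20% mentioned above, this simple form should not be expected to be accurate.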