Dear R-users,

Recently I had to analyze a dataset from a household survey. The sample design ensured that each household in the population had the same probability of being sampled. However, the data were gathered from only one adult in each household, who was randomly chosen by the interviewer (via a "Kish grid"). To equalize the probabilities for each INDIVIDUAL, a casewise weighting factor is introduced. It is proportional to the reciprocal of the number of adults in the household and rescaled so that its sum equals the sample size. This weighting factor is necessary for making inferences about the population of individuals.

I had no problems estimating models that use count data, because I could construct contingency tables with something like:

    tapply(weight, a.bunch.of.factors, sum)

Unfortunately, I couldn't come up with a good way of building other kinds of models for these data. Is there some way (apart from writing new functions from scratch) to perform modelling tasks such as lm() that will take the weights into account? (As far as I know, there are only the basic functions weighted.mean() and cov.wt(), for weighted means and weighted covariance/correlation matrices respectively.)

Thank you in advance for any suggestions.

Michal

~,~`~,~`~,~`~,~`~,~`~,~`~,~`~,~
Michal Bojanowski
Institute for Social Studies
University of Warsaw
Poland
http://www.iss.uw.edu.pl
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
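For readers following along, the rescaling and the weighted contingency table described in the message above might look roughly like the sketch below. The data, the variable names (raw_w, sex, educ, age) and the unscaled weights are all made up for illustration; only tapply(), xtabs() and weighted.mean() are standard R functions.

```r
# Hypothetical toy data: 12 respondents with unscaled case weights
n     <- 12
raw_w <- c(1, 2, 1, 3, 2, 1, 2, 2, 4, 1, 3, 2)    # made-up unscaled weights
sex   <- factor(rep(c("f", "m"), times = 6))
educ  <- factor(rep(c("low", "mid", "high"), each = 4))
age   <- c(23, 35, 41, 52, 29, 64, 38, 47, 55, 31, 60, 44)

# Rescale so that the weights sum to the sample size
weight <- raw_w * n / sum(raw_w)

# Weighted contingency table: sum of weights in each cell
tab <- tapply(weight, list(sex = sex, educ = educ), sum)

# xtabs() builds the same table and returns 0 (not NA) for empty cells
tab2 <- xtabs(weight ~ sex + educ)

# Weighted mean of a numeric variable
wm_age <- weighted.mean(age, weight)

print(tab)
print(wm_age)
```

Because the weights are rescaled to sum to n, the cells of the weighted table also sum to n, so the table can be read on the same scale as an unweighted one.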
analysis of data with observation weights
9 messages · Michal Bojanowski, Peter Dalgaard, John Fox +2 more
Dear Michal, As far as I know (and I'd be happy to be wrong), there's no *general* way of introducing case weights in R. The glm function, however, accommodates case weights via its weights argument, and this might be sufficient to do what you want to do. You'll have to be careful with inferences, though. Perhaps someone else on the list can provide additional information. John
At 05:22 PM 11/14/2002 +0100, Michal Bojanowski wrote:
Recently I had to analyze a dataset from a household survey. The sample design ensured that each household in the population had the same probability of being sampled. However, the data were gathered from only one adult in each household, who was randomly chosen by the interviewer (via a "Kish grid"). To equalize the probabilities for each INDIVIDUAL, a casewise weighting factor is introduced. It is proportional to the reciprocal of the number of adults in the household and rescaled so that its sum equals the sample size. This weighting factor is necessary for making inferences about the population of individuals.

I had no problems estimating models that use count data, because I could construct contingency tables with something like:

    tapply(weight, a.bunch.of.factors, sum)

Unfortunately, I couldn't come up with a good way of building other kinds of models for these data. Is there some way (apart from writing new functions from scratch) to perform modelling tasks such as lm() that will take the weights into account? (As far as I know, there are only the basic functions weighted.mean() and cov.wt(), for weighted means and weighted covariance/correlation matrices respectively.)
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
Hello John,
Thursday, November 14, 2002, 9:07:51 PM, you wrote:
JF> Dear Michal,
JF> As far as I know (and I'd be happy to be wrong), there's no *general* way
JF> of introducing case weights in R. The glm function, however, accommodates
JF> case weights via its weights argument, and this might be sufficient to do
JF> what you want to do. You'll have to be careful with inferences, though.
JF> Perhaps someone else on the list can provide additional information.
JF> John

Thank you for your answer, Professor Fox. I did perform an "experiment" (which follows) using the 'weights' argument, but in the lm() function. The help page states that this argument should contain the weights used in the weighted regression fitting process. I don't feel strong in WLS, I must say (to put it diplomatically), so I don't know whether it is possible to use the 'weights' argument to solve my problem.

I generated the data:

    x <- rep(c(1, 2), c(6, 4))
    y <- rep(c(1, 2, 3, 4), c(2, 3, 3, 2))
    # which look like
    cbind(x, y)
    # now I fit a model
    summary(m <- lm(y ~ x))
    # now I create "collapsed" data
    x1 <- rep(c(1, 2), c(3, 2))
    y1 <- rep(c(1, 2, 3, 4), c(1, 1, 2, 1))
    # with frequencies
    w <- c(2, 3, 1, 2, 2)
    # which look like
    cbind(x1, y1, w)
    # and fit a model
    summary(m1 <- lm(y1 ~ x1, weights = w))

I'm getting the same coefficients, but different standard errors. I guess this is what you had in mind. I guess I need a book on WLS... Thank you for the answer anyway.

Michal

P.S. Also, I would like to thank you for your fine lecture about S/R in Ann Arbor this summer -- which I attended with great pleasure.

~,~`~,~`~,~`~,~`~,~`~,~`~,~`~,~
Michal Bojanowski
Institute for Social Studies
University of Warsaw
Poland
http://www.iss.uw.edu.pl
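The experiment in the message above can be checked numerically. The sketch below reuses the same data and adds a comparison of the two fits. It suggests that, when the weights are frequencies, the standard errors differ only because lm() counts rows rather than summed weights when computing the residual degrees of freedom (8 for the full data, 3 for the collapsed data). The helper names (se_full, se_wtd, df_ratio) are introduced here for illustration.

```r
# Full data: 10 observations
x <- rep(c(1, 2), c(6, 4))
y <- rep(c(1, 2, 3, 4), c(2, 3, 3, 2))
m <- lm(y ~ x)

# Collapsed data: 5 distinct (x, y) pairs with frequencies w
x1 <- rep(c(1, 2), c(3, 2))
y1 <- rep(c(1, 2, 3, 4), c(1, 1, 2, 1))
w  <- c(2, 3, 1, 2, 2)
m1 <- lm(y1 ~ x1, weights = w)

# Point estimates agree (names differ, so compare values only)
coef_equal <- isTRUE(all.equal(unname(coef(m)), unname(coef(m1))))

# Standard errors from each fit
se_full <- unname(sqrt(diag(vcov(m))))
se_wtd  <- unname(sqrt(diag(vcov(m1))))

# Ratio of residual degrees of freedom: sqrt(8 / 3)
df_ratio <- sqrt(df.residual(m) / df.residual(m1))

# Dividing the weighted-fit SEs by df_ratio recovers the full-data SEs
se_rescaled <- se_wtd / df_ratio

print(coef_equal)
print(cbind(se_full, se_wtd, se_rescaled))
```

This rescaling is only meaningful for frequency weights that stand for replicated rows. It says nothing about sampling weights, which is the case the thread is really about.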
Michal Bojanowski <bojaniss at poczta.onet.pl> writes:
I'm getting the same coefficients, but different standard errors. I guess this is what you had in mind. I guess I need a book on WLS... Thank you for the answer anyway.
Thomas Lumley once did a brief but very good writeup on the various kinds of weighting. I forget whether it was for one of the open mailing lists or in connection with a discussion in R-core. One thing I remember from it was the need to distinguish between the various reasons for weighting. The one used in lm/glm is based on the idea that some measurements are more precise than others and therefore deserve more weight, so basically the weight is the inverse variance of an observation. However, you might want to weight observations differently even if their variance is the same, e.g. to obtain a method that is stable against differences in population structure, even if the model is slightly wrong. (Some rather subtle issues are involved here and I'm not sure I'm representing them adequately.)
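To make the inverse-variance reading of the lm() weights concrete, here is a minimal sketch on simulated data. It assumes a known error standard deviation per observation (sd_i, a made-up quantity) and shows that a weighted fit gives the same estimates as ordinary least squares after every row has been multiplied by sqrt(w). All variable names are hypothetical.

```r
set.seed(1)                       # simulated data, for illustration only
n    <- 50
x    <- runif(n)
sd_i <- 0.5 + x                   # hypothetical, known error SD per observation
y    <- 1 + 2 * x + rnorm(n, sd = sd_i)
w    <- 1 / sd_i^2                # inverse-variance weights

# Weighted least squares via the weights argument
fit_w <- lm(y ~ x, weights = w)

# Same fit via OLS on rows rescaled by sqrt(w); the intercept column
# becomes sqrt(w), so the automatic intercept is dropped with "0 +"
sw    <- sqrt(w)
fit_t <- lm(I(sw * y) ~ 0 + sw + I(sw * x))

print(cbind(weighted = unname(coef(fit_w)), transformed = unname(coef(fit_t))))
```

The rescaling turns the unequal-variance problem into an equal-variance one, which is why the model-based standard errors from a weighted lm() fit rest on the weights being proportional to inverse variances.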
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
Dear Peter and Michal, I was under the impression that the weights argument in lm specifies inverse-variance weights, but that the weights argument in glm specifies case weights. Inverse-variance weights, which produce a WLS solution, are inappropriate for Michal's problem. I checked and now see that the weights arguments for both lm and glm are inverse-variance weights, so the procedure that I suggested was incorrect. Sorry, John
At 11:48 PM 11/14/2002 +0100, Peter Dalgaard BSA wrote:
Michal Bojanowski <bojaniss at poczta.onet.pl> writes:
I'm getting the same coefficients, but different standard errors. I guess this is what you had in mind. I guess I need a book on WLS... Thank you for the answer anyway.
Thomas Lumley once did a brief but very good writeup on the various kinds of weighting. I forget whether it was for one of the open mailing lists or in connection with a discussion in R-core. One thing I remember from it was the need to distinguish between the various reasons for weighting. The one used in lm/glm is based on the idea that some measurements are more precise than others and therefore deserve more weight, so basically the weight is the inverse variance of an observation. However, you might want to weight observations differently even if their variance is the same, e.g. to obtain a method that is stable against differences in population structure, even if the model is slightly wrong. (Some rather subtle issues are involved here and I'm not sure I'm representing them adequately.)
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
On Thu, 14 Nov 2002, John Fox wrote:
Dear Michal, As far as I know (and I'd be happy to be wrong), there's no *general* way of introducing case weights in R. The glm function, however, accommodates case weights via its weights argument, and this might be sufficient to do what you want to do. You'll have to be careful with inferences, though.
The weights argument to lm and glm will give the right point estimates. The standard errors will potentially be wrong. This can be fixed with `sandwich' standard errors, so one option is to use gee() with each observation being in a `group' on its own. Similarly, the `robust' standard errors in coxph() will allow probability-weighted survival analyses.

The sandwich standard errors used by gee() are not quite the same as the ones used by survey samplers, but they are very similar and they are consistent estimates of the same thing.

The usual linear model standard errors are often pretty good even for probability weighting as long as important covariates aren't strongly associated with the weights.

-thomas
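As a rough illustration of what a sandwich standard error is, the sketch below computes one by hand in base R for a weighted linear model. It is an assumption-laden sketch on simulated data with made-up weights, using the simplest (HC0-style) form without small-sample or finite-population corrections. It is not claimed to reproduce what gee() or survey software reports.

```r
set.seed(42)                          # simulated data, for illustration only
n     <- 200
x     <- rnorm(n)
y     <- 1 + 0.5 * x + rnorm(n)
raw_w <- runif(n, 0.5, 2)             # hypothetical sampling weights
w     <- raw_w * n / sum(raw_w)       # rescaled to sum to n

fit <- lm(y ~ x, weights = w)

X <- model.matrix(fit)
e <- residuals(fit)                   # unweighted residuals y - X %*% coef

bread <- solve(crossprod(X, w * X))   # (X' W X)^{-1}
meat  <- crossprod(X * (w * e))       # sum_i w_i^2 e_i^2 x_i x_i'
V_sandwich <- bread %*% meat %*% bread

se_model    <- sqrt(diag(vcov(fit)))  # treats w as inverse-variance weights
se_sandwich <- sqrt(diag(V_sandwich)) # does not rely on that interpretation

print(cbind(estimate = coef(fit), se_model, se_sandwich))
```

In this simulation the weights are unrelated to x and y, so the two sets of standard errors should come out close, in line with the remark that the usual standard errors are often adequate when covariates are not strongly associated with the weights.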
Hello Thomas,
Friday, November 15, 2002, 4:35:13 PM, you wrote:
TL> The weights argument to lm and glm will give the right point estimates.
TL> The standard errors will potentially be wrong. This can be fixed with
TL> `sandwich' standard errors, so one option is to use gee() with each
TL> observation being in a `group' on its own. Similarly, the `robust'
TL> standard errors in coxph() will allow probability-weighted survival
TL> analyses.
TL> The sandwich standard errors used by gee() are not quite the same as the
TL> ones used by survey samplers, but they are very similar and they are
TL> consistent estimates of the same thing.
TL> The usual linear model standard errors are often pretty good even for
TL> probability weighting as long as important covariates aren't strongly
TL> associated with the weights.
TL> -thomas

Where can I find the gee() function? It's not in the base package nor in any package I have installed. I use R 1.5.1.

Thank you.

~,~`~,~`~,~`~,~`~,~`~,~`~,~`~,~`~,~`~,~`~,~`~,~`~,~`~,~`~,~`~,~
Michal Bojanowski
mailto:mbojanowski at samba.iss.uw.edu.pl
Polish General Social Survey
Institute for Social Studies
University of Warsaw
http://www.iss.uw.edu.pl/
Surprisingly enough it is in the gee package.
On Fri, 15 Nov 2002, bojaniss wrote:
Where can I find the gee() function? It's not in the base package nor in any package I have installed. I use R 1.5.1.
I suggest upgrading.
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
On Fri, 15 Nov 2002, bojaniss wrote:
Where can I find the gee() function? It's not in the base package nor in any package I have installed.
In the gee package.

-thomas
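In a current R installation, one way to check whether that package is available before trying to call gee() is sketched below. The variable name has_gee is made up; requireNamespace() and install.packages() are standard R functions, and the install step is only printed as a hint because it needs network access.

```r
# Check whether the 'gee' package is installed, without attaching it
has_gee <- requireNamespace("gee", quietly = TRUE)

if (has_gee) {
  message("gee is installed; attach it with library(gee)")
} else {
  message("gee is not installed; install.packages(\"gee\") would fetch it from CRAN")
}
```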