Case weighting
6 messages · Hed Bar-Nissan, David Winsemius, Daniel Nordlund +1 more
On Feb 23, 2012, at 10:49 AM, Hed Bar-Nissan wrote:
The need comes from the PISA data (http://www.pisa.oecd.org). In the data there are many cases, and each of them carries a numeric variable that signifies its weight. In SPSS the command would be "WEIGHT BY". In simpler words, here is an R sample (what I get vs. what I want to get):
> data.recieved <- data.frame(
+   kindergarten_attendance = factor(c(2,1,1,1), labels = c("Yes", "No")),
+   weight = c(10, 1, 1, 1)
+ );
> data.recieved;
  kindergarten_attendance weight
1                      No     10
2                     Yes      1
3                     Yes      1
4                     Yes      1
> data.weighted <- data.frame(
+   kindergarten_attendance = factor(c(2,2,2,2,2,2,2,2,2,2,1,1,1),
+                                    labels = c("Yes", "No")) );
You want "case repetition", not case weighting, which I would use as a term when working on estimation problems:

> ( data.weighted <- unlist(sapply(1:NROW(data.recieved),
+     function(x) rep(data.recieved[x,1], times=data.recieved[x,2] )) ) )
 [1] No  No  No  No  No  No  No  No  No  No  Yes Yes Yes
Levels: Yes No
> par(mfrow=c(1,2));
> plot(data.recieved$kindergarten_attendance, main="What i get");
> plot(data.weighted$kindergarten_attendance, main="What i want to get");
Seems to work with the factor vector, although I didn't replicate dataframe rows, but I guess you could.
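Replicating whole data frame rows, as suggested above, can be sketched with row indexing. This is only an illustration using the toy data from the question; the names `idx` and `data.repeated` are made up here, and the approach assumes the weights are whole numbers (`rep()` truncates fractional counts).

```r
# Toy data from the question (identifier spelling kept as posted).
data.recieved <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# Repeat each row index as many times as that row's weight.
idx <- rep(seq_len(nrow(data.recieved)), times = data.recieved$weight)

# Index the data frame with the repeated row numbers; drop = FALSE keeps
# the result a data frame even though only one column is selected.
data.repeated <- data.recieved[idx, "kindergarten_attendance", drop = FALSE]
rownames(data.repeated) <- NULL

table(data.repeated$kindergarten_attendance)
```

The result can be passed to `plot()` in the same way as the hand-built `data.weighted` data frame in the question.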
tnx in advance
Hed
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD West Hartford, CT
On Feb 23, 2012, at 3:27 PM, Hed Bar-Nissan wrote:
It's really weighting - it's just that my simplified example was too simplified. Here is my real weight vector:
sc$W_FSCHWT
[1] 14.8579 61.9528 3.0420 2.9929 5.1239 14.7507 2.7535 2.2693 3.6658 8.6179 2.5926 2.5390 1.7354 2.9767 9.0477 2.6589 3.4040 3.0519 ....
You should always convey the necessary complexity of the problem.
And still it should somehow set the case weight. I could multiply all by 10000 and maybe use your method, but it would create such a bloated data frame. Working with numerics only, I could probably create weighted means. But something as simple as WEIGHT BY would be nice.
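For descriptive summaries, fractional weights can be used directly in base R without inflating the data frame. The sketch below is an assumption-laden illustration: the four weights are the first values of `sc$W_FSCHWT` shown above, but the `kindergarten_attendance` and `score` columns are invented for the example. It yields weighted counts, proportions and a weighted mean only, with no standard errors.

```r
# Hypothetical mini data set: only the W_FSCHWT values come from the post;
# the other two columns are invented for illustration.
sc <- data.frame(
  kindergarten_attendance = factor(c("No", "Yes", "Yes", "Yes"),
                                   levels = c("Yes", "No")),
  score    = c(480, 510, 495, 530),
  W_FSCHWT = c(14.8579, 61.9528, 3.0420, 2.9929)
)

# With a variable on the left-hand side, xtabs() sums it within each
# level, which gives weighted counts.
weighted_counts <- xtabs(W_FSCHWT ~ kindergarten_attendance, data = sc)

# Weighted proportions.
weighted_props <- prop.table(weighted_counts)

# Weighted mean of a numeric variable.
weighted_score <- weighted.mean(sc$score, w = sc$W_FSCHWT)

weighted_counts
weighted_props
weighted_score
```

Calling `barplot(weighted_counts)` would then draw the weighted version of the bar chart from the original question.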
The survey package by Thomas Lumley provides for a wide variety of weighted analyses.
David.

David Winsemius, MD West Hartford, CT
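A minimal sketch of what the survey package route might look like, using the toy data from the first message. The design specification is an assumption: `ids = ~1` declares no clustering and no strata are given, which is almost certainly simpler than the real PISA design, so the standard errors from this sketch should not be taken as correct for PISA.

```r
library(survey)

# Toy data from the first message.
d <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# Assumed design: independent sampling, no clusters, no strata.
des <- svydesign(ids = ~1, weights = ~weight, data = d)

# Weighted frequency table (the analogue of a table under WEIGHT BY).
wtab <- svytable(~kindergarten_attendance, design = des)

# Weighted proportions with design-based standard errors.
wmean <- svymean(~kindergarten_attendance, design = des)

wtab
wmean
```

The same design object can be passed to other functions in the package, so the weights only need to be declared once.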
Are these survey sampling weights? If so, then you need to be using procedures that take the sampling design into account. Otherwise, your variance estimates are going to be all wrong.

Dan

Daniel Nordlund
Bothell, WA USA
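The point about variance estimates can be illustrated with the toy data in base R. The sketch compares two standard errors for the weighted proportion of "No": one that treats the weights as frequency counts (as if there were 13 independent cases), and a Taylor-linearization approximation that assumes the 4 cases were sampled independently with replacement. Both are simplifications chosen for illustration; neither reflects stratification or clustering in a real survey design, and the variable names are made up.

```r
# Toy data: y is 1 for "No", 0 for "Yes"; w are the case weights.
y <- c(1, 0, 0, 0)
w <- c(10, 1, 1, 1)
n <- length(y)

# Weighted proportion of "No" (identical under both approaches).
p <- sum(w * y) / sum(w)

# Frequency-weight standard error: pretends sum(w) independent cases.
se_freq <- sqrt(p * (1 - p) / sum(w))

# Linearization standard error for a ratio estimator, assuming
# independent with-replacement sampling of the n actual cases.
z <- w * (y - p) / sum(w)
se_lin <- sqrt(n / (n - 1) * sum(z^2))

c(p = p, se_freq = se_freq, se_lin = se_lin)
```

The point estimate is the same either way; only the uncertainty differs, and here the frequency-weight version is the smaller of the two because it counts 13 observations where only 4 were actually sampled.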
On Fri, Feb 24, 2012 at 9:40 AM, David Winsemius <dwinsemius at comcast.net> wrote:
The survey package by Thomas Lumley provides for a wide variety of weighted analyses.
Yes. It doesn't do everything that SPSS WEIGHT BY will do, but it does a lot. SPSS is more general partly because it cheats -- it doesn't always compute the right standard errors if the weights are sampling weights. [SPSS now has some proper survey analysis commands, which do get the right standard errors, but are more limited.]

-thomas

Thomas Lumley
Professor of Biostatistics
University of Auckland