Case weighting

The need comes from the PISA data (http://www.pisa.oecd.org).

The data contain many cases, and each of them carries a numeric variable that gives its weight.
In SPSS the command would be "WEIGHT BY".

Put more simply, here is an R example (what I get vs. what I want to get):

data.recieved <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)
data.recieved
 kindergarten_attendance weight
1                      No     10
2                     Yes      1
3                     Yes      1
4                     Yes      1

data.weighted <- data.frame(
  kindergarten_attendance = factor(c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1),
                                   labels = c("Yes", "No"))
)
You want "case repetition", not case weighting; I would use the latter term
when working on estimation problems:

( data.weighted <- unlist(sapply(1:NROW(data.recieved),
    function(x) rep(data.recieved[x, 1], times = data.recieved[x, 2]))) )
  [1] No  No  No  No  No  No  No  No  No  No  Yes Yes Yes
Levels: Yes No
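Since rep() is vectorized over its times argument, the sapply()/unlist() step above can probably be collapsed into a single call. A minimal sketch, assuming the same toy data.recieved data frame with integer weights:

```r
# Rebuild the toy example (integer frequency weights).
data.recieved <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# rep() repeats element i of the factor weight[i] times;
# the factor class and its levels are kept.
data.weighted <- rep(data.recieved$kindergarten_attendance,
                     times = data.recieved$weight)
data.weighted
```

It also sidesteps sapply()'s simplification, which can turn the result into a matrix when every weight happens to be the same.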

par(mfrow = c(1, 2))
plot(data.recieved$kindergarten_attendance, main = "What I get")
plot(data.weighted$kindergarten_attendance, main = "What I want to get")
That seems to work with the factor vector. I didn't replicate data frame
rows, but I expect you could.
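Replicating whole data frame rows can be sketched the same way: build a row index with rep() and subset. This assumes integer weights; non-integer weights would first have to be scaled and rounded.

```r
data.recieved <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# Row i appears weight[i] times in the index vector.
idx <- rep(seq_len(nrow(data.recieved)), times = data.recieved$weight)
data.weighted <- data.recieved[idx, , drop = FALSE]
rownames(data.weighted) <- NULL  # drop the "1.1", "1.2", ... row names

table(data.weighted$kindergarten_attendance)
```

With data.weighted as a data frame, the plot(data.weighted$kindergarten_attendance, ...) call from the question works as written.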

tnx in advance
Hed

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT

It really is weighting; my example was just oversimplified.
Here is my real weight vector:
sc$W_FSCHWT
  [1]  14.8579  61.9528   3.0420   2.9929   5.1239  14.7507    
2.7535   2.2693   3.6658   8.6179   2.5926   2.5390   1.7354    
2.9767   9.0477   2.6589   3.4040   3.0519
....
You should always convey the necessary complexity of the problem.

Still, there should be some way to set the case weight.
I could multiply all the weights by 10000 and perhaps use your method, but that
would create a hugely bloated data frame.
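To put a number on the bloat, a quick sketch using only the 18 weights actually printed above (the rest of W_FSCHWT is elided, so the full data set would be larger still):

```r
# The 18 weights shown in the message; the remainder of the vector is elided.
w <- c(14.8579, 61.9528, 3.0420, 2.9929, 5.1239, 14.7507, 2.7535,
       2.2693, 3.6658, 8.6179, 2.5926, 2.5390, 1.7354, 2.9767,
       9.0477, 2.6589, 3.4040, 3.0519)

# Scale by 10000 and round to get an integer repeat count per case.
copies <- round(w * 10000)

length(w)    # original cases
sum(copies)  # rows after replication: 1480329, roughly 82,000 per case
```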

Working with numeric variables only, I could probably compute weighted means.
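For numeric variables, base R's weighted.mean() already does this, and a weighted proportion is just the weighted mean of a 0/1 indicator. A sketch on the toy data, where attended is a hypothetical indicator (1 = "Yes"):

```r
# Hypothetical 0/1 indicator for the four example rows (1 = attended).
attended <- c(0, 1, 1, 1)
w <- c(10, 1, 1, 1)

# weighted.mean() computes sum(x * w) / sum(w).
p <- weighted.mean(attended, w)
p  # 3/13, about 0.231
```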

But something as simple as SPSS's WEIGHT BY would be nice.
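For categorical data, one base R approximation of WEIGHT BY is to put the weight on the left-hand side of an xtabs() formula: it sums the weights within each level, so the weights need not be integers and nothing is replicated. A sketch on the toy data (this yields weighted counts only, not standard errors):

```r
data.recieved <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# Sum of `weight` within each level of kindergarten_attendance.
tab <- xtabs(weight ~ kindergarten_attendance, data = data.recieved)
tab
prop.table(tab)  # weighted proportions

barplot(tab, main = "Weighted counts")
```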
The survey package by Thomas Lumley provides for a wide variety of  
weighted analyses.
David.
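A minimal sketch of the survey-package route on the toy data, assuming the CRAN package survey is installed. It treats weight as a sampling weight in a design with no clustering or stratification (ids = ~1), which is a simplification; the real PISA sampling design would likely need more than this.

```r
# Assumes the CRAN package 'survey' is installed: install.packages("survey")
library(survey)

data.recieved <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# ids = ~1: no cluster sampling; weights = ~weight: the per-case weight column.
des <- svydesign(ids = ~1, weights = ~weight, data = data.recieved)

# Weighted counts per level.
tb <- svytable(~kindergarten_attendance, design = des)
tb

# Weighted proportions with design-based standard errors.
m <- svymean(~kindergarten_attendance, design = des)
m
```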

David Winsemius, MD
West Hartford, CT
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Hed Bar-Nissan
Sent: Thursday, February 23, 2012 12:27 PM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] Case weighting

It really is weighting; my example was just oversimplified.
Here is my real weight vector:
sc$W_FSCHWT
  [1]  14.8579  61.9528   3.0420   2.9929   5.1239  14.7507   2.7535
2.2693   3.6658   8.6179   2.5926   2.5390   1.7354   2.9767   9.0477
2.6589   3.4040   3.0519
....

Are these survey sampling weights?  If so, then you need to be using procedures that take the sampling design into account.  Otherwise, your variance estimates are going to be all wrong.
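The variance point can be illustrated on the toy data. Replicating rows makes R treat 13 copies as independent observations, while a design-based estimate from the survey package (assumed installed) uses the 4 actual cases with their weights. The variable name yes is hypothetical. Both give the same point estimate, 3/13, but the design-based standard error comes out noticeably larger.

```r
library(survey)  # assumed installed

data.recieved <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)
yes <- as.numeric(data.recieved$kindergarten_attendance == "Yes")

# Naive: replicate rows and treat the 13 copies as independent draws.
y.rep <- rep(yes, times = data.recieved$weight)
se.naive <- sd(y.rep) / sqrt(length(y.rep))

# Design-based: 4 sampled cases carrying sampling weights.
data.recieved$yes <- yes
des <- svydesign(ids = ~1, weights = ~weight, data = data.recieved)
est <- svymean(~yes, design = des)
se.design <- unname(SE(est))

c(naive = se.naive, design = se.design)
```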

Dan

Daniel Nordlund
Bothell, WA USA
On Fri, Feb 24, 2012 at 9:40 AM, David Winsemius <dwinsemius at comcast.net> wrote:

The survey package by Thomas Lumley provides for a wide variety of weighted
analyses.
Yes.  It doesn't do everything that SPSS WEIGHT BY will do, but it
does a lot.  SPSS is more general partly because it cheats: it
doesn't always compute the right standard errors if the weights are
sampling weights.  [SPSS now has some proper survey analysis commands,
which do get the right standard errors, but they are more limited.]

  - thomas
Thomas Lumley
Professor of Biostatistics
University of Auckland