Problems with weight

On Tuesday, 27 November 2012 at 18:33 -0300, Pablo Menese wrote:
I can't... I don't know why, but I can't.

When I use this:

logit <- glm(bach ~ egp4 + programa, weights = wst7,
             family = quasibinomial(link = "logit"))
You were advised to use svyglm(), not glm(). It's usually considered
polite to carefully read the answers you get to your questions...

Regards
I get the same betas as in Stata, but the hypothesis tests, the t values,
and the standard errors are different.

I think the solution can't be far from this...

On Fri, Nov 23, 2012 at 9:49 PM, Anthony Damico <ajdamico at gmail.com> wrote:

from your stata output, it looks like you need to use the survey package
in R

for step-by-step instructions about how to do this (and comparisons to
stata), see

http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf

once you're ready to run the regression, use svyglm() instead of glm() and
drop the weights argument (since it will already be part of the survey
design)   :)
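A minimal sketch of that advice, assuming the survey package is installed. The data frame `d` and its values are simulated and hypothetical (only the names bach, mujer and wst7 are borrowed from the thread). `ids = ~1` makes each observation its own PSU, which matches the single-stratum design in the Stata header. The weights are declared once, in svydesign(), and are not passed again to svyglm().

```r
library(survey)

set.seed(1)
n <- 248
# Hypothetical simulated data; only the variable names come from the thread
d <- data.frame(mujer = rbinom(n, 1, 0.5))
d$wst7 <- runif(n, 5, 40)                       # made-up sampling weights
d$bach <- rbinom(n, 1, plogis(-0.5 + d$mujer))

# The weights belong to the survey design, together with the data
des <- svydesign(ids = ~1, weights = ~wst7, data = d)

# svyglm() takes the weights from the design: no weights= argument here
fit_svy <- svyglm(bach ~ mujer, design = des, family = quasibinomial())

# The weighted glm() fit from the original post, for comparison
fit_glm <- glm(bach ~ mujer, data = d, weights = wst7,
               family = quasibinomial(link = "logit"))

# Point estimates should agree; the standard errors generally will not
cbind(coef_svy = coef(fit_svy), coef_glm = coef(fit_glm),
      se_svy = sqrt(diag(vcov(fit_svy))),
      se_glm = sqrt(diag(vcov(fit_glm))))
```

The weights determine the point estimates in both fits; what changes between glm() and svyglm() is only how the variance, and therefore the t tests, are computed.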

On Fri, Nov 23, 2012 at 3:13 PM, Pablo Menese <pmenese at gmail.com> wrote:

Until a few weeks ago I used Stata for everything.
Now I'm learning R and trying to switch. At this stage I'm testing R by
trying to do the same things I used to do in Stata, with the same
outputs.
I have a problem with the logit when applying weights.

In Stata I get this output:
. svy: logit bach job2 mujer i.egp4 programa delay mdeo i.str evprivate
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =         1                  Number of obs      =       248
Number of PSUs     =       248                  Population size    = 5290.1639
                                                Design df          =       247
                                                F(  11,    237)    =      4.39
                                                Prob > F           =    0.0000

------------------------------------------------------------------------------
             |             Linearized
        bach |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        job2 |  -.4437446   .4385934    -1.01   0.313    -1.307605    .4201154
       mujer |   1.070595   .4169919     2.57   0.011     .2492812    1.891908
             |
        egp4 |
          2  |  -.4839342    .539808    -0.90   0.371    -1.547148    .5792796
          3  |  -1.288947   .5347344    -2.41   0.017    -2.342168   -.2357263
          4  |  -.8569793   .5106425    -1.68   0.095    -1.862748    .1487898
             |
    programa |   .9694352   .5677642     1.71   0.089    -.1488415    2.087712
       delay |  -1.552582   .5714967    -2.72   0.007    -2.678211    -.426954
        mdeo |  -.7938904   .3727571    -2.13   0.034    -1.528078   -.0597025
             |
         str |
          2  |  -1.122691   .5731879    -1.96   0.051     -2.25165    .0062682
          3  |  -2.056682   .6350485    -3.24   0.001    -3.307483   -.8058812
             |
   evprivate |  -1.962431   .5674143    -3.46   0.001    -3.080018   -.8448431
       _cons |   2.308699   .7274924     3.17   0.002     .8758187    3.741578
------------------------------------------------------------------------------

The best I got in R was:

glm(formula = bach ~ job2 + mujer + egp4 + programa + delay +
    mdeo + str + evprivate, family = quasibinomial(link = "logit"),
    weights = wst7)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-12.5951   -3.9034   -0.9412    3.8268   11.2750

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)                  2.3087     0.7173   3.218  0.00147 **
job2                        -0.4437     0.4355  -1.019  0.30926
mujer                        1.0706     0.3558   3.009  0.00290 **
egp4intermediate (iii, iv)  -0.4839     0.4946  -0.978  0.32890
egp4skilled manual workers  -1.2889     0.5268  -2.447  0.01514 *
egp4working class           -0.8570     0.4625  -1.853  0.06514 .
programa                     0.9694     0.4951   1.958  0.05141 .
delay                       -1.5526     0.4878  -3.183  0.00166 **
mdeo                        -0.7939     0.4207  -1.887  0.06037 .
strest. ii                  -1.1227     0.4809  -2.334  0.02042 *
strestr. iii                -2.0567     0.5134  -4.006 8.28e-05 ***
evprivate                   -1.9624     0.6490  -3.024  0.00277 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be 23.14436)

    Null deviance: 7318.5  on 246  degrees of freedom
Residual deviance: 5692.8  on 235  degrees of freedom
  (103 observations deleted due to missingness)
AIC: NA

Number of Fisher Scoring iterations: 6

Warning message:
In summary.glm(logit) :
  observations with zero weight not used for calculating dispersion

This gives the same betas, but the hypothesis tests give different values...

HELP!!!!
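Why the betas match but the tests do not: for the quasibinomial family, summary.glm() reports a model-based variance, the estimated (Pearson) dispersion times the inverse of the weighted information matrix, which treats the weights as if they described the precision of each observation. Stata's svy: logit, like svyglm(), reports a linearized (sandwich) variance that treats them as sampling weights. The base-R sketch below computes both from one weighted fit. The data are simulated and hypothetical, and the linearized formula assumes the design in the Stata header (one stratum, each observation its own PSU, no finite population correction); it illustrates the formulas and is not a substitute for the survey package.

```r
set.seed(2)
n <- 248
# Hypothetical simulated data; only the variable names come from the thread
d <- data.frame(mujer = rbinom(n, 1, 0.5), delay = rbinom(n, 1, 0.3))
d$wst7 <- runif(n, 5, 40)                       # made-up sampling weights
d$bach <- rbinom(n, 1, plogis(0.5 + d$mujer - 1.5 * d$delay))

# Weighted logit; a tight convergence tolerance so the hand calculations
# below line up closely with glm's own internals
fit <- glm(bach ~ mujer + delay, data = d, weights = wst7,
           family = quasibinomial(link = "logit"),
           control = glm.control(epsilon = 1e-12, maxit = 50))

X  <- model.matrix(fit)
y  <- d$bach
w  <- d$wst7
mu <- fitted(fit)
p  <- ncol(X)

# "Bread": X' diag(w * mu * (1 - mu)) X, the weighted information matrix
A    <- crossprod(X, X * (w * mu * (1 - mu)))
Ainv <- solve(A)

# 1. Model-based variance, as reported by summary.glm() for quasibinomial:
#    Pearson dispersion times A^{-1}
phi      <- sum(w * (y - mu)^2 / (mu * (1 - mu))) / (n - p)
se_model <- sqrt(diag(phi * Ainv))

# 2. Linearized (sandwich) variance, one stratum, one observation per PSU,
#    no fpc: A^{-1} [n/(n-1) * sum_i u_i u_i'] A^{-1},
#    where u_i = w_i * (y_i - mu_i) * x_i is the score contribution
U         <- X * (w * (y - mu))
meat      <- n / (n - 1) * crossprod(U)
se_linear <- sqrt(diag(Ainv %*% meat %*% Ainv))

cbind(coef = coef(fit), se_glm = se_model, se_linearized = se_linear)
```

As far as I understand the survey package, this sandwich form is what svyglm() computes for such a design, so on the poster's data the third column should come close to Stata's "Linearized Std. Err." column when the estimation samples are the same.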


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

On Wednesday, 28 November 2012 at 14:20 -0300, Pablo Menese wrote:
Dear Milan... are you serious? 
Did you read this?
No, I had not read this message when I wrote my reply, because you sent
two completely different messages in two different threads at about the
same time. As you can see, I was replying to the other message, which
only mentioned glm().
I have this problem.

test <- svydesign(id=~1,weights=~peso)

logit <- svyglm(bach ~ job2 + mujer + egp4 + programa + delay + mdeo +
str + evprivate, family=binomial,design=test)

Then this appears:

Error in svyglm.survey.design(bach ~ job2 + mujer + egp4 + programa +  :
  all variables must be in design= argument

I don't know what this means...
Please help.
Have you read ?svydesign? It has a "variables" argument that you can use
to specify the variables you need to include in the design object. The
documentation says:
variables: Formula or data frame specifying the variables measured in
          the survey. If 'NULL', the 'data' argument is used.

So if you want to include all variables from your original data set,
pass it as the "data" argument, and that's all.

But first, stop using attach(): it creates confusion, and it is probably the
reason why you did not think of passing your data.frame object to
svydesign().
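A minimal sketch of that fix, assuming the survey package is installed. `mydata` is a hypothetical name for the poster's data frame (simulated here), `peso` is the weight variable from the failing call, and only two of the predictors are kept for brevity. Passing the data frame to svydesign() stores the variables in the design object (`test$variables`), which is where svyglm() looks for them.

```r
library(survey)

set.seed(3)
n <- 248
# Hypothetical stand-in for the poster's data frame
mydata <- data.frame(
  mujer = rbinom(n, 1, 0.5),
  delay = rbinom(n, 1, 0.3),
  peso  = runif(n, 5, 40)                       # made-up sampling weights
)
mydata$bach <- rbinom(n, 1, plogis(0.5 + mydata$mujer - 1.5 * mydata$delay))

# No attach(): hand the data frame itself to svydesign()
test <- svydesign(ids = ~1, weights = ~peso, data = mydata)

# Every variable used in the model formula is now part of the design object
all(c("bach", "mujer", "delay") %in% names(test$variables))

# quasibinomial() instead of binomial avoids warnings about non-integer
# numbers of successes when sampling weights are used
logit <- svyglm(bach ~ mujer + delay, design = test, family = quasibinomial())
summary(logit)
```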

Regards
Quotes from a week ago...
I could not get anything to work with svyglm... I wish I could, but... I
don't know why...

On Tue, Nov 27, 2012 at 6:54 PM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
        You were advised to use svyglm(), not glm(). It's usually considered
        polite to carefully read the answers you get to your questions...

        Regards