An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20121123/76de3041/attachment.pl>
Problems with weight
6 messages · Anthony Damico, Pablo Menese, Milan Bouchet-Valat
3 days later
On Tuesday, 27 November 2012 at 18:33 -0300, Pablo Menese wrote:
I can't... I don't know why, but I can't. When I use it:
logit <- glm(bach ~ egp4 + programa, weights = wst7, family = quasibinomial(link = "logit"))
You were advised to use svyglm(), not glm(). It's usually considered polite to read carefully the answers you get to your questions...
Regards
I reach the same betas as in Stata, but the hypothesis tests, the t values, and the standard errors are different. I think the solution can't be far from this...
On Fri, Nov 23, 2012 at 9:49 PM, Anthony Damico <ajdamico at gmail.com> wrote:
from your Stata output, it looks like you need to use the survey package in R.
For step-by-step instructions about how to do this (and comparisons to Stata), see http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf
Once you're ready to run the regression, use svyglm() instead of glm() and drop the weights argument (since it will already be part of the survey design) :)
On Fri, Nov 23, 2012 at 3:13 PM, Pablo Menese <pmenese at gmail.com> wrote:
Until a week ago I used Stata for everything.
Now I'm learning R and trying to move. At this stage I'm testing R,
trying to do the same things that I used to do in Stata, with the same
outputs.
I have a problem with the logit when applying weights.
In Stata I have this output:
. svy: logit bach job2 mujer i.egp4 programa delay mdeo i.str evprivate
(running logit on estimation sample)

Survey: Logistic regression

Number of strata =   1          Number of obs   =       248
Number of PSUs   = 248          Population size = 5290.1639
                                Design df       =       247
                                F( 11, 237)     =      4.39
                                Prob > F        =    0.0000

                      Linearized
bach            Coef.  Std. Err.       t   P>|t|  [95% Conf.   Interval]
job2        -.4437446   .4385934   -1.01   0.313   -1.307605    .4201154
mujer        1.070595   .4169919    2.57   0.011    .2492812    1.891908
egp4
  2         -.4839342    .539808   -0.90   0.371   -1.547148    .5792796
  3         -1.288947   .5347344   -2.41   0.017   -2.342168   -.2357263
  4         -.8569793   .5106425   -1.68   0.095   -1.862748    .1487898
programa     .9694352   .5677642    1.71   0.089   -.1488415    2.087712
delay       -1.552582   .5714967   -2.72   0.007   -2.678211    -.426954
mdeo        -.7938904   .3727571   -2.13   0.034   -1.528078   -.0597025
str
  2         -1.122691   .5731879   -1.96   0.051    -2.25165    .0062682
  3         -2.056682   .6350485   -3.24   0.001   -3.307483   -.8058812
evprivate   -1.962431   .5674143   -3.46   0.001   -3.080018   -.8448431
_cons        2.308699   .7274924    3.17   0.002    .8758187    3.741578

The best that I got in R was:
glm(formula = bach ~ job2 + mujer + egp4 + programa + delay +
    mdeo + str + evprivate, family = quasibinomial(link = "logit"),
    weights = wst7)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-12.5951   -3.9034   -0.9412    3.8268   11.2750

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                   2.3087     0.7173   3.218  0.00147 **
job2                         -0.4437     0.4355  -1.019  0.30926
mujer                         1.0706     0.3558   3.009  0.00290 **
egp4intermediate (iii, iv)   -0.4839     0.4946  -0.978  0.32890
egp4skilled manual workers   -1.2889     0.5268  -2.447  0.01514 *
egp4working class            -0.8570     0.4625  -1.853  0.06514 .
programa                      0.9694     0.4951   1.958  0.05141 .
delay                        -1.5526     0.4878  -3.183  0.00166 **
mdeo                         -0.7939     0.4207  -1.887  0.06037 .
strest. ii                   -1.1227     0.4809  -2.334  0.02042 *
strestr. iii                 -2.0567     0.5134  -4.006 8.28e-05 ***
evprivate                    -1.9624     0.6490  -3.024  0.00277 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be 23.14436)

    Null deviance: 7318.5  on 246  degrees of freedom
Residual deviance: 5692.8  on 235  degrees of freedom
  (103 observations deleted due to missingness)
AIC: NA

Number of Fisher Scoring iterations: 6

Warning message:
In summary.glm(logit) :
  observations with zero weight not used for calculating dispersion

This has the same betas, but the hypothesis tests have different values...
HELP!!!!
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
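The pattern in the question (same betas, different standard errors) is what one would expect when the weights are used identically for the point estimates but the variance is estimated differently. glm() with a quasibinomial family reports model-based standard errors scaled by the estimated dispersion, while the Stata output above is labelled "Linearized", with one stratum and each observation as its own PSU. The base-R sketch below computes a linearized (sandwich) variance by hand for a weighted logit under that same simple design. It is only an illustration: the data are simulated, the names `df`, `x`, `w`, `y` and `se_lin` are hypothetical, and it is not meant to reproduce either program's internals.

```r
# Simulated stand-in data; every name and value here is hypothetical.
set.seed(1)
n  <- 248
df <- data.frame(x = rnorm(n), w = runif(n, 5, 40))  # w plays the role of wst7
df$y <- rbinom(n, 1, plogis(0.5 - df$x))

# Weighted logit as in the thread: the weights drive the point estimates,
# the reported standard errors are model-based and scaled by the dispersion.
fit <- glm(y ~ x, family = quasibinomial(link = "logit"),
           weights = w, data = df)

# Linearized (sandwich) variance, one stratum, each row its own PSU.
X     <- model.matrix(fit)
mu    <- fitted(fit)
score <- X * (df$w * (df$y - mu))                 # weighted score per row
bread <- solve(crossprod(X, X * (df$w * mu * (1 - mu))))
meat  <- crossprod(score) * n / (n - 1)           # between-PSU variation
se_lin <- sqrt(diag(bread %*% meat %*% bread))

# Same estimates, two different standard errors side by side.
cbind(estimate  = coef(fit),
      se_model  = sqrt(diag(vcov(fit))),
      se_linear = se_lin)
```

The two standard-error columns generally differ even though the estimates are shared, which is the situation described in the question.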
On Wednesday, 28 November 2012 at 14:20 -0300, Pablo Menese wrote:
Dear Milan... are you serious? Did you read this?
No, I had not read this message when I wrote the mail, because you sent two completely different messages in two different threads at about the same time. As you can see, I was replying to the other message, which only mentioned glm().
I have this problem.
test <- svydesign(id=~1,weights=~peso)
logit <- svyglm(bach ~ job2 + mujer + egp4 + programa + delay + mdeo + str + evprivate, family=binomial,design=test)
Then this appears:
Error in svyglm.survey.design(bach ~ job2 + mujer + egp4 + programa + :
  all variables must be in design= argument
I don't know what this means... Please help.
Have you read ?svydesign? It has a "variables" argument that you can use
to specify the variables you need to include in the design object. The
documentation says:
variables: Formula or data frame specifying the variables measured in
the survey. If 'NULL', the 'data' argument is used.
So if you want to include all variables from your original data set,
pass it as the "data" argument, and that's all.
But first, stop using attach(), it creates confusion and is probably the
reason why you did not think of passing your data.frame object to
svydesign().
Regards
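A minimal sketch of what this advice amounts to, assuming the survey package is installed. The poster's data set is not available, so the data frame `df` and the predictor `x` are simulated, hypothetical stand-ins; only `peso`, `bach`, svydesign() and svyglm() come from the thread. Passing the data frame through `data=` puts every variable into the design object, which should avoid the "all variables must be in design= argument" error without any use of attach().

```r
# Assumes the survey package is installed; the data are simulated stand-ins.
library(survey)

set.seed(1)
n  <- 248
df <- data.frame(x = rnorm(n), peso = runif(n, 5, 40))
df$bach <- rbinom(n, 1, plogis(0.5 - df$x))

# data = df makes all columns of df available to the design object.
des <- svydesign(ids = ~1, weights = ~peso, data = df)

# No weights argument: the weights are taken from the design.
# quasibinomial() is used here to avoid the warning about non-integer
# numbers of successes that binomial() produces with sampling weights.
fit_svy <- svyglm(bach ~ x, design = des, family = quasibinomial())

# The plain weighted glm() from earlier in the thread, for comparison.
fit_glm <- glm(bach ~ x, family = quasibinomial(), weights = peso, data = df)

summary(fit_svy)
```

The coefficients of `fit_svy` and `fit_glm` should agree; what changes is the standard errors, which svyglm() derives from the survey design rather than from the model's dispersion.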
Quotes from a week ago...
I could not get anything to work using svyglm... I wish I could... but... I don't
know why...
On Tue, Nov 27, 2012 at 6:54 PM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
On Tuesday, 27 November 2012 at 18:33 -0300, Pablo Menese wrote:
> I can't ... I don't know why but I can't
>
> When I use it:
>
> logit <- glm(bach ~ egp4 + programa, weights = wst7,
>              family = quasibinomial(link = "logit"))
You were advised to use svyglm(), not glm(). It's usually considered
polite to read carefully the answers you get to your questions...
Regards