I get the same betas as in Stata, but the hypothesis tests, the t values,
and the standard errors are different.
I think the solution can't be far from this...
On Fri, Nov 23, 2012 at 9:49 PM, Anthony Damico <ajdamico at gmail.com> wrote:
from your stata output, it looks like you need to use the survey package
in R.
for step-by-step instructions about how to do this (and comparisons to
stata), see
http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf
once you're ready to run the regression, use svyglm() instead of glm() and
drop the weights argument (since it will already be part of the survey
design) :)
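
A minimal sketch of that switch, on simulated data rather than the poster's file. The variable names bach, mujer, delay and wst7 are borrowed from the thread, but the values are made up, and the design (no strata, one PSU per observation, matching "Number of strata = 1" and "Number of PSUs = 248" in the Stata header) is an assumption. svydesign(), svyglm() and degf() come from the survey package.

```r
library(survey)

# Simulated stand-in data: names mirror the thread, values are invented.
set.seed(1)
n <- 248
dat <- data.frame(
  bach  = rbinom(n, 1, 0.5),
  mujer = rbinom(n, 1, 0.5),
  delay = rbinom(n, 1, 0.3),
  wst7  = runif(n, 5, 40)   # hypothetical sampling weights
)

# Assumed design: no clustering (ids = ~1), no strata, weights in wst7.
des <- svydesign(ids = ~1, weights = ~wst7, data = dat)

# Survey-weighted logit: the weights come from the design, not a weights argument.
fit_svy <- svyglm(bach ~ mujer + delay, design = des,
                  family = quasibinomial(link = "logit"))

# The weighted glm() from the original post, for comparison.
fit_glm <- glm(bach ~ mujer + delay, data = dat,
               family = quasibinomial(link = "logit"), weights = wst7)

# Same point estimates, different standard errors.
cbind(glm = coef(fit_glm), svyglm = coef(fit_svy))
cbind(glm    = summary(fit_glm)$coefficients[, "Std. Error"],
      svyglm = summary(fit_svy)$coefficients[, "Std. Error"])

degf(des)   # design degrees of freedom (PSUs minus strata)
```

The coefficients agree because both fits solve the same weighted score equations. The standard errors differ because glm() derives them from the model and its dispersion estimate, while svyglm() uses a design-based (linearization) variance, which is the kind of standard error Stata labels "Linearized".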
On Fri, Nov 23, 2012 at 3:13 PM, Pablo Menese <pmenese at gmail.com> wrote:
Until a few weeks ago I used Stata for everything.
Now I'm learning R and trying to move over. At this stage I'm testing R
by trying to do the same things I used to do in Stata and get the same
output.
I have a problem with a logit when applying weights.
In Stata I get this output:
. svy: logit bach job2 mujer i.egp4 programa delay mdeo i.str evprivate
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =         1        Number of obs      =        248
Number of PSUs     =       248        Population size    =  5290.1639
                                      Design df          =        247
                                      F( 11, 237)        =       4.39
                                      Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
        bach |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        job2 |  -.4437446   .4385934    -1.01   0.313    -1.307605    .4201154
       mujer |   1.070595   .4169919     2.57   0.011     .2492812    1.891908
             |
        egp4 |
          2  |  -.4839342    .539808    -0.90   0.371    -1.547148    .5792796
          3  |  -1.288947   .5347344    -2.41   0.017    -2.342168   -.2357263
          4  |  -.8569793   .5106425    -1.68   0.095    -1.862748    .1487898
             |
    programa |   .9694352   .5677642     1.71   0.089    -.1488415    2.087712
       delay |  -1.552582   .5714967    -2.72   0.007    -2.678211    -.426954
        mdeo |  -.7938904   .3727571    -2.13   0.034    -1.528078   -.0597025
             |
         str |
          2  |  -1.122691   .5731879    -1.96   0.051     -2.25165    .0062682
          3  |  -2.056682   .6350485    -3.24   0.001    -3.307483   -.8058812
             |
   evprivate |  -1.962431   .5674143    -3.46   0.001    -3.080018   -.8448431
       _cons |   2.308699   .7274924     3.17   0.002     .8758187    3.741578
------------------------------------------------------------------------------
The best I could get in R was:
glm(formula = bach ~ job2 + mujer + egp4 + programa + delay +
    mdeo + str + evprivate, family = quasibinomial(link = "logit"),
    weights = wst7)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-12.5951   -3.9034   -0.9412    3.8268   11.2750

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)                  2.3087     0.7173   3.218  0.00147 **
job2                        -0.4437     0.4355  -1.019  0.30926
mujer                        1.0706     0.3558   3.009  0.00290 **
egp4intermediate (iii, iv)  -0.4839     0.4946  -0.978  0.32890
egp4skilled manual workers  -1.2889     0.5268  -2.447  0.01514 *
egp4working class           -0.8570     0.4625  -1.853  0.06514 .
programa                     0.9694     0.4951   1.958  0.05141 .
delay                       -1.5526     0.4878  -3.183  0.00166 **
mdeo                        -0.7939     0.4207  -1.887  0.06037 .
strest. ii                  -1.1227     0.4809  -2.334  0.02042 *
strestr. iii                -2.0567     0.5134  -4.006 8.28e-05 ***
evprivate                   -1.9624     0.6490  -3.024  0.00277 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be 23.14436)

    Null deviance: 7318.5  on 246  degrees of freedom
Residual deviance: 5692.8  on 235  degrees of freedom
  (103 observations deleted due to missingness)
AIC: NA

Number of Fisher Scoring iterations: 6

Warning message:
In summary.glm(logit) :
  observations with zero weight not used for calculating dispersion
This gives the same betas, but the hypothesis tests have different values...
HELP!!!!