Odds Ratio and Logistic Regression
At 18:14 30/12/2012, Lorenzo Isella wrote:
Dear All,
I am learning the ropes about logistic regression in R. I found some interesting examples
http://bit.ly/Vq4GgX
http://bit.ly/W9fUTg
http://bit.ly/UfK73e
but I am a bit lost. I have several questions.
1) For instance, what is the difference between
glm.out = glm(response ~ poverty + gender, family=binomial(logit), data=mydata)
and
glm.out = glm(response ~ poverty * gender, family=binomial(logit), data=mydata)
?
Which begs the question when I should use the "*" or "+" sign when doing a logistic regression on several explanatory variables. I think that in the former case I am allowing for an interaction between poverty and gender, but I would like to be sure about it.
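I think you need to (re)-read any introductory text on R, in particular about the use of formulae. The asterisk implies an interaction: poverty * gender fits both main effects plus their interaction, whereas poverty + gender fits the main effects only. So it is the latter model, not the former, that allows for an interaction. This also answers your question about when to use "*" or "+", I think.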
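As a small illustration of the formula point (a sketch only, not part of the original exchange), the terms that each formula expands to can be inspected without any data. The variable names are the ones used in the question; the object names `additive` and `interaction` are made up for this example.

```r
# Expand each formula into its term labels; no data set is needed for this.
additive    <- attr(terms(response ~ poverty + gender), "term.labels")
interaction <- attr(terms(response ~ poverty * gender), "term.labels")

# "+" keeps only the two main effects.
print(additive)

# "*" adds the poverty:gender interaction on top of the main effects.
print(interaction)
```

The second vector contains the extra term poverty:gender, which is what lets the effect of poverty differ between males and females.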
2) Consider the following snippet
glm.out = glm(response ~ poverty + gender, family=binomial(logit),
data=mydata)
where "response" is a dichotomous variable, poverty assumes only two
values (Above poverty line and Below poverty line) and gender assumes only
the Male or Female values.
The command above leads to the following output
#######################################
print(summary(glm.out))
Call:
glm(formula = response ~ poverty + gender, family = binomial(logit),
data = mydata)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2094   0.4269   0.4269   0.8033   1.1911
Coefficients:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                 0.9656     0.1477   6.538 6.25e-11 ***
povertyBelow poverty line  -0.9978     0.3246  -3.074  0.00211 **
genderFEMALE                1.3840     0.2549   5.429 5.68e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 494.81 on 499 degrees of freedom
Residual deviance: 457.13 on 497 degrees of freedom
AIC: 463.13
Number of Fisher Scoring iterations: 4
##############################################
To calculate the odds ratios, I should do the following
exp(coef(glm.out))
              (Intercept) povertyBelow poverty line              genderFEMALE
                2.6263831                 0.3687033                 3.9909627
but here I am lost about the interpretation. For instance, what are the
odds of a positive response for those above versus below the poverty line
in males? In females?
I think that everything is there, but I cannot extract/interpret the info
provided by R correctly.
Any help is appreciated.
Cheers
Lorenzo
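One possible way to read the output above, offered as a sketch rather than a definitive answer: the coefficient names suggest that "Above poverty line" and male are the reference levels, so the intercept is the log-odds for males above the poverty line, and the other groups are obtained by adding coefficients. The values below are copied by hand from the summary output; the names `b`, `odds`, `or_male` and `or_female` are made up for this example.

```r
# Coefficients copied from the summary output above (log-odds scale).
# Assumed reference levels: "Above poverty line" and male.
b <- c(intercept = 0.9656, below = -0.9978, female = 1.3840)

# Odds of a positive response in each of the four groups.
odds <- c(
  male_above   = exp(b[["intercept"]]),
  male_below   = exp(b[["intercept"]] + b[["below"]]),
  female_above = exp(b[["intercept"]] + b[["female"]]),
  female_below = exp(b[["intercept"]] + b[["below"]] + b[["female"]])
)
print(round(odds, 3))

# Odds ratio, above versus below the poverty line, within each gender.
or_male   <- odds[["male_above"]] / odds[["male_below"]]
or_female <- odds[["female_above"]] / odds[["female_below"]]
print(c(male = or_male, female = or_female))
```

Because the model has no interaction term, the above-versus-below odds ratio is the same in males and females: it is exp(0.9978), roughly 2.71, the reciprocal of the 0.3687 reported by exp(coef(glm.out)). Gender-specific odds ratios would only differ in the model fitted with poverty * gender.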
Michael Dewey info at aghmed.fsnet.co.uk http://www.aghmed.fsnet.co.uk/home.html