How to intepret a factor response model?
On 04-May-05 Maciej Blizi??ski wrote:
Hello, I'd like to create a model with a factor-type response variable. This is an example:
mydata <- data.frame(factor_var = as.factor(c(rep('one', 100),
rep('two', 100), rep('three', 100))), real_var = c(rnorm(150),
rnorm(150) + 5))
summary(mydata)
factor_var real_var
one :100 Min. :-2.742877
three:100 1st Qu.:-0.009493
two :100 Median : 2.361669
Mean : 2.490411
3rd Qu.: 4.822394
Max. : 6.924588
mymodel = glm(factor_var ~ real_var, family = 'binomial', data = mydata) summary(mymodel)
Call:
glm(formula = factor_var ~ real_var, family = "binomial", data =
mydata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7442 -0.6774 0.1849 0.3133 2.1187
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.6798 0.1882 -3.613 0.000303 ***
real_var 0.8971 0.1066 8.417 < 2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 381.91 on 299 degrees of freedom
Residual deviance: 213.31 on 298 degrees of freedom
AIC: 217.31
Number of Fisher Scoring iterations: 6
Have you noticed that you get identical results with
set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),
rep('two',100), rep('three', 100))),
real.var = c(rnorm(150), rnorm(150) + 5))
mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata)
summary(mymodel)
and
set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),
rep('two',200))),real.var = c(rnorm(150),rnorm(150) + 5))
mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata)
summary(mymodel)
(I've left out the "summary(mydata)" since these do naturally
differ, and I've replaced "factor_var" with "factor.var" and
"real_var" with "real.var" because of potential complications
with "_"; also "mymodel =" to "mymodel <-").
So I think the interpretation of the results from your first
model is that, because of the "family='binomial'", glm is
treating "factor.var='one'" as binomial response "0", say,
and "factor.var='two'" or "factor.var='three'" as binomial
response "1".
You're trying to fit a multinomial response, but you've
specified a binomial family to 'glm'. 'glm' does not have
a multinomial response family.
You could try 'multinom' from package 'nnet' which fits
loglinear models to factor responses with more than 2 levels.
E.g.
library(nnet)
mymodel <- multinom(factor.var ~ real.var,data=mydata)
### weights: 9 (4 variable)
## initial value 329.583687
## iter 10 value 209.780666
## final value 209.779951
## converged
summary(mymodel)
## Re-fitting to get Hessian
## Call:
## multinom(formula = factor.var ~ real.var, data = mydata)
## Coefficients:
## (Intercept) real.var
## three -3.4262565 1.3838231
## two -0.6754253 0.7116955
##
## Std. Errors:
## (Intercept) real.var
## three 0.5028541 0.1480138
## two 0.1846827 0.1068821
##
## Residual Deviance: 419.5599
## AIC: 427.5599
##
## Correlation of Coefficients:
## three:(Intercept) three:real.var two:(Intercept)
## three:real.var -0.7286258
## two:(Intercept) 0.1986995 -0.1261034
## two:real.var -0.1411377 0.7012481 -0.3285741
This output does suggest a fairly clear interpretation!
Hoping this helps,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-May-05 Time: 09:18:03
------------------------------ XFMail ------------------------------