Help with "non-integer #successes in a binomial glm"
On Mon, 8 Aug 2005, Haibo Huang wrote:
I had a logit regression, but don't really know how to handle the "Warning message: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)" problem. I had the same logit regression without weights and it worked out without the warning, but I figured it makes more sense to add the weights. The weights sum up to one.
Weights are case weights in a binomial GLM, that is, w_i means `I have w_i of these'. Do check out the theory in MASS (the book) or Nelder & McCullagh. There are some circumstances when fractional weights make sense (when this is doing something other than fitting a glm, e.g. as part of a `mixture of experts' model), but they are unusual, hence the warning.
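A minimal sketch of the case-weight reading, on simulated data rather than the lease data (all names such as `d`, `w` and `d_rep` are made up for illustration): with integer weights, a weighted fit should match a fit to the data with row i repeated w_i times, while fractional weights trigger the warning.

```r
# Simulated data; every name below is illustrative, none comes from lease.csv.
set.seed(1)
n <- 200
x <- factor(rbinom(n, 1, 0.3))
y <- rbinom(n, 1, ifelse(x == "1", 0.2, 0.4))
w <- sample(1:3, n, replace = TRUE)      # integer case weights
d <- data.frame(y = y, x = x, w = w)

# Fit with case weights w_i ...
fit_w <- glm(y ~ x, family = binomial, data = d, weights = w)

# ... and on the expanded data, where row i appears w_i times.
d_rep <- d[rep(seq_len(n), times = d$w), ]
fit_rep <- glm(y ~ x, family = binomial, data = d_rep)

# Estimates and standard errors should agree up to convergence tolerance.
same_coef <- all.equal(coef(fit_w), coef(fit_rep), tolerance = 1e-6)
same_se <- all.equal(sqrt(diag(vcov(fit_w))), sqrt(diag(vcov(fit_rep))),
                     tolerance = 1e-6)

# Fractional weights make w_i * y_i non-integer, so glm() warns.
# tryCatch() stops at the first warning and returns its message.
frac_warning <- tryCatch(
  {
    glm(y ~ x, family = binomial, data = d, weights = w / sum(w))
    NA_character_
  },
  warning = function(cond) conditionMessage(cond)
)
```

Printing `same_coef`, `same_se` and `frac_warning` shows the comparison results and the text of the warning.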
Could anyone give me some hint? Thanks a lot! FYI, I have posted both regressions (with and without weights) below. Ed
setwd("P:/Work in Progress/Haibo/Hans")
Lease=read.csv("lease.csv", header=TRUE)
Lease$ET <- factor(Lease$EarlyTermination)
SICCode=factor(Lease$SIC.Code)
Lease$TO=factor(Lease$TenantHasOption)
Lease$LO=factor(Lease$LandlordHasOption)
Lease$TEO=factor(Lease$TenantExercisedOption)
RegA=glm(ET~1+TO,
+ family=binomial(link=logit), data=Lease)
summary(RegA)
Call:
glm(formula = ET ~ 1 + TO, family = binomial(link = logit), data = Lease)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.5839  -0.5839  -0.5839  -0.3585   2.3565

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.68271    0.02363  -71.20   <2e-16 ***
TO1         -1.02959    0.09012  -11.43   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 12987  on 15809  degrees of freedom
Residual deviance: 12819  on 15808  degrees of freedom
AIC: 12823

Number of Fisher Scoring iterations: 5
setwd("P:/Work in Progress/Haibo/Hans")
Lease=read.csv("lease.csv", header=TRUE)
Lease$ET <- factor(Lease$EarlyTermination)
SICCode=factor(Lease$SIC.Code)
Lease$TO=factor(Lease$TenantHasOption)
Lease$LO=factor(Lease$LandlordHasOption)
Lease$TEO=factor(Lease$TenantExercisedOption)
RegA=glm(ET~1+TO,
+ family=binomial(link=logit), data=Lease, weights=PortionSF)
Warning message:
non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)
summary(RegA)
Call:
glm(formula = ET ~ 1 + TO, family = binomial(link = logit), data = Lease,
    weights = PortionSF)

Deviance Residuals:
      Min         1Q     Median         3Q        Max
-0.055002  -0.003434   0.000000   0.000000   0.120656

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.120      2.618  -0.428    0.669
TO1           -1.570      9.251  -0.170    0.865

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.0201  on 9302  degrees of freedom
Residual deviance: 0.9787  on 9301  degrees of freedom
AIC: 4

Number of Fisher Scoring iterations: 5
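The weighted fit above uses weights that sum to one, so under the case-weight reading it describes roughly one observation in total, which would explain standard errors of this size. A rough sketch of the scaling, again on simulated data and not the lease data (the names `w_one`, `w_n` and `se` are hypothetical): multiplying all weights by a constant c should leave the estimates unchanged and divide the standard errors by sqrt(c).

```r
# Simulated data; names are illustrative and unrelated to lease.csv.
set.seed(2)
n <- 500
x <- factor(rbinom(n, 1, 0.3))
y <- rbinom(n, 1, ifelse(x == "1", 0.2, 0.4))
w <- runif(n, 0.5, 1.5)
d <- data.frame(y = y, x = x,
                w_one = w / sum(w),      # weights summing to 1
                w_n   = n * w / sum(w))  # same weights rescaled to sum to n

# Same relative weights, two scalings. suppressWarnings() hides the
# non-integer warning, which both fits trigger.
fit_one <- suppressWarnings(
  glm(y ~ x, family = binomial, data = d, weights = w_one))
fit_n <- suppressWarnings(
  glm(y ~ x, family = binomial, data = d, weights = w_n))

# Helper: standard errors from the estimated covariance matrix.
se <- function(fit) sqrt(diag(vcov(fit)))

# Estimates should match; standard errors should differ by about sqrt(n).
coef_match <- all.equal(coef(fit_one), coef(fit_n), tolerance = 1e-6)
se_ratio <- se(fit_one) / se(fit_n)
```

This only illustrates the arithmetic of the weights; whether rescaling them is appropriate for a given analysis is a separate question, which the references mentioned in the reply address.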
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595