Skip to content

logistic regression weights problem

3 messages · Brian Ripley, Federico Calboli

#
Hi All,

I have a problem with weighted logistic regression. I have a number of
SNPs  and a case/control scenario, but not all genotypes are as
"guaranteed" as others, so I am using weights to downsample the
importance of individuals whose genotype has been heavily "inferred".

My data is quite big, but with a dummy example:
Call:  glm(formula = status ~ SNPs, family = binomial, weights = weight)

Coefficients:
(Intercept)        SNPs1        SNPs2        SNPs3
     -2.079       42.282      -18.964           NA

Degrees of Freedom: 4 Total (i.e. Null);  2 Residual
Null Deviance:      3.867
Residual Deviance: 0.6279       AIC: 6.236
Warning messages:
1: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y
= Y, weights = weights, start = start, etastart = etastart,

NB I do not get warning (2) for my data so I'll completely disregard it.

Warning (1) looks suspiciously like a multiplication of my C/C status by
the weights... what exacly is glm doing with the weight vector? 

In any case, how would I go about weighting my individuals in a logistic
regression?

Regards,

Federico Calboli
#
On Wed, 13 Apr 2005, Federico Calboli wrote:

            
Using it in the GLM definition.  If you specify 0<=y_i<=1 and weights a_i, 
this is how you specify Binomial(a_i, a_iy_i).  Look up any book on GLMs 
and see what it says about the binomial.  E.g. MASS4 pp. 184, 190.
Use the cbind(yes, no) form of specification.  Note though that the 
`weights' in a GLM are case weights and not arbitrary downweighting 
factors and aspects of the output (e.g. AIC, anova) depend on this.  A 
different implementation of (differently) weighted GLM is svyglm() in 
package 'survey'.
#
On Wed, 2005-04-13 at 17:42 +0100, Prof Brian Ripley wrote:
I tried to use cbind() on a slightly modified dummy set to get get rid
of all warnings and that's what I got:


status <- c(1,1,1,0,0)
SNPs <- matrix( c(1,0,1,0,0,1,0,1,0,1,0,1,0,1,1), ncol =3)
weight <- c(0.2, 0.1, 1, 0.8, 0.7)

############ using cbind()

glm(cbind(status, 1-status) ~ SNPs, weights = weight, family = binomial)

Call:  glm(formula = cbind(status, 1 - status) ~ SNPs, family =
binomial,      weights = weight)

Coefficients:
(Intercept)        SNPs1        SNPs2        SNPs3
     -2.079       43.132      -19.487           NA

Degrees of Freedom: 4 Total (i.e. Null);  2 Residual
Null Deviance:      3.867
Residual Deviance: 0.6279       AIC: 6.236

########### NOT using cbind()

glm(status~ SNPs, weights = weight, family = binomial)

Call:  glm(formula = status ~ SNPs, family = binomial, weights = weight)

Coefficients:
(Intercept)        SNPs1        SNPs2        SNPs3
     -2.079       42.944      -19.366           NA

Degrees of Freedom: 4 Total (i.e. Null);  2 Residual
Null Deviance:      3.867
Residual Deviance: 0.6279       AIC: 6.236
Warning message:
non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)
##############################################

The anova() call of the cbind() model seems happy:

###############################################
mod <- glm(cbind(status, 1-status) ~ SNPs, weights = weight, family =
binomial)

anova(mod, test = "Chi")
Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(status, 1 - status)

Terms added sequentially (first to last)


     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                     4     3.8673
SNPs  2   3.2394         2     0.6279    0.1980
#####################################################

The real data modeldoes not show warnings when I use cbind() but it
still shows warning when I call anova() on the model:

#######################################################
anova(glm(cbind(sta, 1-sta) ~ (X1 + X2 + X3 + X4 + X5) * breed, family=
binomial, weights =igf$we, igf),test="Chi")
Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(sta, 1 - sta)

Terms added sequentially (first to last)


          Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                        330     372.89
X1         1     0.14       329     372.75      0.71
X2         1     0.26       328     372.49      0.61
X3         1 0.001121       327     372.49      0.97
X4         1     2.63       326     369.86      0.10
X5         1     1.87       325     367.99      0.17
breed      3     5.41       322     362.58      0.14
X1:breed   3     2.98       319     359.60      0.39
X2:breed   3     1.21       316     358.39      0.75
X3:breed   2     1.32       314     357.08      0.52
X4:breed   3     1.75       311     355.33      0.63
X5:breed   2     2.38       309     352.95      0.30
Warning messages:
1: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
2: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
3: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
4: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
5: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
6: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
7: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
8: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
9: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
10: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
##################################################

Why such inconsistency?

Regards,

Federico Calboli