logistic regression weights problem - R-help

Wed, Apr 13, 2005 8:11 AM #

Hi All,

I have a problem with weighted logistic regression. I have a number of
SNPs  and a case/control scenario, but not all genotypes are as
"guaranteed" as others, so I am using weights to downsample the
importance of individuals whose genotype has been heavily "inferred".

My data is quite big, but with a dummy example:

Call:  glm(formula = status ~ SNPs, family = binomial, weights = weight)

Coefficients:
(Intercept)        SNPs1        SNPs2        SNPs3
     -2.079       42.282      -18.964           NA

Degrees of Freedom: 4 Total (i.e. Null);  2 Residual
Null Deviance:      3.867
Residual Deviance: 0.6279       AIC: 6.236
Warning messages:
1: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y
= Y, weights = weights, start = start, etastart = etastart,

NB I do not get warning (2) for my data so I'll completely disregard it.

Warning (1) looks suspiciously like a multiplication of my C/C status by
the weights... what exacly is glm doing with the weight vector? 

In any case, how would I go about weighting my individuals in a logistic
regression?

Regards,

Federico Calboli

Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel  +44 (0)20 7594 1602     Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com

Brian Ripley

Wed, Apr 13, 2005 9:42 AM #

On Wed, 13 Apr 2005, Federico Calboli wrote:

Using it in the GLM definition.  If you specify 0<=y_i<=1 and weights a_i, 
this is how you specify Binomial(a_i, a_iy_i).  Look up any book on GLMs 
and see what it says about the binomial.  E.g. MASS4 pp. 184, 190.

Use the cbind(yes, no) form of specification.  Note though that the 
`weights' in a GLM are case weights and not arbitrary downweighting 
factors and aspects of the output (e.g. AIC, anova) depend on this.  A 
different implementation of (differently) weighted GLM is svyglm() in 
package 'survey'.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Federico Calboli

Wed, Apr 13, 2005 1:17 PM #

On Wed, 2005-04-13 at 17:42 +0100, Prof Brian Ripley wrote:

I tried to use cbind() on a slightly modified dummy set to get get rid
of all warnings and that's what I got:


status <- c(1,1,1,0,0)
SNPs <- matrix( c(1,0,1,0,0,1,0,1,0,1,0,1,0,1,1), ncol =3)
weight <- c(0.2, 0.1, 1, 0.8, 0.7)

############ using cbind()

glm(cbind(status, 1-status) ~ SNPs, weights = weight, family = binomial)

Call:  glm(formula = cbind(status, 1 - status) ~ SNPs, family =
binomial,      weights = weight)

Coefficients:
(Intercept)        SNPs1        SNPs2        SNPs3
     -2.079       43.132      -19.487           NA

Degrees of Freedom: 4 Total (i.e. Null);  2 Residual
Null Deviance:      3.867
Residual Deviance: 0.6279       AIC: 6.236

########### NOT using cbind()

glm(status~ SNPs, weights = weight, family = binomial)

Call:  glm(formula = status ~ SNPs, family = binomial, weights = weight)

Coefficients:
(Intercept)        SNPs1        SNPs2        SNPs3
     -2.079       42.944      -19.366           NA

Degrees of Freedom: 4 Total (i.e. Null);  2 Residual
Null Deviance:      3.867
Residual Deviance: 0.6279       AIC: 6.236
Warning message:
non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)
##############################################

The anova() call of the cbind() model seems happy:

###############################################
mod <- glm(cbind(status, 1-status) ~ SNPs, weights = weight, family =
binomial)

anova(mod, test = "Chi")
Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(status, 1 - status)

Terms added sequentially (first to last)


     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                     4     3.8673
SNPs  2   3.2394         2     0.6279    0.1980
#####################################################

The real data modeldoes not show warnings when I use cbind() but it
still shows warning when I call anova() on the model:

#######################################################
anova(glm(cbind(sta, 1-sta) ~ (X1 + X2 + X3 + X4 + X5) * breed, family=
binomial, weights =igf$we, igf),test="Chi")
Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(sta, 1 - sta)

Terms added sequentially (first to last)


          Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                        330     372.89
X1         1     0.14       329     372.75      0.71
X2         1     0.26       328     372.49      0.61
X3         1 0.001121       327     372.49      0.97
X4         1     2.63       326     369.86      0.10
X5         1     1.87       325     367.99      0.17
breed      3     5.41       322     362.58      0.14
X1:breed   3     2.98       319     359.60      0.39
X2:breed   3     1.21       316     358.39      0.75
X3:breed   2     1.32       314     357.08      0.52
X4:breed   3     1.75       311     355.33      0.63
X5:breed   2     2.38       309     352.95      0.30
Warning messages:
1: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
2: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
3: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
4: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
5: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
6: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
7: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
8: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
9: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
10: non-integer #successes in a binomial glm! in: eval(expr, envir,
enclos)
##################################################

Why such inconsistency?

Regards,

Federico Calboli

Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel  +44 (0)20 7594 1602     Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com