logistic regression weights problem
Hi all, I have a problem with weighted logistic regression. I have a number of SNPs and a case/control scenario, but not all genotypes are as "guaranteed" as others, so I am using weights to reduce the importance of individuals whose genotype has been heavily "inferred". My data set is quite big, so here is a dummy example:
status <- c(1,1,1,0,0)
SNPs <- matrix( c(1,0,1,0,0,0,0,1,0,1,0,1,0,1,1), ncol =3)
weight <- c(0.2, 0.1, 1, 0.8, 0.7)
glm(status ~ SNPs, weights = weight, family = binomial)
Call: glm(formula = status ~ SNPs, family = binomial, weights = weight)
Coefficients:
(Intercept)        SNPs1        SNPs2        SNPs3
     -2.079       42.282      -18.964           NA
Degrees of Freedom: 4 Total (i.e. Null); 2 Residual
Null Deviance: 3.867
Residual Deviance: 0.6279 AIC: 6.236
Warning messages:
1: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)
2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y
= Y, weights = weights, start = start, etastart = etastart,
NB I do not get warning (2) for my data so I'll completely disregard it.
Warning (1) looks suspiciously like a multiplication of my case/control
status by the weights... what exactly is glm doing with the weight vector?
In any case, how would I go about weighting my individuals in a logistic
regression?
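
For reference, here is a minimal sketch of where warning (1) seems to come from. It assumes, as far as I can tell from ?glm and the binomial family, that with a 0/1 response the prior weights are read as numbers of trials, so status * weight is treated as a count of successes and checked for being an integer. The names successes, non_integer, fit_b and fit_q are mine, and the quasibinomial fit is only shown as one possible way to keep the same point estimates without that check, not as a recommendation for how the weights should be interpreted.

```r
# Dummy data from the example above
status <- c(1, 1, 1, 0, 0)
SNPs   <- matrix(c(1,0,1,0,0, 0,0,1,0,1, 0,1,0,1,1), ncol = 3)
weight <- c(0.2, 0.1, 1, 0.8, 0.7)

# Assumption: the binomial family reads weight as "number of trials",
# so status * weight is interpreted as a number of successes.
successes   <- status * weight
non_integer <- any(abs(successes - round(successes)) > 0.001)
print(non_integer)  # TRUE: fractional "successes" trigger warning (1)

# Same model under binomial and quasibinomial; warnings are suppressed
# here only so the two sets of coefficients can be compared quietly.
fit_b <- suppressWarnings(
  glm(status ~ SNPs, weights = weight, family = binomial))
fit_q <- suppressWarnings(
  glm(status ~ SNPs, weights = weight, family = quasibinomial))

# Compare the point estimates of the two fits side by side
print(cbind(binomial = coef(fit_b), quasibinomial = coef(fit_q)))
```

If that reading is right, the weights act as prior weights on each individual's contribution to the likelihood rather than as sampling weights, and the standard errors from the two families will differ because quasibinomial estimates a dispersion parameter.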
Regards,
Federico Calboli
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 7594 1602  Fax (+44) 020 7594 3193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com