
Unexpected behavior with weights in binomial glm()

Bert Gunter

Sun, Sep 30, 2012 7:11 AM

I haven't followed this thread closely, but if perfect separation in a
binomial glm is the problem, search for it online, e.g.

http://www.ats.ucla.edu/stat/mult_pkg/faq/general/complete_separation_logit_models.htm

This presumably explains your concerns about coefficient agreement.
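
Editorial sketch (not part of the original message): one simple way to check for separation in the toy data posted further down the thread is to cross-tabulate the response against the predictor and look for an empty cell. The object names `tab`, `has_empty_cell` and `log_odds_indep0` are made up for this illustration.

```r
# Toy data from the original post quoted later in this thread.
d <- data.frame(RESP  = c(rep(1, 5), rep(0, 5)),
                INDEP = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0))

# Cross-tabulate response against predictor.
# An empty cell means some predictor level has only one observed outcome.
tab <- table(RESP = d$RESP, INDEP = d$INDEP)
print(tab)

# TRUE here: no case has INDEP = 1 together with RESP = 0.
has_empty_cell <- any(tab == 0)
print(has_empty_cell)

# The intercept is still well defined: it is the log-odds of RESP = 1
# among the INDEP = 0 cases, i.e. log(1/5).
log_odds_indep0 <- log(tab["1", "0"] / tab["0", "0"])
print(log_odds_indep0)
```

Because every INDEP = 1 case has RESP = 1, the likelihood keeps improving as the INDEP coefficient grows, so no finite maximum likelihood estimate exists for it. Only the intercept (about -1.609) is pinned down by the data.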

-- Bert


On Sun, Sep 30, 2012 at 4:47 AM, Josh Browning <rockclimber112358 at gmail.com> wrote:

Hi David,

Yes, I agree that the results are "very similar" but I don't
understand why they are not exactly equal given that the data sets are
identical.

And yes, this 1% numerical difference is hugely important to me.  I
have another data set (much larger than this toy example) that works
on the aggregated data (returning a coefficient of about 1) but
returns the warning about perfect separation on the non-aggregated
data (and a coefficient of about 1e15).  So, I'd at least like to be
able to understand where this numerical difference is coming from and,
preferably, a way to tweak my glm() runs (possibly adjusting the
numerical precision somehow???) so that this doesn't happen.

Josh

On Sat, Sep 29, 2012 at 7:50 PM, David Winsemius <dwinsemius at comcast.net> wrote:

On Sep 29, 2012, at 7:10 AM, Josh Browning wrote:

Hi useRs,

I'm experiencing something quite weird with glm() and weights, and
maybe someone can explain what I'm doing wrong.  I have a dataset
where each row represents a single case, and I run
glm(...,family="binomial") and get my coefficients.  However, some of
my cases have the exact same values for predictor variables, so I
should be able to aggregate up my data frame and run glm(...,
family="binomial",weights=wts) and get the same coefficients (maybe
this is my incorrect assumption, but I can't see why it would be).
Anyway, here's a minimal working example:

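# Ten cases: every INDEP = 1 case has RESP = 1; the INDEP = 0 cases have one RESP = 1 and five RESP = 0.
d = data.frame( RESP=c(rep(1,5),rep(0,5)), INDEP=c(1,1,1,1,0,0,0,0,0,0) )
glm( RESP ~ INDEP, family="binomial", data=d )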

Call:  glm(formula = RESP ~ INDEP, family = "binomial", data = d)

Coefficients:
(Intercept)        INDEP
    -1.609       21.176

Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
Null Deviance:      13.86
Residual Deviance: 5.407        AIC: 9.407

# Collapse identical rows into one row per observed (RESP, INDEP) pair; WT is the row count.
dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length )
colnames(dAgg) = c("RESP","INDEP","WT")
glm( RESP ~ INDEP, family="binomial", data=dAgg, weights=WT )

Call:  glm(formula = RESP ~ INDEP, family = "binomial", data = dAgg,
   weights = WT)

Coefficients:
(Intercept)        INDEP
    -1.609       20.975

Degrees of Freedom: 2 Total (i.e. Null);  1 Residual
Null Deviance:      13.86
Residual Deviance: 5.407        AIC: 9.407

Those two results look very similar, and the data situation seems somewhat extreme. Is the concern the 1% numerical difference in the regression coefficient? Am I reading you correctly?
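
Editorial sketch (not part of the original message): the INDEP coefficient in both fits above appears to be wherever the iterative fitting happened to stop, rather than a converged estimate. A rough way to see this is to refit with a stricter convergence tolerance through `glm.control()` and compare. The values `epsilon = 1e-12` and `maxit = 100` are arbitrary choices for illustration, and the names `fit_default`, `fit_tight` and `fit_agg` are hypothetical.

```r
# Rebuild the data and the aggregated version from the original post.
d <- data.frame(RESP  = c(rep(1, 5), rep(0, 5)),
                INDEP = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0))
dAgg <- aggregate(d$RESP, by = list(d$RESP, d$INDEP), FUN = length)
colnames(dAgg) <- c("RESP", "INDEP", "WT")

# Case-level fit with the default convergence settings.
fit_default <- glm(RESP ~ INDEP, family = binomial, data = d)

# Same model and data, but with a stricter tolerance and more iterations allowed.
fit_tight <- glm(RESP ~ INDEP, family = binomial, data = d,
                 control = glm.control(epsilon = 1e-12, maxit = 100))

# Aggregated fit using the counts as weights.
fit_agg <- glm(RESP ~ INDEP, family = binomial, data = dAgg, weights = WT)

# Compare coefficients: the intercepts should agree, the INDEP slopes need not.
print(rbind(default    = coef(fit_default),
            tight      = coef(fit_tight),
            aggregated = coef(fit_agg)))

# Compare residual deviances: these should agree closely across all three fits.
print(c(default    = deviance(fit_default),
        tight      = deviance(fit_tight),
        aggregated = deviance(fit_agg)))
```

If the slope moves when only the tolerance changes while the intercept and deviance stay put, then the gap between 21.176 and 20.975 says nothing about the data. It more likely reflects the two fits taking slightly different iteration paths, presumably because the weighted fit starts from different initial values. Adjusting numerical precision will not make the slope settle, since it has no finite maximum likelihood value here.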

--
David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
