
binomial GLM quasi separation

7 messages · Simone Santoro, Ben Bolker, Uwe Ligges +1 more

#
Hi all,

I have run a (glm) analysis where the dependent variable is the gender
(family=binomial) and the predictors are percentages.
I get a warning saying "fitted probabilities numerically 0 or 1 occurred"
that is indicating that quasi-separation or separation is occurring.
This makes sense given that one of these predictors has a very influential
effect that depends on a specific threshold: in my analysis this variable
predicts males about 80% of the time when its value is less than or equal to
zero, and females about 80% of the time when its value is greater than zero.
I have been looking at other posts about this but I haven't understood how I
should act when the separation (or quasi-separation) is not a statistical
artifact but something real.
As suggested in
http://r.789695.n4.nabble.com/OT-quasi-separation-in-a-logistic-GLM-td875726.html#a3850331
(the last post is mine) I tried the brglm procedure, which uses penalized
maximum likelihood, but it made no difference.

What would you do if you were in my shoes?
Thanks in advance for any help.

Simone


--
View this message in context: http://r.789695.n4.nabble.com/binomial-GLM-quasi-separation-tp3901687p3901687.html
Sent from the R help mailing list archive at Nabble.com.
#
lincoln <miseno77 <at> hotmail.com> writes:
[warning, broke URLs to make gmane happy]
I'm not sure what's going on here, and I don't know why brglm()
shouldn't work ... from a squint at your Nabble post (I can't
really see the figure very well), I agree that
the hcp profile is funky, but I wouldn't immediately conclude that
the profile is bad -- in particular, it seems that the x-axis range
is -45 to -15, rather than something like (-600,-300) as I would expect
from the estimated parameter (ca. -400) and standard error (ca. 60).
I would start by setting which=3 (to confine your attention to the
hcp parameter) and messing around with the gridsize, stepsize, stdn
parameters in profileModel to see what's going on.

 If that doesn't work you might have to post data, or a subset of
data, in order to get any more help ...
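[A minimal sketch of the profiling suggestion above, on simulated stand-in data since the real data are not available at this point in the thread; `x2` plays the role of hcp, and the gridsize/stepsize/stdn values are just starting points to experiment with.]

```r
## Minimal sketch, with simulated stand-in data: profile a single parameter
## and experiment with profileModel's grid settings.  'x2' plays the role
## of hcp (a small-scale predictor with a large coefficient).
library(profileModel)

set.seed(1)
x1 <- rnorm(200)
x2 <- rnorm(200, sd = 0.01)
y  <- rbinom(200, 1, plogis(1 + 2 * x1 - 300 * x2))
fit <- glm(y ~ x1 + x2, family = binomial)

prof <- profileModel(fit,
                     objective = "ordinaryDeviance",
                     which    = 3,     # confine attention to one parameter
                     gridsize = 50,    # default 20: more grid points
                     stepsize = 0.2,   # default 0.5: finer steps
                     stdn     = 10)    # default 5: profile further out
plot(prof)
```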
#
On 13.10.2011 21:46, Ben Bolker wrote:
Or, if only the separating hyperplane is to be found (and no tests need to
be considered), I'd use an lda rather than logistic regression in
such a case.
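[A minimal sketch of this lda() suggestion, on made-up data; the variable names mimic the thread's but the values are simulated.]

```r
## Minimal sketch, with made-up data: lda() from MASS estimates the linear
## discriminant (the normal of the separating hyperplane) directly, without
## fitting class probabilities the way logistic regression does.
library(MASS)

set.seed(1)
n   <- 200
dat <- data.frame(twp = runif(n), hcp = rnorm(n, sd = 0.01))
dat$sex <- factor(ifelse(dat$hcp + rnorm(n, sd = 0.005) <= 0, "M", "F"))

ld <- lda(sex ~ twp + hcp, data = dat)
ld$scaling                    # coefficients of the linear discriminant
head(predict(ld, dat)$class)  # predicted classes
```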

Uwe Ligges
#
As you suggested I had a further look at the profile by changing the default
value of stepsize (I tried to modify the others but apparently there was
no change).
Here are the scripts I used:
Warning messages:
glm.fit: fitted probabilities numerically 0 or 1 occurred
Preliminary iteration . Done

Profiling for parameter hcp ... Done
Preliminary iteration . Done

Profiling for parameter hcp ... Done
Preliminary iteration . Done

Profiling for parameter hcp ... Done
And these are the plots:
http://r.789695.n4.nabble.com/file/n3904261/plot1.png 
http://r.789695.n4.nabble.com/file/n3904261/plot2.png 
http://r.789695.n4.nabble.com/file/n3904261/plot3.png 

I have tried to understand what is going on but I don't know how to
interpret this.
I have been trying to solve this for quite a long time but have not been
able to. Here (  http://r.789695.n4.nabble.com/file/n3904261/simone.txt
simone.txt  ) I attach a subset of the data I am working with, comprising
the variables specified in the above glm model, including the "funky"
variable called "hcp".
Thank you for taking the time to help me with this.

#
On Fri, 2011-10-14 at 02:32 -0700, lincoln wrote:
Have you read ?glm, specifically this bit:

Details:

....

     For the background to warning messages about 'fitted probabilities
     numerically 0 or 1 occurred' for binomial GLMs, see Venables &
     Ripley (2002, pp. 197-8).

There, V&R say (me paraphrasing) that if there are some large fitted
\beta_i the curvature of the log-likelihood at the fitted \beta can be
much less than at \beta_i = 0. The Wald approximation underestimates the
change in the LL on setting \beta_i = 0. As the absolute value of the
fitted \beta_i becomes large (tends to infinity) the t statistic tends
to 0. This is the Hauck Donner effect.

Whilst I am (so very) far from being an expert here - this does seem to
fit the results you showed.

Furthermore, did you follow the steps Ioannis Kosmidis took me through
with my data in that email thread? I have done so with your data and
everything seems to follow the explanation/situation given by Ioannis.
Namely, if you increase the number of iterations and tighten the convergence
tolerance in the glm() call you get the same fit as with a standard glm()
call:
Call:
glm(formula = sex ~ twp + hwp + hcp + hnp, family = binomial, 
    data = dat)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.9703  -0.1760   0.3181   0.6061   3.5235  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)    1.4362     0.2687   5.345 9.02e-08 ***
twp            5.5673     1.3602   4.093 4.26e-05 ***
hwp           -4.2793     2.3781  -1.799   0.0719 .  
hcp         -450.1918    56.6823  -7.942 1.98e-15 ***
hnp           -4.5302     3.2825  -1.380   0.1676    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 581.51  on 425  degrees of freedom
Residual deviance: 294.00  on 421  degrees of freedom
  (41 observations deleted due to missingness)
AIC: 304

Number of Fisher Scoring iterations: 8
Call:
glm(formula = sex ~ twp + hwp + hcp + hnp, family = binomial, 
    data = dat, control = glm.control(epsilon = 1e-16, maxit = 1000))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.9703  -0.1760   0.3181   0.6061   3.5235  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)    1.4362     0.2687   5.345 9.02e-08 ***
twp            5.5673     1.3602   4.093 4.26e-05 ***
hwp           -4.2793     2.3781  -1.799   0.0719 .  
hcp         -450.1918    56.6823  -7.942 1.98e-15 ***
hnp           -4.5302     3.2825  -1.380   0.1676    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 581.51  on 425  degrees of freedom
Residual deviance: 294.00  on 421  degrees of freedom
  (41 observations deleted due to missingness)
AIC: 304

Number of Fisher Scoring iterations: 9
Profiling the model shows that the LL starts to increase again at low
values, but does so slowly. The LL is very flat around the estimates and
is far from 0, which seems to correspond with the description of the
Hauck Donner effect given by Venables and Ripley in their book. In your
case however, the statistic is still sufficiently large for it to be
identified as significant via the Wald test.

If we fit the model via brglm() we get essentially the same "result" as
fitted by glm():
Call:
brglm(formula = sex ~ twp + hwp + hcp + hnp, family = binomial, 
    data = dat)


Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)    1.4262     0.2662   5.357 8.47e-08 ***
twp            5.3696     1.3323   4.030 5.57e-05 ***
hwp           -4.2813     2.3504  -1.821   0.0685 .  
hcp         -435.9212    55.0566  -7.918 2.42e-15 ***
hnp           -4.6295     3.2459  -1.426   0.1538    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 560.48  on 425  degrees of freedom
Residual deviance: 294.08  on 421  degrees of freedom
Penalized deviance: 302.8212 
  (41 observations deleted due to missingness)
AIC:  304.08

Ioannis points out that if there were separation, the brglm results
would differ markedly from the glm() ones, and they don't.

As Ioannis mentioned in that thread, you can get the warning about the
fitted probabilities without a separation problem - in my case it was
because there were some very small fitted probabilities, which R was
just warning about.

HTH

G

1 day later
#
#Uwe:

I have realized that in the first linked post ( 
http://r.789695.n4.nabble.com/OT-quasi-separation-in-a-logistic-GLM-td875726.html#a3850331
OT-quasi-separation-in-a-logistic-GLM  ) I said something misleading:
in fact my independent variables are not log-normally distributed, since
lots of zeros are the most frequent values. I have not been able to
normalize them, and I don't even know whether it is possible. Because of
the assumption of normally distributed predictors, I believe I can't use
an lda.

#Gavin:

I have read your thread carefully but I am not sure I understand what you
are suggesting (my gaps in statistics!). You say that it should be due to
the Hauck-Donner effect and that it is not a separation or
quasi-separation issue. Even so, I am still unsure why I found such a high
asymptotic standard error.

Anyway, how should I interpret this result? Should I find another way to
analyze this process, or can I consider it correct?

If I understand this correctly, the warning message and the very high
estimates should be due to the Hauck-Donner effect. Regarding the reference
to Venables and Ripley (2002) on this issue, I have found this ( 
http://kups.ku.edu/maillist/classes/ps707/2005/msg00023.html Hauck-Donner  )
where it is said that "The practical advice, then, is to run the model with
all of the variables, and then run again with the questionable one removed,
and conduct a likelihood ratio test," and I suppose that the p-value for
hcp should be the LRT p-value, shouldn't it?

Thanks for taking the time to help me with this.

Simone




1 day later
#
On Sat, 2011-10-15 at 09:11 -0700, lincoln wrote:
<snip />
I don't believe this is a separation issue - the sorts of things we'd
expect to see if this were due to separation do not show up.

Given the large estimate for the coefficient for the term it is not that
surprising that the associated uncertainty is also high:
[1] 0.08911998
[1] 8911998

All I did there was increase the "units" of the data in the second
example, and the variance is huge, but only because the data were
expressed in units 10000 times bigger than in the first example. In the
same way, the coefficient estimate is large so its standard error is
also large; the question one needs to ask is: is the estimate of the
coefficient for hcp bounded away from zero, given the uncertainty in the
estimate?
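[The units point above can be reproduced with any toy vector; rescaling the data by a factor c multiplies the variance by c^2. The values below are made up.]

```r
## Toy illustration: multiplying the data by 10000 multiplies the variance
## by 10000^2 = 1e8, so a predictor measured on a tiny scale (like hcp, a
## proportion) naturally carries a huge coefficient *and* standard error.
x <- c(0.12, 0.55, 0.31, 0.97, 0.04)   # made-up values
var(x)
var(x * 10000)                          # 1e8 times var(x)
```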

If you were to produce a profile confidence interval it too would be
large.

So you have a large estimate, which is somewhat uncertain. Given that
the slope of the log likelihood is low at the estimate and quite
different from the slope at \beta == 0, it is not unreasonable to assume
that the Hauck Donner effect might be present...
...however, in the case of the snippet of data you showed, it doesn't
affect the result - on the basis of the Wald test you would still accept
that hcp is significant/important. The Hauck Donner effect might be
leading to a lower value of the test statistic, but it hasn't affected
the outcome of the test.

To check, fit the model with and without hcp and then use the anova()
function to compare the two models. This will do a likelihood ratio
test.
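[A minimal sketch of that comparison, on simulated stand-in data; with the real data, first drop rows with missing values so both fits use the same observations, since anova() requires models fitted to the same data.]

```r
## Minimal sketch of the likelihood ratio test, on simulated stand-in data
## mimicking the thread's variable names and rough coefficient sizes.
set.seed(1)
n   <- 400
dat <- data.frame(twp = runif(n), hwp = runif(n), hnp = runif(n),
                  hcp = rnorm(n, sd = 0.01))
dat$sex <- rbinom(n, 1, plogis(1.4 + 5.6 * dat$twp - 4.3 * dat$hwp
                               - 450 * dat$hcp - 4.5 * dat$hnp))
dat <- na.omit(dat)  # with real data: keep the same rows in both fits

full  <- glm(sex ~ twp + hwp + hcp + hnp, family = binomial, data = dat)
nohcp <- glm(sex ~ twp + hwp + hnp,       family = binomial, data = dat)
anova(nohcp, full, test = "Chisq")   # likelihood ratio (chi-squared) test
```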
Possibly, but it could just be that the fitted probabilities really are
0 or 1.
Yes. Well it is the result of applying a likelihood ratio test. I don't
think there is such a thing as *the* p-value for a term in a model, just
different ways of computing *a* p-value.

In this case, what does it matter? If the Wald test is *under*estimating
z but the term *is* still significant, the LRT should only confirm this
and give an even lower p-value than the already very low one.
Would it hurt you to reply via an email? Regardless of what Nabble
thinks, R-help is a mailing list and your *posts* keep on removing all
the context - I have to keep on hunting for the thread in the archives
just to keep track of what you have told us.

G