warning associated with Logistic Regression
On 25 Jan 2004, Peter Dalgaard wrote:
David Firth <d.firth at warwick.ac.uk> writes:
On Sunday, Jan 25, 2004, at 13:59 Europe/London, Guillem Chust wrote:
Hi All,

When I tried to do logistic regression (with high maximum number of iterations) I got the following warning message

Warning message:
fitted probabilities numerically 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,

As I checked from the Archive R-Help mails, it seems that this happens when the dataset exhibits complete separation.
Yes, correct.
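For anyone wanting to reproduce the warning, here is a minimal sketch of complete separation. The six data points are made up for illustration and are not from the original poster's data; the warnings are collected with a calling handler so they can be inspected rather than just printed.

```r
# Hypothetical toy data: every x < 0 has y = 0 and every x > 0 has y = 1,
# so x separates the two outcomes completely.
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# Collect warning messages instead of letting them print at the end.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)                # warnings raised during the fit
print(coef(fit))           # the slope is large and has no finite MLE
print(range(fitted(fit)))  # fitted probabilities pushed towards 0 and 1
```

Raising the iteration limit does not help in this situation: the likelihood keeps increasing as the slope grows, so more iterations only give a larger coefficient.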
Sufficient but not necessary. It can happen just by numerical roundoff if the effect is strong enough. (I have an example with age and prevalent menarche: for nearly all women this happens between the age of 10 and 18, so if you have a couple of 40-year-olds in your data set, they'll get a fitted p of 1. Happens even more easily if you throw in a cubic term.)
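A rough sketch of that kind of situation, using invented grouped counts (not the actual menarche data referred to above): the outcomes overlap between ages 12 and 14, so there is no separation and the estimates are finite, yet the two 40-year-olds sit so far out on the linear predictor that their fitted probability is numerically 1.

```r
# Hypothetical grouped data: 20 girls at each age from 10 to 18, plus two
# 40-year-olds. 'yes' counts those who have reached menarche.
age <- c(10:18, 40)
n   <- c(rep(20, 9), 2)
yes <- c(0, 0, 2, 10, 18, 20, 20, 20, 20, 2)

# Ages 12 to 14 contain both outcomes, so the data are not separated.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(cbind(yes, n - yes) ~ age, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)                    # warnings raised during the fit
print(coef(fit))               # finite, moderate estimates
print(fitted(fit)[age == 40])  # fitted probability for the 40-year-olds
```

Here the warning reflects floating-point limits on a few extreme cases rather than a problem with the fit as a whole.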
It also happens with partial separation (when some but not all of the fitted values go to 0/1). A common case is where only one case occurs for some cell in an interaction of factors, and so can be fitted exactly. Another example is a dataset of say 8,000 people with complete separation but one got recorded incorrectly -- then the MLE occurs at large but finite parameter values and cases dissimilar to the erroneous one will have fitted probabilities very near (but not exactly) 0/1. The asymptotic theory is valid but practically useless (the Hauck-Donner effect) in such problems since 8,000 is a small sample.
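The 8,000-person example can be imitated along the following lines. This is only a sketch under assumed data: the covariate grid, the cut point at 0, and the position of the miscoded case near x = 1 are all invented for illustration. It compares the Wald chi-square for the slope with the likelihood-ratio chi-square, which is one way to see how poorly the usual asymptotics serve here.

```r
# Hypothetical data: 8,000 cases, outcome completely separated at x = 0 ...
x <- seq(-4, 4, length.out = 8000)
y <- as.integer(x > 0)

# ... except for one case near x = 1 that is recorded incorrectly.
bad <- which.min(abs(x - 1))
y[bad] <- 0L

# The 0/1 warning may or may not be raised here; it is suppressed to keep
# the output short. maxit is raised because the estimates are large.
fit <- suppressWarnings(
  glm(y ~ x, family = binomial, control = glm.control(maxit = 100))
)

est  <- coef(fit)["x"]
wald <- summary(fit)$coefficients["x", "z value"]
lr   <- fit$null.deviance - fit$deviance  # likelihood-ratio chi-square, 1 df

print(est)                          # large but finite slope
print(fitted(fit)[c(1, 8000)])      # cases far from the miscoded one
print(c(wald_chisq = wald^2, lr_chisq = lr))
```

With data like these the Wald statistic should come out far smaller than the likelihood-ratio statistic, so tests and intervals based on the reported standard errors are better avoided in favour of likelihood-based ones.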
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595