warning associated with Logistic Regression
On 25-Jan-04 Guillem Chust wrote:
Hi All,

When I tried to do logistic regression (with a high maximum number of iterations) I got the following warning message:

Warning message:
fitted probabilities numerically 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,

As I checked from the archived R-help mails, it seems that this happens when the dataset exhibits complete separation.
This is so. Indeed, there is a sense in which you are experiencing unusually good fortune, since for values of your predictors in one region you are perfectly predicting the 0s in your response, and for values in another region you are perfectly predicting the 1s. What better could you hope for?

However, you would respond that this is not realistic: your variables are not (in real life) such that P(Y=1|X=x) is ever exactly 1 or exactly 0, so this perfect prediction is not realistic.

In that case, you are somewhat stuck. The plain fact is that your data (in particular the way the values of the X variables are distributed) are not adequate to tell you what is happening. There may be manipulative tricks (like penalised regression) which would inhibit the logistic regression from going all the way to a perfect fit; but then how would you know how far to let it go? It will certainly go as far in that direction as you allow it to.

The key parameter in this situation is the dispersion parameter (sigma in the usual notation). When you get a perfect fit in a "completely separated" situation, this corresponds to sigma=0. If you don't like this, then there must be reasons why you want sigma>0, and this may imply that you have reasons for wanting sigma to be at least s0 (say). Or, if you are prepared to be Bayesian about it, you may be satisfied that there is a prior distribution for sigma which would not allow sigma=0, and would attach high probability to a range of sigma values which you consider to be realistic.

Unless you have a fairly firm idea of what sort of values sigma is likely to have, you are indeed stuck, because you have no reason to prefer one positive value of sigma to a different positive value of sigma. In that case you cannot really object if the logistic regression tries to make it as small as possible!

In the absence of such reasons, you may consider exploring the effect of fixing sigma at some positive value, and then varying this value.
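As a rough illustration of fixing sigma and then varying it, here is a minimal sketch in plain Python (not R). It rests on assumptions that are not stated in this message: a single predictor, the model P(Y=1|X=x) = 1/(1 + exp(-(x - mu)/sigma)), so that sigma is the scale of the response curve and the slope is 1/sigma, and a small made-up data set that is completely separated. The names log_lik and fit_mu are hypothetical helpers written for this sketch.

```python
import math

# Hypothetical toy data with complete separation (an assumption of this sketch):
# every y = 0 lies at x <= 0 and every y = 1 at x >= 1.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_lik(mu, sigma):
    """Bernoulli log-likelihood under P(Y=1|x) = sigmoid((x - mu) / sigma)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = (x - mu) / sigma
        # P(Y=1|x) = sigmoid(z) and P(Y=0|x) = sigmoid(-z).
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total


def fit_mu(sigma, lo=-2.0, hi=3.0, iters=100):
    """Maximise log_lik over mu, with sigma held fixed, by bisection.

    The derivative of the log-likelihood in mu has the sign of
    sum(p_i - y_i), which decreases as mu increases, so its root is
    the maximiser.
    """
    def score_sign(mu):
        s = 0.0
        for x, y in zip(xs, ys):
            z = (x - mu) / sigma
            # p - y, computed as -sigmoid(-z) when y = 1 to keep precision.
            s += -sigmoid(-z) if y == 1 else sigmoid(z)
        return s

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score_sign(mid) > 0:
            lo = mid  # log-likelihood still rising: maximiser is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Fix sigma at several positive values and record the fit at each one.
results = []
for sigma in [2.0, 1.0, 0.5, 0.25, 0.1]:
    mu_hat = fit_mu(sigma)
    results.append((sigma, mu_hat, 1.0 / sigma, log_lik(mu_hat, sigma)))

for sigma, mu_hat, slope, ll in results:
    print(f"sigma={sigma:<5} mu_hat={mu_hat:.4f} slope={slope:.2f} loglik={ll:.5f}")
```

With data like these, the maximised log-likelihood keeps climbing towards 0 as sigma shrinks, which is the sense in which an unconstrained fit pushes sigma to 0.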
For each such value, look at the estimates of the coefficients of the X variables, the goodness of fit, and so on. This may help you to form an idea of what sort of estimate you should hope for, and would enable you to design a better dataset (i.e. placement of X values) which would be capable of supporting a fit which was both realistic and estimated with adequate precision.

Another approach you should consider, if you have several X variables, is to look at subsets of these variables, retaining in the first instance only those few (the fewer the better) which on substantive grounds you consider to be the most important in the application to which the data refer. Also look at the multivariate distribution of the X values, and in particular carry out a linear discriminant analysis on them.

If, however, you have only 1 X variable, then you have a situation equivalent to the following (pairs of (x,y)):

(-2,0), (-1,0), (0,0), (1,1), (2,1), (3,1).

Clearly you are not going to get anything out of this unless you do one of two things. Either you repeat the experiment many times, so that you have several Y responses at each value of X; probabilities between 0 and 1 at each X then have a better chance to express themselves, as so many 0s and so many 1s at each X. Or you fill in the range over which P(Y=1|X=x) increases from low to high, e.g. by observing Y for X = -1.0, -0.9, -0.8, ... , 0.0, 0.1, ... 1.9, 2.0 (say).

I hope these suggestions help.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 25-Jan-04                                       Time: 18:06:16
------------------------------ XFMail ------------------------------