logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred - R-help

Thu, Dec 1, 2011 9:54 AM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111201/56baa0a9/attachment.pl>

Peter Dalgaard

Thu, Dec 1, 2011 10:55 AM #

On Dec 1, 2011, at 18:54 , Ben quant wrote:

Still not there. Sometimes it's because your mailer doesn't label them with the appropriate mime-type (e.g. as application/octet-stream, which is "arbitrary binary"). Anyways, see below

[snip]

I bet that you do... You can get the warning without that effect (one of my own examples is  the probability of menarche in a data set that includes infants and old age pensioners), but not with a huge odds ratio as well. Take a look at 

d <- as.data.frame(l_yx) 
with(d, y[order(x)])

if it comes out as all zeros followed by all ones or vice versa, then you have the problem.

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Ben quant

Thu, Dec 1, 2011 12:32 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111201/0811f7b7/attachment.pl>

Peter Dalgaard

Thu, Dec 1, 2011 1:24 PM #

On Dec 1, 2011, at 21:32 , Ben quant wrote:

Hum, then maybe it really is a case of a transition region being short relative to the range of your data. Notice that the warning is just that: a warning. I do notice that the distribution of your x values is rather extreme -- you stated a range of 0--14 and a mean of 0.01. And after all, an odds ratio of 3000 per unit is only a tad over a doubling per 0.1 units.

Have a look at  range(x[y==0]) and range(x[y==1]).

I followed your instructions and here is a sample of the first 500 values : (info on 'd' is below that)

 d <- as.data.frame(l_yx) 
x = with(d, y[order(x)])
x[1:500] # I have 1's and 0's dispersed throughout

  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
[101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
[301] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

# I get the warning still

l_df = as.data.frame(l_yx)
    l_logit = glm(y~x, data=l_df, family=binomial(link="logit"))

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 

# some info on 'd' above:

d[1:500,1]

  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[301] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

d[1:500,2]

  [1] 3.023160e-03 7.932130e-02 0.000000e+00 1.779657e-02 1.608374e-01 0.000000e+00 5.577064e-02 7.753926e-03 4.018553e-03 4.760918e-02 2.080511e-02 1.642404e-01 3.703720e-03  8.901981e-02 1.260415e-03
 [16] 2.202523e-02 3.750940e-02 4.441975e-04 9.351171e-03 8.374567e-03 0.000000e+00 8.440448e-02 5.081017e-01 2.538640e-05 1.806017e-02 2.954641e-04 1.434859e-03 6.964976e-04 0.000000e+00 1.202162e-02
 [31] 3.420300e-03 4.276100e-02 1.457324e-02 4.140121e-03 1.349180e-04 1.525292e-03 4.817502e-02 9.515717e-03 2.232953e-02 1.227908e-01 3.293581e-02 1.454352e-02 1.176011e-03 6.274138e-02 2.879205e-02
 [46] 6.900926e-03 1.414648e-04 3.446349e-02 8.807174e-03 3.549332e-02 2.828509e-03 2.935003e-02 7.162872e-03 5.650050e-03 1.221191e-02 0.000000e+00 2.126334e-02 2.052020e-02 7.542409e-02 2.586269e-04
 [61] 5.258664e-02 1.213126e-02 1.493275e-02 8.152657e-03 1.774966e-02 2.433198e-02 1.371257e-02 6.732339e-02 9.747677e-03 8.180594e-03 1.882473e-01 1.682596e-03 1.410170e-02 8.429676e-03 3.520589e-01
 [76] 1.865160e-02 2.389941e-02 3.301558e-02 4.896605e-03 6.826244e-03 9.151337e-02 1.722270e-02 1.764039e-01 3.554063e-02 7.338532e-03 9.577793e-03 3.632366e-03 1.422400e-02 3.020497e-02 1.797096e-02
 [91] 2.577899e-03 1.338365e-01 8.202223e-03 1.088335e-02 2.137259e-02 6.829797e-03 3.104860e-04 8.927160e-03 5.626881e-02 1.197682e-02 6.140553e-02 2.729364e-02 1.377913e-02 3.018277e-03 1.188304e-02
[106] 2.029268e-03 2.108815e-02 1.765503e-03 2.449253e-02 3.677046e-03 1.013463e-02 4.286642e-02 1.238931e-02 3.072090e-03 1.509613e-02 3.609885e-02 5.537268e-03 3.151614e-02 3.703148e-03 1.409401e-01
[121] 1.473365e-02 9.160461e-03 1.035099e-01 3.005865e-02 1.332199e-02 6.936535e-03 2.433539e-02 4.935373e-03 4.822740e-03 1.597643e-03 3.805619e-03 1.092683e-02 1.760635e-01 5.565614e-03 7.739213e-04
[136] 4.529198e-03 5.110359e-03 7.391258e-02 5.018207e-03 2.071417e-02 1.825787e-02 9.141405e-03 1.078386e-01 2.171470e-04 7.164583e-03 2.522077e-02 1.994428e-03 6.044653e-03 1.778078e-04 5.797706e-03
[151] 1.526778e-02 1.595897e-02 1.995627e-01 1.946735e-03 6.711582e-02 2.722350e-02 3.127499e-02 1.074904e-01 2.477414e-03 5.015375e-02 3.417810e-02 2.046643e-02 1.644832e-02 5.220166e-03 1.217752e-02
[166] 1.187352e-02 1.634016e-02 2.349568e-03 3.451811e-02 2.593540e-03 6.868595e-03 1.311236e-02 6.457757e-03 2.942391e-04 1.628352e-02 8.288831e-03 3.170856e-04 1.251331e+00 1.706954e-02 1.063723e-03
[181] 1.374416e-02 2.140507e-02 2.817009e-02 2.272793e-02 4.365562e-02 6.089414e-03 2.498083e-02 1.360471e-02 1.884079e-02 1.448660e-02 2.341314e-02 8.167064e-03 4.109117e-02 2.660633e-02 7.711723e-03
[196] 9.590278e-03 2.515490e-03 1.978033e-02 3.454990e-02 8.072748e-03 4.718885e-03 1.621131e-01 4.547743e-03 1.081195e-02 9.572051e-04 1.790391e-02 1.618026e-02 1.910230e-02 1.861914e-02 3.485475e-02
[211] 2.844890e-03 1.866889e-02 1.378208e-02 2.451514e-02 2.535044e-03 3.921364e-04 1.557266e-03 3.315892e-03 1.752821e-03 6.786187e-03 1.360921e-02 9.550702e-03 8.114506e-03 5.068741e-03 1.729822e-02
[226] 1.902033e-02 8.196564e-03 2.632880e-03 1.587969e-02 8.354079e-04 1.050023e-03 4.236195e-04 9.181120e-03 4.995919e-04 1.092234e-02 1.207544e-02 2.187243e-01 3.251349e-02 1.269134e-03 1.557751e-04
[241] 1.232498e-02 2.654449e-02 1.049324e-03 8.442729e-03 6.331691e-03 1.715609e-02 1.017800e-03 9.230006e-03 1.331373e-02 5.596195e-02 1.296551e-03 5.272687e-03 2.805640e-02 4.790665e-02 2.043011e-02
[256] 1.047226e-02 1.866499e-02 9.323001e-03 8.920536e-03 1.582911e-03 2.776238e-03 2.914762e-02 4.402356e-03 9.555274e-04 1.681966e-03 7.584319e-04 6.758914e-02 1.505431e-02 2.213308e-02 1.329330e-02
[271] 7.284363e-03 2.687818e-02 2.997535e-03 7.470007e-03 2.070569e-03 3.441944e-02 1.717768e-02 4.523364e-02 1.003558e-02 1.365111e-02 1.906845e-02 1.676223e-02 3.506809e-04 9.164257e-02 9.008416e-03
[286] 1.073903e-02 4.855937e-03 8.618043e-03 2.529247e-02 1.059375e-02 5.834253e-03 2.004309e-02 1.460387e-02 2.899190e-02 5.867984e-03 1.983956e-02 6.834339e-03 1.925821e-03 9.231870e-03 6.839616e-03
[301] 1.029972e-02 2.009769e-02 9.458785e-03 1.183901e-02 8.911549e-03 1.264745e-02 2.995451e-03 7.657983e-04 5.315853e-03 1.325039e-02 1.044103e-02 2.307236e-02 2.780789e-02 1.735145e-02 9.053126e-03
[316] 5.847638e-02 3.815715e-03 5.087690e-03 1.040513e-02 4.475672e-02 6.564791e-02 3.233571e-03 1.076193e-02 8.283819e-02 5.370256e-03 3.533256e-02 1.302812e-02 1.896783e-02 2.055282e-02 3.572239e-03
[331] 5.867681e-03 5.864974e-04 9.715807e-03 1.665469e-02 5.082044e-02 3.547168e-03 3.069631e-03 1.274717e-02 1.858226e-03 3.104809e-04 1.247831e-02 2.073575e-03 3.544110e-04 7.240736e-03 8.452117e-05
[346] 8.149151e-04 4.942461e-05 1.142303e-03 6.265512e-04 3.666717e-04 3.244669e-02 7.242018e-03 6.335951e-04 2.329072e-02 3.719716e-03 2.803425e-02 1.623981e-02 6.387102e-03 8.807679e-03 1.214914e-02
[361] 6.699341e-03 1.148082e-02 1.329736e-02 1.537364e-03 2.004390e-02 1.562065e-02 1.655465e-02 9.960172e-02 2.174588e-02 1.209472e-02 2.328413e-02 2.012760e-04 1.422327e-02 2.194455e-03 2.307362e-02
[376] 4.315764e-03 3.208576e-02 3.826598e-02 1.828001e-02 3.935978e-03 5.294211e-04 1.392423e-02 6.588394e-03 1.040147e-03 1.260787e-02 9.051757e-04 5.353215e-02 6.049058e-02 1.382630e-01 1.064124e-01
[391] 3.380742e-03 1.798038e-02 1.557048e-01 1.217146e-02 4.140520e-02 4.707564e-02 2.786042e-02 8.836988e-03 5.542879e-03 1.862664e-02 8.858770e-03 1.026681e-03 1.692105e-02 8.849238e-03 7.143816e-03
[406] 1.630118e-02 1.165920e-01 9.471496e-03 4.879998e-02 1.388216e-02 1.453267e-02 4.845224e-04 1.415190e-03 1.208627e-02 1.372348e-02 2.573131e-02 1.169595e-02 1.825447e-02 2.574299e-02 5.301360e-02
[421] 6.961110e-03 7.781891e-03 1.013308e-03 3.160916e-03 1.090344e-02 1.530841e-02 9.398088e-04 9.143726e-04 1.286683e-02 2.006193e-02 1.774378e-02 5.681591e-02 9.584676e-03 7.957152e-02 4.485609e-03
[436] 1.086684e-02 2.930273e-03 6.085481e-03 4.342320e-03 1.319999e-02 2.120402e-02 4.477545e-02 1.991814e-02 8.893947e-03 7.790133e-03 1.610199e-02 2.441280e-02 2.781231e-03 1.410080e-02 1.639912e-02
[451] 1.797498e-02 1.185382e-02 2.775063e-02 3.797315e-02 1.428883e-02 1.272659e-02 2.390500e-03 7.503478e-03 8.965356e-03 2.139452e-02 2.028536e-02 6.916416e-02 1.615986e-02 4.837412e-02 1.561731e-02
[466] 7.130332e-03 9.208406e-05 1.099934e-02 2.003469e-02 1.395857e-02 9.883482e-03 4.110852e-02 1.202052e-02 2.833039e-02 1.233236e-02 2.145801e-02 7.900161e-03 4.663819e-02 4.410819e-03 5.115056e-04
[481] 9.100270e-04 4.013683e-03 1.227139e-02 3.304697e-03 2.919099e-03 6.112390e-03 1.922229e-02 1.208282e-03 1.164037e-02 2.166888e-02 4.381615e-02 5.318929e-03 7.226343e-03 2.732819e-02 2.385092e-04
[496] 4.905250e-02 1.159876e-02 4.068228e-03 3.349013e-02 1.273468e-03


Thanks for your help,

Ben

On Thu, Dec 1, 2011 at 11:55 AM, peter dalgaard <pdalgd at gmail.com> wrote:

On Dec 1, 2011, at 18:54 , Ben quant wrote:

Sorry if this is a duplicate: This is a re-post because the pdf's mentioned
below did not go through.

Still not there. Sometimes it's because your mailer doesn't label them with the appropriate mime-type (e.g. as application/octet-stream, which is "arbitrary binary"). Anyways, see below

[snip]

With the above data I do:

   l_logit = glm(y~x, data=as.data.frame(l_yx),

family=binomial(link="logit"))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

Why am I getting this warning when I have data points of varying values for
y=1 and y=0?  In other words, I don't think I have the linear separation
issue discussed in one of the links I provided.

I bet that you do... You can get the warning without that effect (one of my own examples is  the probability of menarche in a data set that includes infants and old age pensioners), but not with a huge odds ratio as well. Take a look at

d <- as.data.frame(l_yx)
with(d, y[order(x)])

if it comes out as all zeros followed by all ones or vice versa, then you have the problem.

PS - Then I do this and I get a odds ratio a crazy size:

   l_sm = summary(l_logit) # coef pval is $coefficients[8], log odds

$coefficients[2]

   l_exp_coef = exp(l_logit$coefficients)[2] # exponentiate the

coeffcients

   l_exp_coef

      x
3161.781

So for one unit increase in the predictor variable I get 3160.781%
(3161.781 - 1 = 3160.781) increase in odds? That can't be correct either.
How do I correct for this issue? (I tried multiplying the predictor
variables by a constant and the odds ratio goes down, but the warning above
still persists and shouldn't the odds ratio be predictor variable size
independent?)


--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Ben quant

Thu, Dec 1, 2011 2:07 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111201/fb13678d/attachment.pl>

Ben quant

Thu, Dec 1, 2011 2:25 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111201/130cc513/attachment.pl>

Ben quant

Thu, Dec 1, 2011 2:43 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111201/0b7247dc/attachment.pl>

Peter Dalgaard

Thu, Dec 1, 2011 3:32 PM #

On Dec 1, 2011, at 23:43 , Ben quant wrote:

It's easier to explain why you got the warning before. If the OR for a one unit change is 3000, the OR for a 14 unit change is on the order of 10^48 and that causes over/underflow in the conversion to probabilities.

I'm still baffled at how you can get that model fitted to your data, though. One thing is that you can have situations where there are fitted probabilities of one corresponding to data that are all one and/or fitted zeros where data are zero, but you seem to have cases where you have both zeros and ones at both ends of the range of x. Fitting a zero to a one or vice versa would make the likelihood zero, so you'd expect that the algorithm would find a better set of parameters rather quickly. Perhaps the extremely large number of observations that you have has something to do with it? 

You'll get the warning if the fitted zeros or ones occur at any point of the iterative procedure. Maybe it isn't actually true for the final model, but that wouldn't seem consistent with the OR that you cited.

Anyways, your real problem lies with the distribution of the x values. I'd want to try transforming it to something more sane. Taking logarithms is the obvious idea, but you'd need to find out what to do about the zeros -- perhaps log(x + 1e-4) ? Or maybe just cut the outliers down to size with pmin(x,1).

I did this 3 times to get rid of the 3 outliers:
mx_dims = arrayInd(which.max(l_yx), dim(l_yx))
l_yx[mx_dims] = 0

Now this does not produce an warning:
l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link="logit")) 

Can someone tell me why occurred?

Also, again, here are the screen shots of my data that I tried to send earlier (two screen shots, two pages):
http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

Thank you for your help,

Ben

On Thu, Dec 1, 2011 at 3:25 PM, Ben quant <ccquant at gmail.com> wrote:
Oops! Please ignore my last post. I mistakenly gave you different data I was testing with. This is the correct data:

Here you go:

attach(as.data.frame(l_yx))
 range(x[y==0])

[1]  0.00000 14.66518

range(x[y==1])

[1]  0.00000 13.49791


How do I know what is acceptable? 

Also, here are the screen shots of my data that I tried to send earlier (two screen shots, two pages):
http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

Thank you,

Ben

On Thu, Dec 1, 2011 at 3:07 PM, Ben quant <ccquant at gmail.com> wrote:
Here you go:

attach(as.data.frame(l_yx))
range(x[y==1])

[1] -22500.0000    746.6666

 range(x[y==0])

[1] -10076.5303    653.0228

How do I know what is acceptable? 

Also, here are the screen shots of my data that I tried to send earlier (two screen shots, two pages):
http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

Thank you,

Ben


On Thu, Dec 1, 2011 at 2:24 PM, peter dalgaard <pdalgd at gmail.com> wrote:

On Dec 1, 2011, at 21:32 , Ben quant wrote:

Thank you for the feedback, but my data looks fine to me. Please tell me if I'm not understanding.

Hum, then maybe it really is a case of a transition region being short relative to the range of your data. Notice that the warning is just that: a warning. I do notice that the distribution of your x values is rather extreme -- you stated a range of 0--14 and a mean of 0.01. And after all, an odds ratio of 3000 per unit is only a tad over a doubling per 0.1 units.

Have a look at  range(x[y==0]) and range(x[y==1]).

I followed your instructions and here is a sample of the first 500 values : (info on 'd' is below that)

 d <- as.data.frame(l_yx)
x = with(d, y[order(x)])
x[1:500] # I have 1's and 0's dispersed throughout

  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
[101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
[301] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

# I get the warning still

l_df = as.data.frame(l_yx)
    l_logit = glm(y~x, data=l_df, family=binomial(link="logit"))

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

# some info on 'd' above:

d[1:500,1]

  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[301] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

d[1:500,2]

  [1] 3.023160e-03 7.932130e-02 0.000000e+00 1.779657e-02 1.608374e-01 0.000000e+00 5.577064e-02 7.753926e-03 4.018553e-03 4.760918e-02 2.080511e-02 1.642404e-01 3.703720e-03  8.901981e-02 1.260415e-03
 [16] 2.202523e-02 3.750940e-02 4.441975e-04 9.351171e-03 8.374567e-03 0.000000e+00 8.440448e-02 5.081017e-01 2.538640e-05 1.806017e-02 2.954641e-04 1.434859e-03 6.964976e-04 0.000000e+00 1.202162e-02
 [31] 3.420300e-03 4.276100e-02 1.457324e-02 4.140121e-03 1.349180e-04 1.525292e-03 4.817502e-02 9.515717e-03 2.232953e-02 1.227908e-01 3.293581e-02 1.454352e-02 1.176011e-03 6.274138e-02 2.879205e-02
 [46] 6.900926e-03 1.414648e-04 3.446349e-02 8.807174e-03 3.549332e-02 2.828509e-03 2.935003e-02 7.162872e-03 5.650050e-03 1.221191e-02 0.000000e+00 2.126334e-02 2.052020e-02 7.542409e-02 2.586269e-04
 [61] 5.258664e-02 1.213126e-02 1.493275e-02 8.152657e-03 1.774966e-02 2.433198e-02 1.371257e-02 6.732339e-02 9.747677e-03 8.180594e-03 1.882473e-01 1.682596e-03 1.410170e-02 8.429676e-03 3.520589e-01
 [76] 1.865160e-02 2.389941e-02 3.301558e-02 4.896605e-03 6.826244e-03 9.151337e-02 1.722270e-02 1.764039e-01 3.554063e-02 7.338532e-03 9.577793e-03 3.632366e-03 1.422400e-02 3.020497e-02 1.797096e-02
 [91] 2.577899e-03 1.338365e-01 8.202223e-03 1.088335e-02 2.137259e-02 6.829797e-03 3.104860e-04 8.927160e-03 5.626881e-02 1.197682e-02 6.140553e-02 2.729364e-02 1.377913e-02 3.018277e-03 1.188304e-02
[106] 2.029268e-03 2.108815e-02 1.765503e-03 2.449253e-02 3.677046e-03 1.013463e-02 4.286642e-02 1.238931e-02 3.072090e-03 1.509613e-02 3.609885e-02 5.537268e-03 3.151614e-02 3.703148e-03 1.409401e-01
[121] 1.473365e-02 9.160461e-03 1.035099e-01 3.005865e-02 1.332199e-02 6.936535e-03 2.433539e-02 4.935373e-03 4.822740e-03 1.597643e-03 3.805619e-03 1.092683e-02 1.760635e-01 5.565614e-03 7.739213e-04
[136] 4.529198e-03 5.110359e-03 7.391258e-02 5.018207e-03 2.071417e-02 1.825787e-02 9.141405e-03 1.078386e-01 2.171470e-04 7.164583e-03 2.522077e-02 1.994428e-03 6.044653e-03 1.778078e-04 5.797706e-03
[151] 1.526778e-02 1.595897e-02 1.995627e-01 1.946735e-03 6.711582e-02 2.722350e-02 3.127499e-02 1.074904e-01 2.477414e-03 5.015375e-02 3.417810e-02 2.046643e-02 1.644832e-02 5.220166e-03 1.217752e-02
[166] 1.187352e-02 1.634016e-02 2.349568e-03 3.451811e-02 2.593540e-03 6.868595e-03 1.311236e-02 6.457757e-03 2.942391e-04 1.628352e-02 8.288831e-03 3.170856e-04 1.251331e+00 1.706954e-02 1.063723e-03
[181] 1.374416e-02 2.140507e-02 2.817009e-02 2.272793e-02 4.365562e-02 6.089414e-03 2.498083e-02 1.360471e-02 1.884079e-02 1.448660e-02 2.341314e-02 8.167064e-03 4.109117e-02 2.660633e-02 7.711723e-03
[196] 9.590278e-03 2.515490e-03 1.978033e-02 3.454990e-02 8.072748e-03 4.718885e-03 1.621131e-01 4.547743e-03 1.081195e-02 9.572051e-04 1.790391e-02 1.618026e-02 1.910230e-02 1.861914e-02 3.485475e-02
[211] 2.844890e-03 1.866889e-02 1.378208e-02 2.451514e-02 2.535044e-03 3.921364e-04 1.557266e-03 3.315892e-03 1.752821e-03 6.786187e-03 1.360921e-02 9.550702e-03 8.114506e-03 5.068741e-03 1.729822e-02
[226] 1.902033e-02 8.196564e-03 2.632880e-03 1.587969e-02 8.354079e-04 1.050023e-03 4.236195e-04 9.181120e-03 4.995919e-04 1.092234e-02 1.207544e-02 2.187243e-01 3.251349e-02 1.269134e-03 1.557751e-04
[241] 1.232498e-02 2.654449e-02 1.049324e-03 8.442729e-03 6.331691e-03 1.715609e-02 1.017800e-03 9.230006e-03 1.331373e-02 5.596195e-02 1.296551e-03 5.272687e-03 2.805640e-02 4.790665e-02 2.043011e-02
[256] 1.047226e-02 1.866499e-02 9.323001e-03 8.920536e-03 1.582911e-03 2.776238e-03 2.914762e-02 4.402356e-03 9.555274e-04 1.681966e-03 7.584319e-04 6.758914e-02 1.505431e-02 2.213308e-02 1.329330e-02
[271] 7.284363e-03 2.687818e-02 2.997535e-03 7.470007e-03 2.070569e-03 3.441944e-02 1.717768e-02 4.523364e-02 1.003558e-02 1.365111e-02 1.906845e-02 1.676223e-02 3.506809e-04 9.164257e-02 9.008416e-03
[286] 1.073903e-02 4.855937e-03 8.618043e-03 2.529247e-02 1.059375e-02 5.834253e-03 2.004309e-02 1.460387e-02 2.899190e-02 5.867984e-03 1.983956e-02 6.834339e-03 1.925821e-03 9.231870e-03 6.839616e-03
[301] 1.029972e-02 2.009769e-02 9.458785e-03 1.183901e-02 8.911549e-03 1.264745e-02 2.995451e-03 7.657983e-04 5.315853e-03 1.325039e-02 1.044103e-02 2.307236e-02 2.780789e-02 1.735145e-02 9.053126e-03
[316] 5.847638e-02 3.815715e-03 5.087690e-03 1.040513e-02 4.475672e-02 6.564791e-02 3.233571e-03 1.076193e-02 8.283819e-02 5.370256e-03 3.533256e-02 1.302812e-02 1.896783e-02 2.055282e-02 3.572239e-03
[331] 5.867681e-03 5.864974e-04 9.715807e-03 1.665469e-02 5.082044e-02 3.547168e-03 3.069631e-03 1.274717e-02 1.858226e-03 3.104809e-04 1.247831e-02 2.073575e-03 3.544110e-04 7.240736e-03 8.452117e-05
[346] 8.149151e-04 4.942461e-05 1.142303e-03 6.265512e-04 3.666717e-04 3.244669e-02 7.242018e-03 6.335951e-04 2.329072e-02 3.719716e-03 2.803425e-02 1.623981e-02 6.387102e-03 8.807679e-03 1.214914e-02
[361] 6.699341e-03 1.148082e-02 1.329736e-02 1.537364e-03 2.004390e-02 1.562065e-02 1.655465e-02 9.960172e-02 2.174588e-02 1.209472e-02 2.328413e-02 2.012760e-04 1.422327e-02 2.194455e-03 2.307362e-02
[376] 4.315764e-03 3.208576e-02 3.826598e-02 1.828001e-02 3.935978e-03 5.294211e-04 1.392423e-02 6.588394e-03 1.040147e-03 1.260787e-02 9.051757e-04 5.353215e-02 6.049058e-02 1.382630e-01 1.064124e-01
[391] 3.380742e-03 1.798038e-02 1.557048e-01 1.217146e-02 4.140520e-02 4.707564e-02 2.786042e-02 8.836988e-03 5.542879e-03 1.862664e-02 8.858770e-03 1.026681e-03 1.692105e-02 8.849238e-03 7.143816e-03
[406] 1.630118e-02 1.165920e-01 9.471496e-03 4.879998e-02 1.388216e-02 1.453267e-02 4.845224e-04 1.415190e-03 1.208627e-02 1.372348e-02 2.573131e-02 1.169595e-02 1.825447e-02 2.574299e-02 5.301360e-02
[421] 6.961110e-03 7.781891e-03 1.013308e-03 3.160916e-03 1.090344e-02 1.530841e-02 9.398088e-04 9.143726e-04 1.286683e-02 2.006193e-02 1.774378e-02 5.681591e-02 9.584676e-03 7.957152e-02 4.485609e-03
[436] 1.086684e-02 2.930273e-03 6.085481e-03 4.342320e-03 1.319999e-02 2.120402e-02 4.477545e-02 1.991814e-02 8.893947e-03 7.790133e-03 1.610199e-02 2.441280e-02 2.781231e-03 1.410080e-02 1.639912e-02
[451] 1.797498e-02 1.185382e-02 2.775063e-02 3.797315e-02 1.428883e-02 1.272659e-02 2.390500e-03 7.503478e-03 8.965356e-03 2.139452e-02 2.028536e-02 6.916416e-02 1.615986e-02 4.837412e-02 1.561731e-02
[466] 7.130332e-03 9.208406e-05 1.099934e-02 2.003469e-02 1.395857e-02 9.883482e-03 4.110852e-02 1.202052e-02 2.833039e-02 1.233236e-02 2.145801e-02 7.900161e-03 4.663819e-02 4.410819e-03 5.115056e-04
[481] 9.100270e-04 4.013683e-03 1.227139e-02 3.304697e-03 2.919099e-03 6.112390e-03 1.922229e-02 1.208282e-03 1.164037e-02 2.166888e-02 4.381615e-02 5.318929e-03 7.226343e-03 2.732819e-02 2.385092e-04
[496] 4.905250e-02 1.159876e-02 4.068228e-03 3.349013e-02 1.273468e-03


Thanks for your help,

Ben

On Thu, Dec 1, 2011 at 11:55 AM, peter dalgaard <pdalgd at gmail.com> wrote:

On Dec 1, 2011, at 18:54 , Ben quant wrote:

Sorry if this is a duplicate: This is a re-post because the pdf's mentioned
below did not go through.

Still not there. Sometimes it's because your mailer doesn't label them with the appropriate mime-type (e.g. as application/octet-stream, which is "arbitrary binary"). Anyways, see below

[snip]

With the above data I do:

   l_logit = glm(y~x, data=as.data.frame(l_yx),

family=binomial(link="logit"))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

Why am I getting this warning when I have data points of varying values for
y=1 and y=0?  In other words, I don't think I have the linear separation
issue discussed in one of the links I provided.

I bet that you do... You can get the warning without that effect (one of my own examples is  the probability of menarche in a data set that includes infants and old age pensioners), but not with a huge odds ratio as well. Take a look at

d <- as.data.frame(l_yx)
with(d, y[order(x)])

if it comes out as all zeros followed by all ones or vice versa, then you have the problem.

PS - Then I do this and I get a odds ratio a crazy size:

   l_sm = summary(l_logit) # coef pval is $coefficients[8], log odds

$coefficients[2]

   l_exp_coef = exp(l_logit$coefficients)[2] # exponentiate the

coeffcients

   l_exp_coef

      x
3161.781

So for one unit increase in the predictor variable I get 3160.781%
(3161.781 - 1 = 3160.781) increase in odds? That can't be correct either.
How do I correct for this issue? (I tried multiplying the predictor
variables by a constant and the odds ratio goes down, but the warning above
still persists and shouldn't the odds ratio be predictor variable size
independent?)


--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Ben quant

Thu, Dec 1, 2011 5:00 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111201/a9ac775c/attachment.pl>

Patrick Breheny

Fri, Dec 2, 2011 6:08 AM #

On 12/01/2011 08:00 PM, Ben quant wrote:

The logistic regression model you are fitting assumes a linear 
relationship between x and the log odds of y; that does not seem to be 
the case for your data.  To illustrate:

x <- l_yx[,"x"]
y <- l_yx[,"y"]
ind1 <- x <= .002
ind2 <- (x > .002 & x <= .0065)
ind3 <- (x > .0065 & x <= .13)
ind4 <- (x > .0065 & x <= .13)

 > summary(glm(y[ind1]~x[ind1],family=binomial))
...
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.79174    0.02633 -106.03   <2e-16 ***
x[ind1]     354.98852   22.78190   15.58   <2e-16 ***

 > summary(glm(y[ind2]~x[ind2],family=binomial))
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.15805    0.02966 -72.766   <2e-16 ***
x[ind2]     -59.92934    6.51650  -9.197   <2e-16 ***

 > summary(glm(y[ind3]~x[ind3],family=binomial))
...
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.367206   0.007781 -304.22   <2e-16 ***
x[ind3]     18.104314   0.346562   52.24   <2e-16 ***

 > summary(glm(y[ind4]~x[ind4],family=binomial))
...
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.31511    0.08549 -15.383   <2e-16 ***
x[ind4]      0.06261    0.08784   0.713    0.476

To summarize, the relationship between x and the log odds of y appears 
to vary dramatically in both magnitude and direction depending on which 
interval of x's range we're looking at.  Trying to summarize this 
complicated pattern with a single line is leading to the fitted 
probabilities near 0 and 1 you are observing (note that only 0.1% of the 
data is in region 4 above, although region 4 accounts for 99.1% of the 
range of x).

Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky