
Collinearity? Cannot get logisticRidge{ridge} to work

9 messages · David Winsemius, Peter Dalgaard, Kengo Inagaki

#
I am currently working on a health-care-related project using R, and I am
learning R as I work on the data analysis.

Below is the part of the data with which I am encountering a problem.



Case#  Sex     Therapy1  Therapy2  Outcome
1      male    no        no        Alive
2      female  no        no        Death
3      male    no        no        Alive
4      female  no        no        Death
5      male    no        no        Death
6      male    no        no        Alive
7      male    yes       no        Alive
8      female  no        no        Death
9      male    no        yes       Alive
10     female  no        no        Death
11     female  yes       yes       Death
12     female  yes       no        Death
13     female  yes       no        Death
14     female  yes       no        Alive
15     male    yes       no        Alive
16     male    yes       no        Alive
17     male    no        yes       Death
18     male    no        yes       Death
19     male    yes       no        Alive
20     female  no        yes       Death
21     female  yes       no        Alive
22     female  no        yes       Death
23     male    yes       no        Alive
24     female  yes       no        Alive
25     female  yes       no        Alive


"Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
predictor variables.
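To make the thread reproducible, the table above can be typed in as an R data frame. This is a minimal sketch that assumes the name `a` used in the replies and the column names from the table header; the case number is left out because it plays no role in the models.

```r
# The 25 cases from the table above, in case-number order
a <- data.frame(
  Sex = c("male", "female", "male", "female", "male",
          "male", "male", "female", "male", "female",
          "female", "female", "female", "female", "male",
          "male", "male", "male", "male", "female",
          "female", "female", "male", "female", "female"),
  Therapy1 = c("no", "no", "no", "no", "no",
               "no", "yes", "no", "no", "no",
               "yes", "yes", "yes", "yes", "yes",
               "yes", "no", "no", "yes", "no",
               "yes", "no", "yes", "yes", "yes"),
  Therapy2 = c("no", "no", "no", "no", "no",
               "no", "no", "no", "yes", "no",
               "yes", "no", "no", "no", "no",
               "no", "yes", "yes", "no", "yes",
               "no", "yes", "no", "no", "no"),
  Outcome = c("Alive", "Death", "Alive", "Death", "Death",
              "Alive", "Alive", "Death", "Alive", "Death",
              "Death", "Death", "Death", "Alive", "Alive",
              "Alive", "Death", "Death", "Alive", "Death",
              "Alive", "Death", "Alive", "Alive", "Alive"),
  stringsAsFactors = TRUE  # store every column as a factor
)

# Cross-tabulate the two problematic predictors against the outcome
print(with(a, table(Sex, Therapy1, Outcome)))
```

The printed three-way table should match the one posted further down in the thread.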

All of the predictors are significantly associated with the outcome by
univariate analysis.

Logistic regression runs fine with most of the predictors as long as "Sex"
and "Therapy1" are not included at the same time. (This table is a part I
cut out of a larger table for ease of presentation; I tested more
predictors than are shown here.)

However, when "Sex" and "Therapy1" are included in the logistic regression
model at the same time, the standard errors inflate and the p-values get close to 1.

The formula used is as follows, with "a" holding the above table:



After doing some reading, I suspect this might be collinearity, as the VIF
values (from the vif() function in the car package) were sky high
(8,875,841 for both "Sex" and "Therapy1").
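The inflated standard errors and the huge VIF can be reproduced from the counts alone. The sketch below rebuilds the four Sex by Therapy1 cells from the three-way table posted later in the thread (Therapy2 is left out, so this is not the poster's exact model) and fits a grouped binomial glm. The VIF is computed by hand from the correlation of the two slope estimates; as far as I know this matches what car::vif() does for single-degree-of-freedom terms, but treat that equivalence as an assumption.

```r
# Counts from the three-way table posted later in the thread.
# expand.grid() varies Sex fastest, so the rows are
# female/no, male/no, female/yes, male/yes.
cells <- expand.grid(Sex = c("female", "male"), Therapy1 = c("no", "yes"))
cells$Alive <- c(0, 4, 4, 5)
cells$Death <- c(6, 3, 3, 0)

# Grouped binomial fit of P(Death); same likelihood, up to a constant,
# as a fit to the 25 individual cases. glm() may warn about convergence.
fit <- glm(cbind(Death, Alive) ~ Sex + Therapy1,
           family = binomial, data = cells)
print(summary(fit)$coefficients)

# VIF from the correlation of the slope estimates (intercept dropped);
# with only two predictors this is 1 / (1 - r^2) for both of them.
R <- cov2cor(vcov(fit)[-1, -1])
vif <- 1 / (1 - R[1, 2]^2)
print(vif)
```

The exact numbers depend on where glm() stops iterating, so they will not equal the 8,875,841 reported above; the point is that both the standard errors and the VIF are enormous.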

Having learned that ridge regression may be a solution, I attempted to use
logisticRidge {ridge} with the following formula, but I get the
accompanying error message.
Error in ifelse(y, log(p), log(1 - p)) :
  invalid to change the storage mode of a factor
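The error message points at the cause: the code evaluates ifelse(y, log(p), log(1 - p)) on the response y, and ifelse() coerces its test argument to logical, which R refuses to do for a factor such as Outcome. The sketch below reproduces the error with made-up values (`y_factor` and `p` are hypothetical) and shows that a 0/1 recoding works. It assumes, without having checked the ridge package documentation, that logisticRidge() expects a numeric 0/1 response.

```r
# A factor response, like Outcome in the table above (toy values)
y_factor <- factor(c("Alive", "Death", "Death"))
p <- c(0.2, 0.7, 0.9)  # made-up fitted probabilities of Death

# ifelse() tries to change the storage mode of its test to logical,
# which fails for a factor
msg <- tryCatch(ifelse(y_factor, log(p), log(1 - p)),
                error = function(e) conditionMessage(e))
print(msg)  # -> "invalid to change the storage mode of a factor"

# Recoding the response as 0/1 (1 = Death) gives ifelse() a usable test
y01 <- as.integer(y_factor == "Death")
print(ifelse(y01, log(p), log(1 - p)))
```

Recoding removes this particular error only; it does nothing about the separation problem that turns out to be the real cause of the inflated standard errors.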



At this point I do not have an idea how to solve this and would like to
seek help.

I really appreciate your input!
#
On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:

            
[snipped mangled data sent in HTML]
Please examine the data before reaching for ridge regression:

What does this show: ...

    with(a,  table(Sex, Therapy1) )

I predict you will see a zero cell entry. Then read about "complete separation" and the so-called "Hauck-Donner effect".
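The Hauck-Donner effect can be seen with the same counts: the Wald p-value that summary() reports for Sex comes out large because its standard error grows much faster than its estimate, while a likelihood-ratio test comparing the models with and without Sex still detects a strong effect. A minimal sketch, rebuilding the Sex by Therapy1 cells from the three-way table posted later in the thread (Therapy2 left out, so this is an illustration rather than the poster's model):

```r
# Counts from the three-way table; rows are
# female/no, male/no, female/yes, male/yes.
cells <- expand.grid(Sex = c("female", "male"), Therapy1 = c("no", "yes"))
cells$Alive <- c(0, 4, 4, 5)
cells$Death <- c(6, 3, 3, 0)

# glm() may warn here; that warning is itself a symptom of the divergence
fit_both <- glm(cbind(Death, Alive) ~ Sex + Therapy1,
                family = binomial, data = cells)
fit_t1   <- glm(cbind(Death, Alive) ~ Therapy1,
                family = binomial, data = cells)

# Wald test for Sex: the p-value printed by summary()
wald_p <- summary(fit_both)$coefficients["Sexmale", "Pr(>|z|)"]

# Likelihood-ratio test for Sex: compare the deviances of the nested models
lr <- anova(fit_t1, fit_both, test = "Chisq")
lr_p <- lr[2, "Pr(>Chi)"]

print(c(wald = wald_p, likelihood_ratio = lr_p))
```

With separated data the likelihood-ratio test is the more trustworthy of the two, since it does not rely on a standard error evaluated at a diverging estimate.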
#
Thank you very much for your rapid response. I sincerely appreciate your input.
I am sorry for sending the previous email in HTML format.

with(a,  table(Sex, Therapy1) )   shows the following.
          Therapy1
Sex      no yes
  female  6   7
  male    7   5

with(a,  table(Sex, Outcome) ) and with(a,  table(Therapy1, Outcome) )
give the following:

        Outcome
Sex      Alive Death
  female     4     9
  male       9     3

        Outcome
Therapy1 Alive Death
     no      4     9
     yes     9     3

As there are no zero cells, it does not seem to be complete separation.
I would appreciate any comments.

Kengo Inagaki
Memphis, TN


2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
#
On May 27, 2015, at 2:49 PM, Kengo Inagaki wrote:

            
Then what about:

with(a,  table(Sex, Therapy1,  Outcome) )
#
Here is the result:
, , Outcome = Alive

        Therapy1
Sex      no yes
  female  0   4
  male    4   5

, , Outcome = Death

        Therapy1
Sex      no yes
  female  6   3
  male    3   0


2015-05-27 16:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
#
On May 27, 2015, at 3:00 PM, Kengo Inagaki wrote:

            
So no survivors when female had no Therapy1, and no deaths with the opposite values of those variables (male with Therapy1). Complete separation.
#
I did not understand complete separation well.
Thank you very much for clarification.

Kengo

2015-05-27 17:03 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
#
On 28 May 2015, at 00:06 , Kengo Inagaki <kengoing.gj at gmail.com> wrote:

            
Actually not quite complete separation, but just as bad. If you look at the linear combination Sex + Therapy1 (counting male = 1 and Therapy1 yes = 1), you get

0 (female, no therapy)
1 (female, therapy OR male, no therapy)
2 (male, therapy)


0: 6 dead, 0 survive
1: 6 dead, 8 survive
2: 0 dead, 5 survive

and any logistic curve through (1, log(6/8)), the observed log-odds of death in the middle group, fits the middle point exactly, while the other two points are fitted better and better as the curve gets steeper, so the fit diverges.
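This argument can be checked numerically. The sketch below rebuilds the four Sex by Therapy1 cells from the three-way table, tabulates deaths and survivors by the score x = Sex + Therapy1, and evaluates the log-likelihood along the family of curves eta = log(6/8) + t * (1 - x), where eta is the log-odds of death. Every curve in this family passes through (1, log(6/8)) and gets steeper as t grows; in glm terms it is intercept log(6/8) + t with both slopes equal to -t. The path and the values of t are illustrative choices, not part of the original message.

```r
# Counts from the three-way table; rows are
# female/no, male/no, female/yes, male/yes.
cells <- expand.grid(Sex = c("female", "male"), Therapy1 = c("no", "yes"))
cells$Alive <- c(0, 4, 4, 5)
cells$Death <- c(6, 3, 3, 0)

# The score 0, 1 or 2 (male = 1, Therapy1 yes = 1)
x <- (cells$Sex == "male") + (cells$Therapy1 == "yes")
by_x <- rowsum(as.matrix(cells[, c("Death", "Alive")]), x)
print(by_x)

# Binomial log-likelihood along the path; plogis(..., log.p = TRUE)
# gives log P(Death) and log P(Alive) without underflow.
loglik_path <- function(t) {
  eta <- log(6 / 8) + t * (1 - x)
  sum(cells$Death * plogis(eta, log.p = TRUE) +
      cells$Alive * plogis(-eta, log.p = TRUE))
}
ts <- c(0, 1, 2, 5, 10, 20)
ll <- sapply(ts, loglik_path)
print(round(ll, 4))
```

The log-likelihood keeps increasing with t and only approaches its supremum, 6 log(3/7) + 8 log(4/7), the value contributed by the middle group alone, so there is no finite maximum for the fitting algorithm to find.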

That's a general pattern: you can have complete separation except at one point and still get divergence. Similarly (and really just the same), the fit diverges in a multiple regression with k parameters if there's a k-1 dimensional hyperplane in predictor space with all responses 0 on one side and 1 on the other, but possibly both 0 and 1 _on_ the hyperplane. Google tells me that this is called quasi-complete separation.
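Returning to the ridge idea from the first message: a penalty on the coefficients keeps the maximizer finite even under quasi-complete separation, because the penalty grows without bound while the log-likelihood is bounded above. Below is a minimal hand-rolled sketch using optim(); it is not the ridge package's logisticRidge(), the penalty weight lambda = 1 is an arbitrary assumption, and in practice lambda would need to be chosen, for example by cross-validation.

```r
# Counts from the three-way table; rows are
# female/no, male/no, female/yes, male/yes.
cells <- expand.grid(Sex = c("female", "male"), Therapy1 = c("no", "yes"))
cells$Alive <- c(0, 4, 4, 5)
cells$Death <- c(6, 3, 3, 0)
X <- model.matrix(~ Sex + Therapy1, data = cells)  # intercept, Sexmale, Therapy1yes

# Negative ridge-penalized binomial log-likelihood for P(Death);
# the intercept is left unpenalized.
neg_pen_loglik <- function(beta, lambda = 1) {
  eta <- drop(X %*% beta)
  ll <- sum(cells$Death * plogis(eta, log.p = TRUE) +
            cells$Alive * plogis(-eta, log.p = TRUE))
  -ll + lambda * sum(beta[-1]^2)
}

fit <- optim(rep(0, ncol(X)), neg_pen_loglik, method = "BFGS")
beta_hat <- setNames(fit$par, colnames(X))
print(round(beta_hat, 3))
```

Unlike the unpenalized glm() fit, these estimates are finite. They are also shrunk toward zero by design, so their size reflects the chosen lambda as much as the data.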

-pd

  
    
#
Dr. Dalgaard,

Thank you for further clarifying the problem.
I found a few possible solutions on the internet and will try them.

This was my first time posting questions on this mailing list, and I
learned quite a bit through working on this problem.
I apologize for any impoliteness you may have noticed.

Best regards,

Kengo


2015-05-28 4:26 GMT-05:00 peter dalgaard <pdalgd at gmail.com>: