On 28 May 2015, at 00:06, Kengo Inagaki <kengoing.gj at gmail.com> wrote:
I did not understand complete separation very well.
Thank you very much for clarification.
Kengo
2015-05-27 17:03 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
On May 27, 2015, at 3:00 PM, Kengo Inagaki wrote:
with(a, table(Sex, Therapy1, Outcome) )
, , Outcome = Alive

        Therapy1
Sex      no yes
  female  0   4
  male    4   5

, , Outcome = Death

        Therapy1
Sex      no yes
  female  6   3
  male    3   0
So there were no survivors among females without Therapy1 and no deaths among males with Therapy1. Complete separation.
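For readers who want to reproduce this, here is a minimal R sketch that rebuilds a stand-in for the data frame "a" from the counts in the three-way table above and lists its empty cells. The reconstruction is hypothetical: only Sex, Therapy1 and Outcome can be recovered, and the real "a" has more columns.

```r
# Hypothetical reconstruction of "a" from the posted Sex x Therapy1 x Outcome
# counts (25 cases); only these three columns can be recovered.
cells <- expand.grid(
  Sex      = c("female", "male"),
  Therapy1 = c("no", "yes"),
  Outcome  = c("Alive", "Death")
)
# expand.grid varies Sex fastest, then Therapy1, then Outcome
cells$n <- c(0, 4, 4, 5,  # Alive: female/no, male/no, female/yes, male/yes
             6, 3, 3, 0)  # Death: female/no, male/no, female/yes, male/yes
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]
rownames(a) <- NULL

tab <- with(a, table(Sex, Therapy1, Outcome))
print(tab)

# List the empty cells of the full cross-classification
tab_df <- as.data.frame(tab)
print(tab_df[tab_df$Freq == 0, ])
```

The two empty cells are female/no/Alive and male/yes/Death, which is exactly the pattern described above.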
Actually not quite complete separation, but just as bad. If you look at the linear combination Sex + Therapy (coding female = 0, male = 1 and no = 0, yes = 1), you get
0 (female, no therapy):                   6 dead, 0 survive
1 (female, therapy OR male, no therapy):  6 dead, 8 survive
2 (male, therapy):                        0 dead, 5 survive
Any logistic curve through the point (1, log(6/8)), i.e. the observed log-odds of death in the middle group, fits the middle point exactly, and the other two points are fitted better and better as the curve gets steeper, so the fit diverges.
That's a general pattern: you can have complete separation except at one point and still get divergence. Similarly (and really it is the same thing), in a multiple regression with k parameters the fit diverges if there is a k-1 dimensional hyperplane in predictor space with all responses 0 on one side and all responses 1 on the other, even if both 0s and 1s occur _on_ the hyperplane. Google tells me that this is called quasi-complete separation.
-pd
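Peter's argument can be checked numerically. The minimal R sketch below rebuilds "a" from the three-way counts posted earlier in this thread (a hypothetical reconstruction with only Sex, Therapy1 and Outcome), tabulates the score Sex + Therapy1 against Outcome, and evaluates the log-likelihood of logistic curves that pass through (1, log(6/8)) on the log-odds-of-death scale with increasing steepness. The helper name loglik_steep is illustrative only.

```r
# Hypothetical reconstruction of "a" from the posted counts (25 cases)
cells <- expand.grid(
  Sex      = c("female", "male"),
  Therapy1 = c("no", "yes"),
  Outcome  = c("Alive", "Death")
)
cells$n <- c(0, 4, 4, 5, 6, 3, 3, 0)
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

# Score = Sex + Therapy1 with female/no coded 0 and male/yes coded 1
score <- with(a, (Sex == "male") + (Therapy1 == "yes"))
print(table(score, Outcome = a$Outcome))

dead <- as.integer(a$Outcome == "Death")

# Log-likelihood of logit P(Death) = log(6/8) - steep * (score - 1),
# a curve that passes through (1, log(6/8)) for every value of steep
loglik_steep <- function(steep) {
  p_death <- plogis(log(6 / 8) - steep * (score - 1))
  sum(dbinom(dead, size = 1, prob = p_death, log = TRUE))
}

steeps <- c(1, 2, 5, 10, 20)
ll <- sapply(steeps, loglik_steep)
print(data.frame(steep = steeps, loglik = ll))

# Upper bound: the log-likelihood of the middle group alone (6 dead, 8 survived)
print(6 * log(6 / 14) + 8 * log(8 / 14))
```

The log-likelihood keeps increasing with the steepness and only approaches the bound in the limit, so there is no finite maximum: the divergence Peter describes.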
2015-05-27 16:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
On May 27, 2015, at 2:49 PM, Kengo Inagaki wrote:
Thank you very much for your rapid response. I sincerely appreciate your input.
I am sorry for sending the previous email in HTML format.
with(a, table(Sex, Therapy1) ) shows the following.
        Therapy1
Sex      no yes
  female  6   7
  male    7   5
and with(a, table(Sex, Outcome)) and with(a, table(Therapy1, Outcome))
give the following:

        Outcome
Sex      Alive Death
  female     4     9
  male       9     3

        Outcome
Therapy1 Alive Death
  no         4     9
  yes        9     3
Then what about:
with(a, table(Sex, Therapy1, Outcome) )
--
David
As there are no zero cells, it does not seem to be complete separation.
I would really appreciate any comments.
Kengo Inagaki
Memphis, TN
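Why the two-way tables mislead here: a minimal R sketch, again on a hypothetical reconstruction of "a" from the three-way counts posted in this thread, showing that none of the two-way tables has an empty cell while the full Sex x Therapy1 x Outcome table has two.

```r
# Hypothetical reconstruction of "a" from the posted counts (25 cases)
cells <- expand.grid(
  Sex      = c("female", "male"),
  Therapy1 = c("no", "yes"),
  Outcome  = c("Alive", "Death")
)
cells$n <- c(0, 4, 4, 5, 6, 3, 3, 0)
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

# Every two-way margin
two_way <- list(
  Sex_Therapy1     = with(a, table(Sex, Therapy1)),
  Sex_Outcome      = with(a, table(Sex, Outcome)),
  Therapy1_Outcome = with(a, table(Therapy1, Outcome))
)
has_zero_2way <- sapply(two_way, function(tb) any(tb == 0))
print(has_zero_2way)  # FALSE for all three

# The full cross-classification of both predictors with the outcome
three_way <- with(a, table(Sex, Therapy1, Outcome))
print(sum(three_way == 0))  # 2
```

Separation is a property of the joint predictor pattern, so the check has to cross all predictors in the model with the outcome, not just pairs of variables.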
2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:
I am currently working on a health-care-related project using R. I am
learning R while working on the data analysis.
Below is the part of the data in which I am encountering a problem.
Case# Sex  Therapy1 Therapy2 Outcome
1     male no       no       Alive
[snipped: mangled data sent in HTML]
"Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
predictor variables.
All of the predictors are significantly associated with the outcome by
univariate analysis.
Logistic regression runs fine with most of the predictors as long as "Sex" and
"Therapy1" are not included at the same time. (This is part of a table that
I cut out from a larger table for ease of presentation; there are more
predictors that I tested.)
Please examine the data before reaching for ridge regression:
What does this show: ...
with(a, table(Sex, Therapy1) )
I predict you will see a zero cell entry. Then read about "complete separation" and the so-called "Hauck-Donner effect".
--
David.
However, when "Sex" and "Therapy1" are included in the logistic regression
model at the same time, the standard errors inflate and the p-values get close to 1.
The code used is:
Model <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
# "a" is the data frame holding the table above
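To see the symptom directly, here is a minimal R sketch that refits this model on a hypothetical reconstruction of "a" (only Sex, Therapy1 and Outcome, rebuilt from the three-way counts posted in this thread) and compares its standard errors with those of the two single-predictor models. glm() may finish without any warning in this situation, so the inflated standard errors can be the only visible sign.

```r
# Hypothetical reconstruction of "a" from the posted counts (25 cases)
cells <- expand.grid(
  Sex      = c("female", "male"),
  Therapy1 = c("no", "yes"),
  Outcome  = c("Alive", "Death")
)
cells$n <- c(0, 4, 4, 5, 6, 3, 3, 0)
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

fit_both <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
fit_sex  <- glm(Outcome ~ Sex,            data = a, family = binomial)
fit_th   <- glm(Outcome ~ Therapy1,       data = a, family = binomial)

# Coefficient tables: estimate, standard error, z value, p-value
print(coef(summary(fit_both)))
print(coef(summary(fit_sex)))
print(coef(summary(fit_th)))

# Fitted P(Death) per Sex/Therapy1 cell: the two separated cells
# (female.no and male.yes) are pushed towards 1 and 0
print(tapply(fitted(fit_both), interaction(a$Sex, a$Therapy1), mean))
```

Each predictor alone gives ordinary standard errors; only the joint model blows up, which points at the joint pattern of the data rather than at either variable.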
After doing some reading, I suspect this might be collinearity, as the VIF
values (from the vif() function in the car package) were sky high (8,875,841 for
both "Sex" and "Therapy1").
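One way to check whether ordinary collinearity is plausible: a minimal R sketch, on a hypothetical reconstruction of the Sex and Therapy1 columns from the posted Sex x Therapy1 table, that computes the correlation between the two indicator variables and the VIF that this correlation alone would imply in a two-predictor model, 1/(1 - r^2). A value near 1 suggests the huge numbers from vif() are most likely a side effect of the separation-inflated coefficient variances rather than evidence of collinearity.

```r
# Hypothetical reconstruction of the Sex and Therapy1 columns of "a"
# from the posted Sex x Therapy1 table (25 cases)
sex_th <- expand.grid(Sex = c("female", "male"), Therapy1 = c("no", "yes"))
sex_th$n <- c(6, 7, 7, 5)  # female/no, male/no, female/yes, male/yes
a <- sex_th[rep(seq_len(nrow(sex_th)), sex_th$n), c("Sex", "Therapy1")]

# 0/1 indicators, as in the glm design matrix
male <- as.integer(a$Sex == "male")
yes  <- as.integer(a$Therapy1 == "yes")

r <- cor(male, yes)          # equals the phi coefficient of the 2 x 2 table
vif_linear <- 1 / (1 - r^2)  # VIF implied by this correlation alone
print(c(r = r, vif_linear = vif_linear))
```

Here r is about -0.12 and the implied VIF about 1.015, nowhere near the reported 8,875,841.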
Learning that ridge regression may be a solution, I attempted to use
logisticRidge {ridge} with the following formula, but I get the
accompanying error message.
logisticRidge(a$Outcome~a$Sex+a$Therapy1)
Error in ifelse(y, log(p), log(1 - p)) :
invalid to change the storage mode of a factor
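The error comes from ifelse(y, ...), which suggests that logisticRidge() expects a 0/1 numeric response rather than a factor. A minimal R sketch, assuming that: recode Outcome as an integer indicator first. Because the ridge package may not be installed, the sketch then fits a ridge-penalised logistic regression by hand with optim() on a hypothetical reconstruction of "a"; it illustrates the technique and is not the ridge package's own algorithm. The penalty lambda is set by hand, and the helper names pen_nll, pen_grad and fit_ridge_logit are hypothetical. Keep in mind the advice earlier in this thread: a penalty hides the separation rather than explaining it.

```r
# Hypothetical reconstruction of "a" from the posted counts (25 cases)
cells <- expand.grid(
  Sex      = c("female", "male"),
  Therapy1 = c("no", "yes"),
  Outcome  = c("Alive", "Death")
)
cells$n <- c(0, 4, 4, 5, 6, 3, 3, 0)
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

# The factor response presumably triggers the error: recode it as 0/1
a$Dead <- as.integer(a$Outcome == "Death")
# With the ridge package, the call would presumably then be
#   logisticRidge(Dead ~ Sex + Therapy1, data = a)

X <- model.matrix(~ Sex + Therapy1, data = a)
y <- a$Dead

# Penalised negative log-likelihood; the intercept is left unpenalised
pen_nll <- function(beta, lambda) {
  eta <- drop(X %*% beta)
  loglik <- sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
  -loglik + lambda * sum(beta[-1]^2)
}

# Gradient of pen_nll with respect to beta
pen_grad <- function(beta, lambda) {
  eta <- drop(X %*% beta)
  -drop(crossprod(X, y - plogis(eta))) + 2 * lambda * c(0, beta[-1])
}

# Hypothetical helper: fit for one fixed lambda
fit_ridge_logit <- function(lambda) {
  res <- optim(rep(0, ncol(X)), pen_nll, pen_grad, lambda = lambda,
               method = "BFGS")
  setNames(res$par, colnames(X))
}

for (lambda in c(0.1, 1, 10)) {
  print(round(fit_ridge_logit(lambda), 3))
}
```

Any positive lambda keeps the Sex and Therapy1 coefficients finite, and larger values shrink them further towards zero; the estimates therefore depend on a tuning choice that the data alone cannot settle.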
At this point I have no idea how to solve this and would like to
seek help.
I really really appreciate your input!!!
David Winsemius
Alameda, CA, USA