On May 27, 2015, at 2:49 PM, Kengo Inagaki wrote:
Thank you very much for your rapid response. I sincerely appreciate your input.
I am sorry for sending the previous email in HTML format.
with(a, table(Sex, Therapy1) ) shows the following:
        Therapy1
Sex      no yes
  female  6   7
  male    7   5
and with(a, table(Sex, Outcome) ) and with(a, table(Therapy1, Outcome) )
give the following:
        Outcome
Sex      Alive Death
  female     4     9
  male       9     3
        Outcome
Therapy1 Alive Death
  no         4     9
  yes        9     3
Then what about:
with(a, table(Sex, Therapy1, Outcome) )
--
David
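A three-way table can expose empty cells that none of the two-way tables show. The sketch below uses a small made-up data frame (`toy`, with hypothetical counts, not the poster's data) to illustrate the idea: every two-way table is free of zeros, yet two cells of the Sex by Therapy1 by Outcome table are empty.

```r
# Hypothetical cell counts, invented purely for illustration.
# expand.grid varies the first factor fastest.
cells <- expand.grid(Sex      = c("female", "male"),
                     Therapy1 = c("no", "yes"),
                     Outcome  = c("Alive", "Death"))
cells$n <- c(0, 2, 2, 3,   # Alive: female/no, male/no, female/yes, male/yes
             3, 2, 2, 0)   # Death: same order

# Expand the counts into one row per (made-up) patient.
toy <- cells[rep(seq_len(nrow(cells)), cells$n),
             c("Sex", "Therapy1", "Outcome")]

two_way_st <- with(toy, table(Sex, Therapy1))
two_way_so <- with(toy, table(Sex, Outcome))
two_way_to <- with(toy, table(Therapy1, Outcome))
three_way  <- with(toy, table(Sex, Therapy1, Outcome))

# ftable() flattens the three-way table into a single readable block.
print(ftable(three_way))

# TRUE when at least one Sex/Therapy1/Outcome combination never occurs.
has_empty_cell <- any(three_way == 0)
print(has_empty_cell)
```

In this toy example the empty cells only appear once all three variables are crossed, which is why looking at the two-way tables alone can be misleading.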
As there are no zero cells, it does not seem to be complete separation.
I would really appreciate any comments.
Kengo Inagaki
Memphis, TN
2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsemius at comcast.net>:
On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:
I am currently working on a health care related project using R. I am
learning R while working on data analysis.
Below is the part of the data with which I am encountering a problem.
Case#  Sex   Therapy1  Therapy2  Outcome
1      male  no        no        Alive
[snipped mangled data sent in HTML]
"Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
predictor variables.
All of the predictors are significantly associated with the outcome by
univariate analysis.
Logistic regression runs fine with most of the predictors when "Sex" and
"Therapy1" are not included at the same time. (This is part of a table that
I cut out from a larger table for ease of presentation; there are more
predictors that I tested.)
Please examine the data before reaching for ridge regression:
What does this show: ...
with(a, table(Sex, Therapy1) )
I predict you will see a zero cell entry. Then read about "complete separation" and the so-called "Hauck-Donner effect".
--
David.
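For readers unfamiliar with the term: complete separation means that a predictor (or a combination of predictors) splits the outcomes perfectly, so the maximum-likelihood estimate drifts toward infinity. A minimal sketch on invented data (the data frame `sep` and its values are assumptions for illustration only) shows the typical symptoms, namely a very large standard error and a Wald p-value near 1.

```r
# Invented data: y is 0 whenever x is 0 and 1 whenever x is 1,
# i.e. x separates the outcome completely.
sep <- data.frame(x = rep(c(0, 1), each = 4),
                  y = rep(c(0, 1), each = 4))

# glm() warns about fitted probabilities of 0 or 1 here;
# the warning is suppressed so the sketch runs quietly.
fit <- suppressWarnings(glm(y ~ x, data = sep, family = binomial))

coefs <- summary(fit)$coefficients
se_x  <- coefs["x", "Std. Error"]   # inflated standard error
p_x   <- coefs["x", "Pr(>|z|)"]     # Wald p-value

print(coefs)
```

The combination of a large estimate with an even larger standard error, and hence a small Wald statistic, is roughly what the Hauck-Donner effect refers to.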
However, when "Sex" and "Therapy1" are included in the logistic regression
model at the same time, the standard errors inflate and the p-values get close to 1.
The formula used is,
Model <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
# I assigned the table above to the object "a".
After doing some reading, I suspect this might be collinearity, as vif
values (using "vif()" function in car package) were sky high (8,875,841 for
both "Sex" and "Therapy1").
Learning that ridge regression may be a solution, I attempted using
logisticRidge {ridge} with the following formula, but I get the
accompanying error message.
logisticRidge(a$Outcome~a$Sex+a$Therapy1)
Error in ifelse(y, log(p), log(1 - p)) :
invalid to change the storage mode of a factor
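The error message suggests that the response reached `ifelse()` as a factor, which cannot have its storage mode changed to logical. One possible workaround, sketched here on a made-up data frame (`toy` and the column name `Died` are hypothetical), is to recode the outcome as 0/1 before fitting. Whether this resolves the `logisticRidge()` call was not verified here.

```r
# Made-up outcome column stored as a factor, as in the original post.
toy <- data.frame(Outcome = factor(c("Alive", "Death", "Death", "Alive")))

# Calling ifelse() directly on a factor is expected to fail in a similar way;
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(ifelse(toy$Outcome, 1, 0),
                error = function(e) conditionMessage(e))
print(msg)

# Recode the factor as an integer 0/1 indicator (1 = Death).
toy$Died <- as.integer(toy$Outcome == "Death")
print(toy)

# Assumed follow-up, not run here and not verified:
#   logisticRidge(Died ~ Sex + Therapy1, data = a)
```

Recoding into a new column keeps the original factor intact, so `glm()` calls that rely on `Outcome` continue to work.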
At this point I have no idea how to solve this and would like to
seek help.
I really really appreciate your input!!!
David Winsemius
Alameda, CA, USA