Hi all,
I'm trying to do model reduction for logistic regression. I have 13
predictors (4 continuous variables and 9 binary variables). Using subject
matter knowledge, I selected 4 important variables. For the remaining 9
variables, I tried data reduction by principal component analysis (PCA),
although 8 of the 9 are binary and only one is continuous. I transformed
the data with transcan (from Hmisc, which the rms package loads) and ran
PCA with princomp. PC1 explained only 20% of the variance. Still, I used
PC1 as a predictor in the logistic model and obtained some results.
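
To make the PCA step concrete, here is a minimal sketch on simulated
stand-in data. It is not my actual data or code: the variable names
(age, b1..b8, y) and all values are hypothetical, and the transcan step
is left out so that only base R is needed.

```r
# Simulated stand-in data; all names and values here are hypothetical.
set.seed(1)
n <- 200
latent <- rnorm(n)
X <- data.frame(age = 50 + 10 * latent + rnorm(n, sd = 5))
for (j in 1:8) {
  X[[paste0("b", j)]] <- as.numeric(latent + rnorm(n) > 0)  # binary 0/1
}
y <- rbinom(n, 1, plogis(0.8 * latent))

# PCA on the correlation matrix so that age does not dominate by scale.
pca <- princomp(X, cor = TRUE)
pc1 <- pca$scores[, 1]

# Share of the total variance carried by each component.
var_share <- pca$sdev^2 / sum(pca$sdev^2)

# Logistic model with PC1 as the single summary predictor.
fit_pca <- glm(y ~ pc1, family = binomial)
```

Here var_share[1] plays the role of the 20% figure quoted above.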
Then I tried multiple correspondence analysis (MCA). Age was the only
continuous variable, so I converted "age" into a factor variable,
"age_Q", ran the MCA, and got the summary output below.
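
A minimal sketch of this kind of conversion, assuming quartile cut
points (the name age_Q suggests quantiles, but the exact breaks are an
assumption here) and hypothetical simulated ages:

```r
set.seed(2)
age <- round(runif(200, 20, 80))  # hypothetical ages, not the real data

# Quartile cut points; include.lowest keeps the minimum age in the first bin.
breaks <- quantile(age, probs = seq(0, 1, 0.25))
age_Q <- cut(age, breaks = breaks, include.lowest = TRUE)

table(age_Q)  # four groups of roughly equal size
```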
Principal inertias (eigenvalues):

 dim    value      %     cum%   scree plot
 1      0.009592   43.4  43.4   *************************
 2      0.003983   18.0  61.4   **********
 3      0.001047    4.7  66.1   **
 4      0.000367    1.7  67.8
        --------  -----
 Total: 0.022111
Dimension 1 explained 43% of the inertia (variance). I then wondered
which values I could use in the same way as PC1 from PCA. I explored
mjca1 and found "rowcoord", so I took its first column,
mjca1$rowcoord[, 1], as "NewScore".
I used this "NewScore" as one of the predictors in the model instead of
the original 9 variables.
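
For illustration, the MCA step can be sketched as follows. This assumes
the ca package is installed and uses simulated data with hypothetical
column names (b1..b8, y); only mjca1, rowcoord, age_Q and NewScore are
names from my actual analysis.

```r
library(ca)  # assumes the ca package is installed

# Simulated stand-in data; names and values are hypothetical.
set.seed(3)
n <- 200
latent <- rnorm(n)
age <- 50 + 10 * latent + rnorm(n, sd = 5)
dat <- data.frame(
  age_Q = cut(age, quantile(age, seq(0, 1, 0.25)), include.lowest = TRUE)
)
for (j in 1:8) {
  dat[[paste0("b", j)]] <- factor(latent + rnorm(n) > 0)
}
y <- rbinom(n, 1, plogis(0.8 * latent))

mjca1 <- mjca(dat)               # default lambda = "adjusted"
NewScore <- mjca1$rowcoord[, 1]  # first-dimension row coordinates

# Logistic model with the MCA score as the single summary predictor.
fit_mca <- glm(y ~ NewScore, family = binomial)
```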
The final logistic model obtained with MCA was similar to the one
obtained with PCA.
My questions are:
1. Is it O.K. to perform PCA on data consisting of 1 continuous
variable and 8 binary variables?
2. Is it O.K. to convert age from a continuous variable to a factor
variable for MCA?
3. Is "mjca1$rowcoord[, 1]" the right quantity to use as a predictor in
a logistic regression model, in the same way as PC1 from PCA?
Thank you in advance for your help.
--
Kohkichi Hosoda