
How to use PC1 of PCA and dim1 of MCA as a predictor in logistic regression model for data reduction

Dear Mark,

Thank you very much for your kind advice.

Actually, I have already performed penalized logistic regression using
pentrace and lrm from the "rms" package.

I wanted to reduce the dimensionality of those 9 variables because,
according to subject-matter knowledge, they are not very important, and
because I wanted to avoid the events-per-variable problem.

Your answer about dudi.mix$l1 helped me a lot.
I was finally able to perform penalized logistic regression on data
consisting of 4 important variables and x18.dudi.mix$l1[, 1]. Thanks a
lot again.

One more question: I also looked into the homals package and found that
it has an "ndim" option.

My data are as follows:

 > head(x10homals.df)
  age sex      symptom       HT       DM      IHD  smoking hyperlipidemia   Statin Response
1  62   M asymptomatic positive negative negative positive       positive positive negative
2  82   M  symptomatic positive negative negative negative       positive positive negative
3  64   M asymptomatic negative positive negative negative       positive positive negative
4  55   M  symptomatic positive positive positive negative       positive positive negative
5  67   M  symptomatic positive negative negative negative       negative positive negative
6  79   M asymptomatic positive positive negative negative       positive positive negative

age is a continuous variable, and Response should not be active in the
computation, so:

x10.homals4 <- homals(x10homals.df,
                      active = c(rep(TRUE, 9), FALSE),
                      level = c("numerical", rep("nominal", 9)),
                      ndim = 4)

I ran this with ndim from 2 to 9 and compared the classification rate of
Response reported by predict(x10.homals).
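
As a minimal sketch, the comparison across ndim could be written as a
loop like the one below. It assumes the homals package is installed and
that predict() on a homals fit returns its classification rates in a
component named cr.vec (worth checking with str(predict(fit))). toy.df
is a made-up stand-in with the same column layout as x10homals.df, so
the printed rates themselves mean nothing.

```r
library(homals)  # assumption: the homals package is installed

# Made-up stand-in data with the same columns as x10homals.df.
set.seed(1)
n <- 60
yn <- function() factor(sample(c("negative", "positive"), n, replace = TRUE))
toy.df <- data.frame(
  age = round(runif(n, 50, 85)),
  sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  symptom = factor(sample(c("asymptomatic", "symptomatic"), n, replace = TRUE)),
  HT = yn(), DM = yn(), IHD = yn(), smoking = yn(),
  hyperlipidemia = yn(), Statin = yn(), Response = yn()
)

# Column position of the passive variable whose rate is compared.
resp <- which(names(toy.df) == "Response")
ndims <- 2:9

# Refit for each ndim and keep only the classification rate of Response.
rates <- sapply(ndims, function(k) {
  fit <- homals(toy.df,
                active = c(rep(TRUE, 9), FALSE),
                level = c("numerical", rep("nominal", 9)),
                ndim = k)
  unname(predict(fit)$cr.vec[resp])  # cr.vec: assumed component name
})
names(rates) <- ndims
rates
```

With the real data, toy.df would simply be replaced by x10homals.df.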

 > p.x10.homals4

Classification rate:
          Variable Cl. Rate %Cl. Rate
1             age   0.4712     47.12
2             sex   0.9808     98.08
3         symptom   0.8269     82.69
4              HT   0.9135     91.35
5              DM   0.8558     85.58
6             IHD   0.8750     87.50
7         smoking   0.9423     94.23
8  hyperlipidemia   0.9519     95.19
9          Statin   0.8942     89.42
10       Response   0.6154     61.54

This gave the best classification rate for Response, so I selected
ndim=4. Then I found objscores:

 > head(x10.homals4$objscores)
             D1           D2           D3          D4
1 -0.002395321 -0.034032230 -0.008140378  0.02369123
2  0.036788626 -0.010308707  0.005725984 -0.02751958
3  0.014363031  0.049594466 -0.025627467  0.06254055
4  0.083092285  0.065147519  0.045903394 -0.03751551
5 -0.013692504  0.005106661 -0.007656776 -0.04107009
6  0.002320747  0.024375393 -0.017785415 -0.01752556

I used x10.homals4$objscores[, 1] as a predictor in the logistic
regression, in the same way as PC1 from PCA.
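
The step described above can be sketched in base R as follows. This is
only an illustration under stated assumptions: the data are simulated,
x1 is a hypothetical stand-in for the important variables, the scores
matrix stands in for x10.homals4$objscores, and a plain glm() is used
in place of the penalized lrm/pentrace fit to keep the example free of
extra packages.

```r
# Simulated stand-in data; nothing here comes from the real data set.
set.seed(1)
n <- 100

# Stand-in for x10.homals4$objscores: an n x 4 matrix with columns D1..D4.
scores <- matrix(rnorm(n * 4, sd = 0.04), ncol = 4,
                 dimnames = list(NULL, paste0("D", 1:4)))

d <- data.frame(
  x1 = rnorm(n),        # hypothetical "important" covariate
  D1 = scores[, 1]      # first dimension only, used like PC1
)

# Simulated binary outcome, coded like Response in the post.
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * d$x1))
d$Response <- factor(y, levels = c(0, 1),
                     labels = c("negative", "positive"))

# Ordinary (unpenalized) logistic regression with D1 as one predictor.
fit <- glm(Response ~ x1 + D1, data = d, family = binomial)
summary(fit)$coefficients
```

The first object score enters the model as a single numeric column, so
it costs one degree of freedom however many variables went into it.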

Am I on the right track?

Thank you in advance for your help.

Best regards

--
Kohkichi Hosoda
(11/08/19 4:21), Mark Difford wrote: