Questions about lrm, validate, pentrace (Re: BMA, logistic regression, odds ratio, model reduction etc)

According to the advice, I tried rms package.
Just to make sure, I have data of 104 patients (x6.df), which consists 
of 5 explanatory variables and one binary outcome (poor/good) (previous 
model 2 strategy). The outcome consists of 25 poor results and 79 good 
results. Therefore, My events per variable (EPV) is only 5 (much less 
than the rule of thumb of 10).

My questions are about validate and pentrace in rms package.
I present some codes and results.
I appreciate anybody's help in advance.

 > x6.lrm <- lrm(outcome ~ stenosis+x1+x2+procedure+ClinicalScore, 
data=x6.df, x=T, y=T)

 > x6.lrm
...
Obs  104    LR chi2      29.24    R2       0.367    C       0.816
  negative 79    d.f.         5    g        1.633    Dxy     0.632
  positive 25    Pr(> chi2) <0.0001   gr    5.118    gamma   0.632
max |deriv| 1e-08                    gp    0.237    tau-a   0.233
                                      Brier   0.127

                Coef    S.E.   Wald Z Pr(>|Z|)
Intercept      -5.5328 2.6287 -2.10  0.0353
stenosis       -0.0150 0.0284 -0.53  0.5979
x1              3.0425 0.9100  3.34  0.0008
x2             -0.7534 0.4519 -1.67  0.0955
procedure       1.2085 0.5717  2.11  0.0345
ClinicalScore   0.3762 0.2287  1.65  0.0999

It seems not too bad. Next, validation by bootstrap ...

 > validate(x6.lrm, B=200, bw=F)
           index.orig training    test optimism index.corrected   n
Dxy           0.6324   0.6960  0.5870   0.1091          0.5233 200
R2            0.3668   0.4370  0.3154   0.1216          0.2453 200
Intercept     0.0000   0.0000 -0.2007   0.2007         -0.2007 200
Slope         1.0000   1.0000  0.7565   0.2435          0.7565 200
Emax          0.0000   0.0000  0.0999   0.0999          0.0999 200
D             0.2716   0.3368  0.2275   0.1093          0.1623 200
U            -0.0192  -0.0192  0.0369  -0.0561          0.0369 200
Q             0.2908   0.3560  0.1906   0.1654          0.1254 200
B             0.1272   0.1155  0.1384  -0.0229          0.1501 200
g             1.6328   2.0740  1.4647   0.6093          1.0235 200
gp            0.2367   0.2529  0.2189   0.0341          0.2026 200

The apparent Dxy is 0.63, and bias-corrected Dxy is 0.52. The maximum 
absolute error is estimated to be 0.099. The changes in slope and 
intercept are also more substantial. In all, there is evidence that I am 
somewhat overfitting the data, right?.

Furthermore, using step-down variable selection ...

 > validate(x6.lrm, B=200, bw=T)

		Backwards Step-down - Original Model

  Deleted        Chi-Sq d.f. P      Residual d.f. P      AIC
  stenosis       0.28   1    0.5979 0.28     1    0.5979 -1.72
  ClinicalScore  2.60   1    0.1068 2.88     2    0.2370 -1.12
  x2             2.86   1    0.0910 5.74     3    0.1252 -0.26

Approximate Estimates after Deleting Factors

              Coef   S.E. Wald Z         P
Intercept  -5.865 1.4136 -4.149 3.336e-05
x1          2.915 0.8685  3.357 7.889e-04
procedure   1.072 0.5590  1.918 5.508e-02

Factors in Final Model

[1] x1         procedure
           index.orig training    test optimism index.corrected   n
Dxy           0.5661   0.6755  0.5559   0.1196          0.4464 200
R2            0.2876   0.4085  0.2784   0.1301          0.1575 200
Intercept     0.0000   0.0000 -0.2459   0.2459         -0.2459 200
Slope         1.0000   1.0000  0.7300   0.2700          0.7300 200
Emax          0.0000   0.0000  0.1173   0.1173          0.1173 200
D             0.2038   0.3130  0.1970   0.1160          0.0877 200
U            -0.0192  -0.0192  0.0382  -0.0574          0.0382 200
Q             0.2230   0.3323  0.1589   0.1734          0.0496 200
B             0.1441   0.1192  0.1452  -0.0261          0.1702 200
g             1.2628   1.9524  1.3222   0.6302          0.6326 199
gp            0.2041   0.2430  0.2043   0.0387          0.1654 199

If I select only two variables (x1 and procedure), bias-corrected Dxy 
goes down to 0.45.

[Question 1]
I have EPV problem. Even so, should I keep the full model (5-variable 
model)? or can I use the 2-variable (x1 and procedure) model which the 
validate() with step-down provides?

[Question 2]
If I use 2-variable model, should I do
x2.lrm <- lrm(postopDWI_HI ~ T1+procedure2, data=x6.df, x=T, y=T)?
or keep the value showed above by validate function?

Next, shrinkage ...

 > pentrace(x6.lrm, seq(0, 5.0, by=0.05))
Best penalty:
penalty         df
    3.05   4.015378

The best penalty is 3.05. So, I update it with this penalty to obtain 
the corresponding penalized model:

 > x6.lrm.pen <- update(x6.lrm, penalty=3.05, x=T, y=T)
 > x6.lrm.pen
.....
Penalty factors

  simple nonlinear interaction nonlinear.interaction
    3.05      3.05        3.05                  3.05
Final penalty on -2 log L
      [,1]
[1,]  3.8

Obs     104    LR chi2      28.18    R2       0.313    C       0.818
  negative    79    d.f.     4.015    g        1.264    Dxy     0.635
  positive    25   Pr(> chi2) <0.0001 gr       3.538    gamma   0.637
max |deriv| 3e-05                    gp       0.201    tau-a   0.234
                                      Brier    0.129

                Coef    S.E.   Wald Z Pr(>|Z|) Penalty Scale
Intercept      -4.7246 2.2429 -2.11  0.0352    0.0000
stenosis       -0.0105 0.0240 -0.44  0.6621   17.8021
x1              2.3605 0.7254  3.25  0.0011    0.6054
x2             -0.5385 0.3653 -1.47  0.1404    1.2851
procedure       0.9247 0.4844  1.91  0.0563    0.8576
ClinicalScore   0.3046 0.1874  1.63  0.1041    2.4779

Arrange the coefficients of the two models side by side, and also list 
the difference between the two:

 > cbind(coef(x6.lrm), coef(x6.lrm.pen), abs(coef(x6.lrm)-coef(x6.lrm.pen)))
                       [,1]        [,2]        [,3]
Intercept      -5.53281808 -4.72464766 0.808170417
stenosis       -0.01496757 -0.01050797 0.004459599
x1              3.04248257  2.36051833 0.681964238
x2             -0.75335619 -0.53854750 0.214808685
procedure       1.20847252  0.92474708 0.283725441
ClinicalScore   0.37623189  0.30457557 0.071656322

[Question 3]
Is this penalized model the one I should present for my colleagues?
I still have EPV problem. Or is EPV problem O.K. if I use penalization?

I am still wondering about what I can do to avoid EPV problem. 
Collecting new data would be a long-time and huge work...

Questions about lrm, validate, pentrace (Re: BMA, logistic regression, odds ratio, model reduction etc)

Thread (10 messages)