Dear Prof. Harrell and R list,
I have done the variable clustering and summary scores. Thanks a lot for
your kind help.
But it hasn't solved the collinearity problem in my dataset. After the
clustering and transcan, there is still very strong collinearity between the
summary scores. The objective of my project is to find out the influential
variables. I believe any variable reduction is not appropriate when the
collinearity exists. I am thinking about the principal component regression
and variable reduction based on it (Rudolf J. Freund and William J. Wilson
(1998), P215).
Does anybody have a suggestion on variable reduction under this condition?
I will appreciate any kind of information.
Best
Huan
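For readers of the archive, the principal component regression idea mentioned above can be sketched in base R. This is only an illustration on simulated data, not the procedure from Freund and Wilson; the variable names and the choice k = 2 are assumptions made for the example.

```r
# Principal component regression on simulated, nearly collinear predictors.
set.seed(3)
n <- 150
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.1),   # x1 and x2 are nearly collinear
           x2 = z + rnorm(n, sd = 0.1),
           x3 = rnorm(n))
y <- 2 * z + X[, "x3"] + rnorm(n)

# Principal components of the standardized predictors.
pca <- prcomp(X, center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Keep the leading components; k = 2 is an arbitrary choice for illustration.
k <- 2
pcs <- pca$x[, 1:k, drop = FALSE]
fit <- lm(y ~ pcs)

# Map the component coefficients back to the standardized original variables.
beta_std <- pca$rotation[, 1:k, drop = FALSE] %*% coef(fit)[-1]

round(var_explained, 3)
beta_std
```

The back-transformed coefficients are on the scale of the standardized predictors, and dropping the small-variance component is what stabilizes them when x1 and x2 are almost the same variable.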
----- Original Message -----
From: "Frank E Harrell Jr" <fharrell at virginia.edu>
To: "Huan Huang" <huang at stats.ox.ac.uk>
Sent: Sunday, August 04, 2002 7:56 PM
Subject: Re: cluster summary score
On Sun, 4 Aug 2002 19:48:22 +0100
Huan Huang <huang at stats.ox.ac.uk> wrote:
This was just done by
f <- lrm(y ~ all cluster summary scores)
fastbw(f, suitable stopping criteria)
Thank you very much for your kind reply. But I don't know how to get the
cluster summary score.
I did:
t <- transcan(x, transform = T)
t$transform
I got a new matrix, with the transformed value for each variable. How do I
get the cluster summary scores?
You see the little pc1 function I defined in Hmisc? I just do things like
p1 <- pc1(t$transform) or pc1(t$transform[,c(3,5,7)]) to use variables 3, 5, and 7.
Doing the fast backward stepdown is safer with cluster scores than
raw variables, especially if you use conservative stopping criteria
(large alpha). I allowed "highly insignificant" cluster scores to be
dropped, and did not ever look at their component variables again.
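A rough base-R sketch of the cluster-score step described above, for anyone following along without Hmisc installed. The helper first_pc is a hypothetical stand-in written for this example, not the Hmisc pc1 code; the data are simulated, and the cluster membership is assumed rather than taken from a real varclus run. In practice x would be the transformed matrix returned by transcan.

```r
# Hypothetical stand-in for a first-principal-component score.
first_pc <- function(m) {
  # Standardize the columns, then return scores on the first component.
  prcomp(m, center = TRUE, scale. = TRUE)$x[, 1]
}

set.seed(1)
n <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)                             # two simulated latent factors
x <- cbind(a1 = z1 + rnorm(n, sd = 0.3),   # cluster A: columns 1 to 3
           a2 = z1 + rnorm(n, sd = 0.3),
           a3 = z1 + rnorm(n, sd = 0.3),
           b1 = z2 + rnorm(n, sd = 0.3),   # cluster B: columns 4 and 5
           b2 = z2 + rnorm(n, sd = 0.3))
y <- rbinom(n, 1, plogis(z1))

# Assumed cluster membership, e.g. as read off a varclus dendrogram.
clusters <- list(A = 1:3, B = 4:5)
scores <- as.data.frame(lapply(clusters, function(j) first_pc(x[, j])))

# One summary score per cluster enters the logistic model.
fit <- glm(y ~ A + B, data = cbind(scores, y = y), family = binomial)
summary(fit)$coefficients
```

glm is used here only to keep the sketch free of package dependencies; the reply above fits the model with lrm and then applies fastbw to the cluster scores.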
Actually I am doing my thesis project. My explanatory variables have
serious collinearity. I have used the functions transcan and varclus on the
variables and found some clusters. I am trying to use the method
introduced in this section to drop some variables. I want to know how to
compute the cluster summary scores.
Thanks a lot and looking forward to hearing from you.
Huan
----- Original Message -----
From: "Frank E Harrell Jr" <fharrell at virginia.edu>
To: <pmj at jciconsult.com>
Cc: <r-help at stat.math.ethz.ch>
Sent: Sunday, August 04, 2002 4:36 PM
Subject: Re: [R] Pseudo R^2 for logit - really naive question
The Nagelkerke R^2 is commonly used. The lrm function in the
library computes this for logistic regression. The numerator is 1 -
exp(-LR/n) where LR is the likelihood ratio chi-square stat and n is the
total sample size. Divide it by the maximum attainable value if the
model is perfect (which is a simple function of the -2 log likelihood of
an intercept-only model) to get Nagelkerke's R^2. The numerator is
the ordinary R^2 in OLS, as LR = -n log(1-R^2) there. For a more
interpretable index, and one that measures purely discrimination, the
ROC area or "C index", which is essentially a Mann-Whitney statistic or
concordance probability, is recommended. The lrm function also computes it,
or you can get it from the somers2 or rcorr.cens functions in the
Frank Harrell
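The two indexes described in this reply can be checked by hand in base R. The following is a minimal sketch on simulated data, assuming an ungrouped 0/1 outcome so that the null deviance reported by glm equals the -2 log likelihood of the intercept-only model; the variable names are made up for the example.

```r
set.seed(2)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))        # simulated 0/1 outcome

fit <- glm(y ~ x, family = binomial)

# For ungrouped 0/1 data the deviance equals -2 log likelihood, so
# null.deviance is the -2 log likelihood of the intercept-only model.
LR <- fit$null.deviance - fit$deviance     # likelihood ratio chi-square
r2_num <- 1 - exp(-LR / n)                 # numerator described above
r2_max <- 1 - exp(-fit$null.deviance / n)  # value attained by a perfect model
r2_nagelkerke <- r2_num / r2_max

# C index (ROC area) via the Mann-Whitney rank-sum form; tied
# predictions receive average ranks and so count as one half.
p <- fitted(fit)
n1 <- sum(y == 1)
n0 <- sum(y == 0)
c_index <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

c(R2 = r2_nagelkerke, C = c_index)
```

Because the divisor is below 1, Nagelkerke's R^2 is always at least as large as the unscaled numerator, which is worth keeping in mind when comparing it with an OLS R^2.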
On Sun, 4 Aug 2002 09:08:46 -0400
"Paul M. Jacobson" <pmj at jciconsult.com> wrote:
I am using GLM to calculate logit models based on
am now down to the hard work of making the results intelligible to
average readers. Is there any way to calculate a pseudo R^2 as
in standard linear regression for use as a purely descriptive
goodness of fit? Most of the readers of my report will be
and more comfortable with R^2 than with any other regression
Paul M. Jacobson
Jacobson Consulting Inc.
80 Front Street East, Suite 720
Toronto, ON, M5E 1T4
Voice: +1(416)868-1141
Farm: +1(519)463-6061/6224
Fax: +1(416)868-1131
E-mail: pmj at jciconsult.com
Web: http://www.jciconsult.com/
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
r-help mailing list -- Read
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To:
r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat