cluster summary score
This is confusing, because if you do the variable clustering correctly the cluster scores should be weakly correlated. Check how you are doing the variable clustering and how you are interpreting measures of collinearity.

-Frank Harrell

On Thu, 8 Aug 2002 13:23:05 +0100, Huan Huang <huang at stats.ox.ac.uk> wrote:
Dear Prof. Harrell and R list,

I have done the variable clustering and summary scores. Thanks a lot for your kind help. But it hasn't solved the collinearity problem in my dataset: after the clustering and transcan, there is still very strong collinearity between the summary scores. The objective of my project is to find the influential variables, and I believe variable reduction is not appropriate while the collinearity remains. I am thinking about principal component regression and variable reduction based on it (Rudolf J. Freund and William J. Wilson (1998), p. 215). Does anybody have a suggestion on variable reduction under this condition? I would appreciate any information.

Best,
Huan
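[A rough sketch of the principal component regression idea mentioned above, assuming a hypothetical numeric predictor matrix X and binary outcome y; the number of retained components k is chosen here purely for illustration:]

## principal components of the standardized predictors
pca <- prcomp(X, center = TRUE, scale. = TRUE)
summary(pca)                      # proportion of variance per component

## regress the outcome on the first k components (k is an illustrative choice)
k <- 3
Z <- pca$x[, 1:k]
pcfit <- glm(y ~ Z, family = binomial)
summary(pcfit)

## the loadings show which original variables drive each retained component
round(pca$rotation[, 1:k], 2)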
----- Original Message -----
From: "Frank E Harrell Jr" <fharrell at virginia.edu>
To: "Huan Huang" <huang at stats.ox.ac.uk>
Sent: Sunday, August 04, 2002 7:56 PM
Subject: Re: cluster summary score

On Sun, 4 Aug 2002 19:48:22 +0100 Huan Huang <huang at stats.ox.ac.uk> wrote:
This was just done by
f <- lrm(y ~ all cluster summary scores)
fastbw(f, suitable stopping criteria)
Thank you very much for your kind reply. But I don't know how to get the cluster summary scores. I did:
t <- transcan(x, transform = T)
t$transform
I got a new matrix with the transformed values for each variable. How can I get the cluster summary scores?
You see the little pc1 function I defined in Hmisc? I just do things like p1 <- pc1(t$transform), or pc1(t$transform[, c(3, 5, 7)]) to use variables 3, 5, and 7.
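[A fuller sketch of that workflow, assuming a hypothetical data frame d with predictors x1..x6 and a binary outcome y; the cluster memberships and the stopping rule are invented for illustration, and pc1 is the small Hmisc helper described above (a fallback definition is commented out in case your Hmisc version lacks it):]

library(Hmisc)     # varclus, transcan, pc1
library(Design)    # lrm, fastbw (the Design library is nowadays called rms)

## fallback if pc1 is not available: first principal component of the
## standardized columns of a matrix
## pc1 <- function(x) prcomp(x, scale. = TRUE)$x[, 1]

## 1. cluster the predictors to see which ones group together
vc <- varclus(~ x1 + x2 + x3 + x4 + x5 + x6, data = d)
plot(vc)

## 2. transform the predictors and keep the transformed matrix, as in the
##    transcan() call quoted above (argument/component names here follow
##    current Hmisc; the thread uses transform= / $transform)
t  <- transcan(~ x1 + x2 + x3 + x4 + x5 + x6, data = d,
               transformed = TRUE, pl = FALSE)
xt <- t$transformed

## 3. one summary score per cluster: first principal component of the
##    transformed members of that cluster (memberships invented here)
s1 <- pc1(xt[, c("x1", "x2", "x3")])
s2 <- pc1(xt[, c("x4", "x5")])
s3 <- xt[, "x6"]                     # a singleton cluster

## 4. model on the summary scores, then fast backward stepdown with a
##    conservative (large alpha) stopping rule
f <- lrm(y ~ s1 + s2 + s3, data = data.frame(y = d$y, s1, s2, s3))
fastbw(f, rule = "p", sls = 0.5)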
Frank
Huan
Doing the fast backward stepdown is safer with cluster scores than with raw variables, especially if you use conservative stopping criteria (e.g., large alpha). I allowed "highly insignificant" cluster scores to be dropped, and did not ever look at their component variables again.
Frank
Actually, I am doing my thesis project. My explanatory variables have serious collinearity. I have used the functions transcan and varclus on the variables and found some clusters. I am trying to use the method introduced in this section to drop some variables, and I want to know how you compute the cluster summary scores. Thanks a lot, and looking forward to hearing from you.

Huan

----- Original Message -----
From: "Frank E Harrell Jr" <fharrell at virginia.edu>
To: <pmj at jciconsult.com>
Cc: <r-help at stat.math.ethz.ch>
Sent: Sunday, August 04, 2002 4:36 PM
Subject: Re: [R] Pseudo R^2 for logit - really naive question
The Nagelkerke R^2 is commonly used. The lrm function in the Design library computes it for logistic regression. The numerator is 1 - exp(-LR/n), where LR is the likelihood ratio chi-square statistic and n is the total sample size. Divide that by its maximum attainable value when the model is perfect (a simple function of the -2 log likelihood of an intercept-only model) to get Nagelkerke's R^2. The numerator is exactly the ordinary R^2 in OLS, since LR = -n log(1 - R^2) there.

For a more interpretable index, and one that measures pure discrimination ability, the ROC area or "C index" (essentially a Mann-Whitney statistic based on concordance probability) is recommended. The lrm function also outputs this, or you can get it from the somers2 or rcorr.cens functions in the Hmisc library.
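[A small sketch of those two quantities computed by hand for a glm() logistic fit, just to make the formulas concrete; fit, d, y, x1 and x2 are hypothetical, and lrm reports both statistics directly:]

library(Hmisc)                                  # somers2

fit <- glm(y ~ x1 + x2, family = binomial, data = d)
n   <- nrow(d)
LR  <- fit$null.deviance - fit$deviance         # likelihood ratio chi-square
L0  <- fit$null.deviance                        # -2 log likelihood, intercept-only model

## Nagelkerke R^2: numerator 1 - exp(-LR/n), scaled by its maximum attainable value
r2.num <- 1 - exp(-LR / n)
r2.max <- 1 - exp(-L0 / n)
r2.nagelkerke <- r2.num / r2.max

## C index (ROC area): concordance between predicted probabilities and the outcome
phat <- predict(fit, type = "response")
somers2(phat, d$y)["C"]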
Frank Harrell

On Sun, 4 Aug 2002 09:08:46 -0400 "Paul M. Jacobson" <pmj at jciconsult.com> wrote:
I am using GLM to fit logit models to cross-sectional data. I am now down to the hard work of making the results intelligible to very average readers. Is there any way to calculate a pseudo analogue to the R^2 of standard linear regression, for use as a purely descriptive statistic of goodness of fit? Most of the readers of my report will be vaguely familiar, and more comfortable, with R^2 than with any other regression diagnostic.
Paul M. Jacobson
Jacobson Consulting Inc.
80 Front Street East, Suite 720, Toronto, ON, M5E 1T4
Voice: +1(416)868-1141  Farm: +1(519)463-6061/6224  Fax: +1(416)868-1131
E-mail: pmj at jciconsult.com  Web: http://www.jciconsult.com/
--
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat