cluster summary score

2 messages · Huan Huang, Frank E Harrell Jr

Thu, Aug 8, 2002 5:23 AM #

Dear Prof. Harrell and R list,

I have done the variable clustering and summary scores. Thanks a lot for
your kind help.

But it hasn't solved the collinearity problem in my dataset. After the
clustering and transcan, there is still very strong collinearity between the
summary scores. The objective of my project is to find out the influential
variables. I believe any variable reduction is not appropriate when the
collinearity exists. I am thinking about the principal component regression
and variable reduction based on it (Rudolf J. Freund and William J. Wilson
(1998), P215).

Does anybody have a suggestion on variable reduction under this condition?
I will appreciate any kind of information.

Best

Huan

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Frank E Harrell Jr

Thu, Aug 8, 2002 6:20 AM #

This is confusing because if you do the variable clustering correctly, the cluster scores should be weakly correlated.  Check how you are doing the variable clustering and how you are interpreting measures of collinearity.
-Frank Harrell
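
One way to check how the scores relate to each other is to look at both
their pairwise correlations and their variance inflation factors. The
following is only a minimal sketch in base R on made-up data; the names
s1, s2, s3 and scores are hypothetical stand-ins for real cluster scores,
and s3 is deliberately built to be nearly a copy of s1 so that the
contrast is visible.

```r
set.seed(1)
n <- 200

# Made-up cluster scores: s1 and s2 are independent, s3 is close to s1
s1 <- rnorm(n)
s2 <- rnorm(n)
s3 <- s1 + rnorm(n, sd = 0.2)
scores <- cbind(s1, s2, s3)

# Pairwise correlations between the scores
r <- cor(scores)

# The VIF of each column is the matching diagonal element of the
# inverse of the correlation matrix
vif <- setNames(diag(solve(r)), colnames(scores))

print(round(r, 2))
print(round(vif, 1))
```

A score that is weakly correlated with the others, like s2 here, has a
VIF near 1, while the two near-duplicates get large VIFs.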

On Thu, 8 Aug 2002 13:23:05 +0100

Huan Huang <huang at stats.ox.ac.uk> wrote:

Dear Prof. Harrell and R list,

I have done the variable clustering and summary scores. Thanks a lot for
your kind help.

But it hasn't solved the collinearity problem in my dataset. After the
clustering and transcan, there is still very strong collinearity between the
summary scores. The objective of my project is to find out the influential
variables. I believe any variable reduction is not appropriate when the
collinearity exists. I am thinking about the principal component regression
and variable reduction based on it (Rudolf J. Freund and William J. Wilson
(1998), P215).

Does anybody have a suggestion on variable reduction under this condition?
I will appreciate any kind of information.

Best

Huan
----- Original Message -----
From: "Frank E Harrell Jr" <fharrell at virginia.edu>
To: "Huan Huang" <huang at stats.ox.ac.uk>
Sent: Sunday, August 04, 2002 7:56 PM
Subject: Re: cluster summary score

On Sun, 4 Aug 2002 19:48:22 +0100
Huan Huang <huang at stats.ox.ac.uk> wrote:

This was just done by

f <- lrm(y ~ all cluster summary scores)
fastbw(f, suitable stopping criteria)

Thank you very much for your kind reply. But I don't know how to get the
cluster summary score.

I did:
t <- transcan(x, transform = T)
t$transform

I got a new matrix, with the transformed value for each variable. How can I
get the cluster summary scores?

You see the little pc1 function I defined in Hmisc?  I just do things like
p1 <- pc1(t$transform) or pc1(t$transform[,c(3,5,7)]) to use variables
3,5,7

Frank
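
For readers without Hmisc at hand, the idea of a first principal component
score can be sketched in base R with prcomp. This is not the pc1 function
itself, only an illustration of the same idea; first_pc is a hypothetical
helper, and xt is a made-up matrix standing in for t$transform, with
columns 3, 5 and 7 constructed to form one cluster.

```r
set.seed(2)
n <- 100
z <- rnorm(n)  # common factor shared by the cluster (made up)

# Stand-in for t$transform: a numeric matrix with 7 columns
xt <- matrix(rnorm(n * 7), n, 7,
             dimnames = list(NULL, paste0("v", 1:7)))
# Let columns 3, 5 and 7 share the common factor z
xt[, c(3, 5, 7)] <- xt[, c(3, 5, 7)] * 0.4 + z

# Hypothetical helper: first principal component of standardized columns
first_pc <- function(m) {
  pc <- prcomp(m, center = TRUE, scale. = TRUE)
  score <- pc$x[, 1]
  # The sign of a principal component is arbitrary; orient the score
  # so that it increases with the mean of the standardized columns
  if (cor(score, rowMeans(scale(m))) < 0) score <- -score
  score
}

# Summary score for the cluster made of variables 3, 5 and 7
score357 <- first_pc(xt[, c(3, 5, 7)])
```

One such score per cluster can then be used as a predictor in place of
the cluster's component variables.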

Huan

Doing the fast backward stepdown is safer with cluster scores than with
raw variables, especially if you use conservative stopping criteria (e.g.,
large alpha).  I allowed "highly insignificant" cluster scores to be
dropped, and did not ever look at their component variables again.

Frank
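
To make the "large alpha" idea concrete, here is a rough base R sketch of
backward elimination on cluster scores. It is a stand-in built on glm and
drop1 with likelihood ratio tests, not the fastbw algorithm; backward_glm,
the scores s1 to s3 and the cutoff of 0.5 are all assumptions made for
illustration.

```r
set.seed(3)
n <- 300

# Made-up cluster scores; only s1 is related to the outcome
d <- data.frame(s1 = rnorm(n), s2 = rnorm(n), s3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(1.5 * d$s1))

# Hypothetical helper: repeatedly drop the least significant term,
# but only while its p-value exceeds a large alpha
backward_glm <- function(fit, alpha = 0.5) {
  repeat {
    if (length(attr(terms(fit), "term.labels")) == 0) break
    tab <- drop1(fit, test = "Chisq")
    p <- tab[["Pr(>Chi)"]][-1]        # first row is the full model
    candidates <- rownames(tab)[-1]
    if (max(p) <= alpha) break        # nothing "highly insignificant" left
    worst <- candidates[which.max(p)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

full <- glm(y ~ s1 + s2 + s3, family = binomial, data = d)
reduced <- backward_glm(full, alpha = 0.5)
kept <- attr(terms(reduced), "term.labels")
print(kept)
```

With a cutoff this large, a score is removed only when there is almost no
evidence for it, which is the conservative behaviour described above.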

Actually I am doing my thesis project. My explanatory variables have
serious collinearity. I have used the function transcan and varclus on the
variables and find out some clusters. I am trying to use the method
introduced in this section to drop some variables. I want to know how you
carry out the cluster summary scores.

Thanks a lot and looking forward to hearing from you.

Huan
----- Original Message -----
From: "Frank E Harrell Jr" <fharrell at virginia.edu>
To: <pmj at jciconsult.com>
Cc: <r-help at stat.math.ethz.ch>
Sent: Sunday, August 04, 2002 4:36 PM
Subject: Re: [R] Pseudo R^2 for logit - really naive question

The Nagelkerke R^2 is commonly used.  The lrm function in the Design
library computes this for logistic regression.  The numerator is 1 -
exp(-LR/n) where LR is the likelihood ratio chi-square stat and n is the
total sample size.  Divide it by the maximum attainable value of this if
the model is perfect (which is a simple function of the -2 log likelihood
with an intercept-only model) to get Nagelkerke's R^2.  The numerator is
exactly the ordinary R^2 in OLS, as LR = -n log(1-R^2) there.  For a more
interpretable index and one that measures purely discrimination ability,
the ROC area or "C index" which is essentially a Mann-Whitney statistic
based on concordance probability is recommended.  The lrm function also
outputs this or you can get it from the somers2 or rcorr.cens functions in
the Hmisc library.

Frank Harrell
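
The two indexes described above can also be computed by hand from an
ordinary glm fit. The following is a minimal sketch in base R on simulated
data, assuming ungrouped 0/1 responses (so that the null deviance equals
-2 log likelihood of the intercept-only model); the variable names are
made up for the example.

```r
set.seed(4)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

# Likelihood ratio chi-square: null deviance minus residual deviance
lr <- fit$null.deviance - fit$deviance

# Numerator 1 - exp(-LR/n), divided by its maximum attainable value,
# which depends only on the intercept-only -2 log likelihood
num <- 1 - exp(-lr / n)
max_num <- 1 - exp(-fit$null.deviance / n)
r2_nagelkerke <- num / max_num

# C index (ROC area) as a Mann-Whitney statistic on predicted
# probabilities; rank() averages ties, so tied pairs count one half
p <- fitted(fit)
n1 <- sum(y == 1)
n0 <- sum(y == 0)
c_index <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

print(c(R2 = r2_nagelkerke, C = c_index))
```

The C index here is the proportion of (event, non-event) pairs in which
the event has the higher predicted probability.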

On Sun, 4 Aug 2002 09:08:46 -0400
"Paul M. Jacobson" <pmj at jciconsult.com> wrote:

I am using GLM to calculate logit models based on cross-sectional data.
I am now down to the hard work of making the results intelligible to very
average readers.  Is there any way to calculate a pseudo analogue to the
R^2 in standard linear regression for use as a purely descriptive
statistic of goodness of fit? Most of the readers of my report will be
vaguely familiar and more comfortable with R^2 than with any other
regression diagnostics.

Paul M. Jacobson
Jacobson Consulting Inc.
80 Front Street East, Suite 720
Toronto, ON, M5E 1T4
Voice:  +1(416)868-1141
Farm: +1(519)463-6061/6224
Fax: +1(416)868-1131
E-mail: pmj at jciconsult.com
Web:  http://www.jciconsult.com/


--
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine

http://hesweb1.med.virginia.edu/biostat

