Pseudo R^2 for logit - really naive question - R-help

Sun, Aug 4, 2002 6:08 AM #

I am using GLM to calculate logit models based on cross-sectional data.  I
am now down to the hard work of making the results intelligible to very
average readers.  Is there any way to calculate a pseudo analogue to the R^2
in standard linear regression for use as a purely descriptive statistic of
goodness of fit? Most of the readers of my report will be vaguely familiar
and more comfortable with R^2 than with any other regression diagnostics.

Paul M. Jacobson
Jacobson Consulting Inc.
80 Front Street East, Suite 720
Toronto, ON, M5E 1T4
Voice:  +1(416)868-1141
Farm: +1(519)463-6061/6224
Fax: +1(416)868-1131
E-mail: pmj at jciconsult.com
Web:  http://www.jciconsult.com/

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Frank E Harrell Jr

Sun, Aug 4, 2002 8:36 AM #

The Nagelkerke R^2 is commonly used.  The lrm function in the Design library computes this for logistic regression.  The numerator is 1 - exp(-LR/n), where LR is the likelihood ratio chi-square statistic and n is the total sample size.  Divide it by the maximum value attainable if the model is perfect (which is a simple function of the -2 log likelihood of an intercept-only model) to get Nagelkerke's R^2.  The numerator is exactly the ordinary R^2 in OLS, as LR = -n log(1-R^2) there.  For a more interpretable index, and one that measures purely discrimination ability, the ROC area or "C index", which is essentially a Mann-Whitney statistic based on concordance probability, is recommended.  The lrm function also outputs this, or you can get it from the somers2 or rcorr.cens functions in the Hmisc library.
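
As a rough check of the arithmetic above outside R, here is a minimal sketch in plain Python. The function names nagelkerke_r2 and c_index are made up for illustration and are not the lrm or somers2 interfaces. It assumes ungrouped binary data, so that a perfect model has log likelihood 0 and the maximum attainable value of the numerator is 1 - exp(2*loglik_null/n).

```python
import math


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke R^2 from two maximized log likelihoods (hypothetical helper).

    loglik_null  : log likelihood of the intercept-only model
    loglik_model : log likelihood of the fitted model
    n            : total sample size
    """
    lr = 2.0 * (loglik_model - loglik_null)   # likelihood ratio chi-square
    numerator = 1.0 - math.exp(-lr / n)
    # Value of the numerator for a perfect model (log likelihood 0),
    # assuming ungrouped binary outcomes.
    max_value = 1.0 - math.exp(2.0 * loglik_null / n)
    return numerator / max_value


def c_index(y, p):
    """Concordance probability (ROC area) by direct pair counting.

    Over all (event, non-event) pairs, counts how often the event has the
    higher predicted probability; ties count one half.
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(nagelkerke_r2(loglik_null=-69.3, loglik_model=-55.0, n=100))
    print(c_index([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pair counting in c_index is quadratic in the sample size, which is fine for a check on small data; a rank-based formula would be the usual choice for large samples.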

Frank Harrell

On Sun, 4 Aug 2002 09:08:46 -0400

"Paul M. Jacobson" <pmj at jciconsult.com> wrote:

Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat

Chris Lawrence

Sun, Aug 4, 2002 9:23 AM #

On Aug 04, Paul M. Jacobson wrote:

In fact, there are several "R^2-like" measures for logit and probit
models (not surprisingly, called "pseudo-R^2").  An overview is in:

"Pseudo-R^2 Measures for Some Common Limited Dependent Variable Models"
http://citeseer.nj.nec.com/veall96pseudor.html

The Aldrich-Nelson measure appears to be the most widely used.

You may also want to consider Herron's (1999) "Expected Percent
Correctly Predicted" and related measures, described in Political
Analysis 8(1): http://web.polmeth.ufl.edu/pa/herron.pdf; even
traditional PCP/PRE measures tend to be quite informative (perhaps
even more useful than Pseudo-R^2).
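
For concreteness, a minimal sketch of these measures in plain Python (not R). The function names are made up for illustration. The formulas used are the usual textbook forms: Aldrich-Nelson as LR/(LR + n), PCP as the share of cases classified correctly at a 0.5 cutoff, and ePCP as the average predicted probability assigned to the outcome that actually occurred.

```python
def aldrich_nelson_r2(lr, n):
    """Aldrich-Nelson pseudo-R^2: LR / (LR + n), LR = likelihood ratio chi-square."""
    return lr / (lr + n)


def pcp(y, p, cutoff=0.5):
    """Percent correctly predicted: classify as 1 when p >= cutoff (an assumed rule)."""
    hits = sum(1 for yi, pi in zip(y, p) if (pi >= cutoff) == (yi == 1))
    return hits / len(y)


def epcp(y, p):
    """Expected percent correctly predicted: mean probability given to the observed outcome."""
    total = sum(pi if yi == 1 else 1.0 - pi for yi, pi in zip(y, p))
    return total / len(y)


if __name__ == "__main__":
    # Made-up outcomes and predicted probabilities, for illustration only.
    y = [1, 0, 1, 0]
    p = [0.8, 0.3, 0.4, 0.6]
    print(aldrich_nelson_r2(lr=50.0, n=150))
    print(pcp(y, p))
    print(epcp(y, p))
```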


Chris

Chris Lawrence <cnlawren at olemiss.edu> - http://www.lordsutch.com/chris/

Instructor and Ph.D. Candidate, Political Science, Univ. of Mississippi
208 Deupree Hall - 662-915-5765

Huan Huang

Sun, Aug 4, 2002 9:38 AM #

Dear list

My data frame has a serious collinearity problem. I want to try Incomplete
Principal Component Regression, as described in "Regression Analysis" by
Rudolf J. Freund and William J. Wilson (1998). I wonder if I can find a
function in R or S-plus to do it.

Thanks a lot!

Huan


Frank E Harrell Jr

Sun, Aug 4, 2002 10:47 AM #

On Sun, 4 Aug 2002 11:23:34 -0500

Chris Lawrence <cnlawren at phy.olemiss.edu> wrote:

Chris - That's a really nice paper.  I hadn't realized that the index in Nagelkerke's 1991 Biometrika paper was first proposed by Cragg and Uhler in 1970.

Percent correctly predicted has a host of problems, only some of which were pointed out in the above article.  It is an improper scoring rule: it is not optimized when the predicted probabilities are correct, and its value can decrease when an important regressor is added to a model (see also http://hesweb1.med.virginia.edu/biostat/presentations/probclin.pdf).  The paper also did not reference the large literature on scoring rules for dichotomous outcomes (see, e.g., work by Habbema and Hilden in the medical diagnostic literature and the many papers on the Brier score and its decompositions).
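
One facet of this can be seen with a toy example, sketched here in plain Python with made-up data and made-up function names. Two sets of predictions that classify identically at a 0.5 cutoff get the same percent correctly predicted, while the Brier score (mean squared difference between predicted probability and outcome, lower is better) separates the calibrated set from the overconfident one.

```python
def pcp(y, p, cutoff=0.5):
    """Percent correctly predicted at an assumed 0.5 cutoff."""
    hits = sum(1 for yi, pi in zip(y, p) if (pi >= cutoff) == (yi == 1))
    return hits / len(y)


def brier(y, p):
    """Brier score: mean squared error of the predicted probabilities."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


if __name__ == "__main__":
    # Toy data: three events out of four observations.
    y = [1, 1, 1, 0]
    calibrated = [0.75, 0.75, 0.75, 0.75]      # matches the observed event rate
    overconfident = [0.99, 0.99, 0.99, 0.99]   # same classifications, worse probabilities

    # PCP cannot tell the two apart; the Brier score can.
    print(pcp(y, calibrated), pcp(y, overconfident))
    print(brier(y, calibrated), brier(y, overconfident))
```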

Frank Harrell


Paul M. Jacobson

Wed, Aug 7, 2002 5:03 AM #

I could find the function in the function search option on CRAN.  However,
how to find the actual library remains a mystery to me.  I could not see it
under the list of available bundles and packages on the CRAN site
http://lib.stat.cmu.edu/R/CRAN/sources.html
I am very much a newbie to the R world and need some more direct help.


Frank E Harrell Jr

Wed, Aug 7, 2002 5:21 AM #

The Design and Hmisc libraries are not yet on CRAN.  See http://hesweb1.med.virginia.edu/biostat/s for links that will allow you to download them.  -Frank Harrell
