I am using GLM to calculate logit models based on cross-sectional data. I am now down to the hard work of making the results intelligible to very average readers. Is there any way to calculate a pseudo analogue to the R^2 in standard linear regression for use as a purely descriptive statistic of goodness of fit? Most of the readers of my report will be vaguely familiar and more comfortable with R^2 than with any other regression diagnostics.

Paul M. Jacobson Jacobson Consulting Inc. 80 Front Street East, Suite 720 Toronto, ON, M5E 1T4 Voice: +1(416)868-1141 Farm: +1(519)463-6061/6224 Fax: +1(416)868-1131 E-mail: pmj at jciconsult.com Web: http://www.jciconsult.com/

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Pseudo R^2 for logit - really naive question
7 messages · Paul M. Jacobson, Chris Lawrence, Huan Huang +1 more
The Nagelkerke R^2 is commonly used. The lrm function in the Design library computes it for logistic regression. The numerator is 1 - exp(-LR/n), where LR is the likelihood ratio chi-square statistic and n is the total sample size. Divide it by the maximum value attainable if the model were perfect (a simple function of the -2 log likelihood of an intercept-only model) to get Nagelkerke's R^2. In OLS the numerator is exactly the ordinary R^2, since LR = -n log(1 - R^2) there.

For a more interpretable index, and one that measures purely discrimination ability, the ROC area or "C index", which is essentially a Mann-Whitney statistic based on concordance probability, is recommended. The lrm function also outputs this, or you can get it from the somers2 or rcorr.cens functions in the Hmisc library.

Frank Harrell
Frank E Harrell Jr Prof. of Biostatistics & Statistics Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat
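The recipe Frank Harrell describes for Nagelkerke's R^2 and the C index can be sketched in base R roughly as follows. This is a minimal illustration on simulated data, not the code used by lrm, somers2 or rcorr.cens; the names x, y, z, fit and the simulated coefficients are assumptions made up for the example. It relies on the fact that, for ungrouped 0/1 responses, the glm deviance equals the -2 log likelihood.

```r
# Simulated data; x, y and fit are illustrative names, not from the thread.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)

# For ungrouped 0/1 responses the deviance equals -2 * log likelihood,
# so the likelihood ratio chi-square is null deviance minus residual deviance.
LR <- fit$null.deviance - fit$deviance

r2_num <- 1 - exp(-LR / n)                 # numerator: 1 - exp(-LR/n)
r2_max <- 1 - exp(-fit$null.deviance / n)  # value a perfect model would attain
r2_nagelkerke <- r2_num / r2_max

# C index in its Mann-Whitney (rank) form: the proportion of
# (event, non-event) pairs in which the event has the higher predicted
# probability, with tied pairs counted as 1/2.
p  <- fitted(fit)
n1 <- sum(y == 1)
n0 <- sum(y == 0)
c_index <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Check of the OLS identity LR = -n log(1 - R^2) on a simulated linear model:
# 1 - exp(-LR/n) should agree with the ordinary R^2.
z   <- 1 + 0.8 * x + rnorm(n)
ols <- lm(z ~ x)
LR_ols <- as.numeric(2 * (logLik(ols) - logLik(lm(z ~ 1))))
r2_from_lr <- 1 - exp(-LR_ols / n)

print(c(nagelkerke = r2_nagelkerke, c_index = c_index))
print(c(r2_from_lr = r2_from_lr, r2_ols = summary(ols)$r.squared))
```

With grouped binomial data (counts of successes and failures) the deviance is no longer the -2 log likelihood, so the likelihoods would have to be taken from logLik() instead.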
In fact, there are several "R^2-like" measures for logit and probit models (not surprisingly, called "pseudo-R^2"). An overview is in: "Pseudo-R^2 Measures for Some Common Limited Dependent Variable Models" http://citeseer.nj.nec.com/veall96pseudor.html

The Aldrich-Nelson measure appears to be the most widely used. You may also want to consider Herron's (1999) "Expected Percent Correctly Predicted" and related measures, described in Political Analysis 8(1): http://web.polmeth.ufl.edu/pa/herron.pdf. Even traditional PCP/PRE measures tend to be quite informative (perhaps even more useful than pseudo-R^2).

Chris
Chris Lawrence <cnlawren at olemiss.edu> - http://www.lordsutch.com/chris/ Instructor and Ph.D. Candidate, Political Science, Univ. of Mississippi 208 Deupree Hall - 662-915-5765
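The measures Chris mentions are short to compute from a fitted glm. The sketch below assumes the usual textbook definitions: Aldrich-Nelson as LR/(LR + n), PCP as the share of cases classified correctly at a 0.5 cutoff, PRE as the reduction in classification errors relative to always predicting the modal category, and Herron's ePCP as the average probability the model assigns to the outcome actually observed. The data and variable names are simulated and hypothetical.

```r
# Simulated data; names and coefficients are illustrative only.
set.seed(2)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + x))
fit <- glm(y ~ x, family = binomial)
p <- fitted(fit)

# Likelihood ratio chi-square (valid as written for ungrouped 0/1 data).
LR <- fit$null.deviance - fit$deviance

# Aldrich-Nelson pseudo-R^2.
r2_aldrich_nelson <- LR / (LR + n)

# Percent correctly predicted, classifying at a 0.5 cutoff.
pcp <- mean((p > 0.5) == (y == 1))

# Proportional reduction in error relative to always predicting the
# modal category.
pcp_null <- max(mean(y), 1 - mean(y))
pre <- (pcp - pcp_null) / (1 - pcp_null)

# Expected percent correctly predicted: mean probability assigned to the
# observed outcome.
epcp <- mean(ifelse(y == 1, p, 1 - p))

print(c(aldrich_nelson = r2_aldrich_nelson, pcp = pcp, pre = pre, epcp = epcp))
```

Note that the Aldrich-Nelson measure cannot reach 1 even for a perfect model, which is one motivation for the rescaled variants surveyed in the overview paper.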
Dear list,

My data frame has a serious collinearity problem. I want to try incomplete principal component regression, as introduced in "Regression Analysis" by Rudolf J. Freund and William J. Wilson (1998). Is there a function in R or S-PLUS that does this?

Thanks a lot!
Huan
On Sun, 4 Aug 2002 11:23:34 -0500
Chris Lawrence <cnlawren at phy.olemiss.edu> wrote:

> In fact, there are several "R^2-like" measures for logit and probit models (not surprisingly, called "pseudo-R^2"). An overview is in: "Pseudo-R^2 Measures for Some Common Limited Dependent Variable Models" http://citeseer.nj.nec.com/veall96pseudor.html

Chris - That's a really nice paper. I hadn't realized that the index in Nagelkerke's 1991 Biometrika paper was first proposed by Cragg and Uhler in 1970.

> The Aldrich-Nelson measure appears to be the most widely used. You may also want to consider Herron's (1999) "Expected Percent Correctly Predicted" and related measures, described in Political Analysis 8(1): http://web.polmeth.ufl.edu/pa/herron.pdf; even traditional PCP/PRE measures tend to be quite informative (perhaps even more useful than Pseudo-R^2).

Percent correctly predicted has a host of problems, only some of which were pointed out in the above article. It is an improper scoring rule: it is not optimized when predicted probabilities are correct, and its value can decrease when an important regressor is added to a model (see also http://hesweb1.med.virginia.edu/biostat/presentations/probclin.pdf). The paper also did not reference the large literature on scoring rules for dichotomous outcomes (see e.g. work by Habbema and Hilden in the medical diagnostic literature, and the many papers on the Brier score and its decompositions).

Frank Harrell
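One way to see the scoring-rule point is to compare expected scores analytically for a single case whose true event probability is known. The sketch below is an illustration under assumed numbers (a true probability of 0.7 and two hypothetical forecasts), not an example taken from the cited literature. It shows that the expected Brier score is smallest at the true probability, while expected percent correctly predicted cannot tell an honest forecast from an overconfident one on the same side of 0.5.

```r
# Brier score for predicted probabilities p and 0/1 outcomes y.
brier <- function(p, y) mean((p - y)^2)

# Assumed true event probability and two hypothetical forecasts.
p_true <- 0.7
q <- c(honest = 0.7, overconfident = 0.99)

# Expected Brier score of forecast q when the event occurs with
# probability p_true.
expected_brier <- p_true * (1 - q)^2 + (1 - p_true) * q^2

# Expected percent correctly predicted at a 0.5 cutoff: both forecasts
# predict "event", so both are right with probability p_true.
expected_pcp <- ifelse(q > 0.5, p_true, 1 - p_true)

# The forecast minimising the expected Brier score is the true probability.
best_q <- optimize(function(q) p_true * (1 - q)^2 + (1 - p_true) * q^2,
                   interval = c(0, 1))$minimum

print(rbind(expected_brier, expected_pcp))
print(best_q)
```

On real data, brier(fitted(fit), y) gives the observed Brier score for a fitted binary glm, where fit and y stand for whatever model and response are at hand.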
2 days later
I could find the function in the function search option on CRAN. However, how to find the actual library remains a mystery to me. I could not see it under the list of available bundles and packages on the CRAN site
http://lib.stat.cmu.edu/R/CRAN/sources.html
I am very much a newbie to the R world and need some more direct help.
Design and Hmisc libraries are not yet on CRAN. See http://hesweb1.med.virginia.edu/biostat/s for links that will allow you to download them.

-Frank Harrell