Hosmer-Lemeshow test
saggak wrote:
Dear R-help,

I am working on a credit scorecard model and am using logistic regression to estimate the coefficients. I want to use the Hosmer-Lemeshow test. To understand how to do this in R, I referred to the following URL:

http://www.stat.sc.edu/~hitchcock/diseaseoutbreakRexample704.txt

The related data 'diseaseoutbreak' is available at the following URL:

http://www.stat.sc.edu/~hitchcock/diseaseoutbreakdata.txt

The R code given there is:

####
# A function to do the Hosmer-Lemeshow test in R.
# R Function is due to Peter D. M. Macdonald, McMaster University.
#
hosmerlem <- function (y, yhat, g = 10) {
  cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)),
                 include.lowest = T)
  obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
  expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
  chisq <- sum((obs - expect)^2/expect)
  P <- 1 - pchisq(chisq, g - 2)
  c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P)
}
#
######
# Doing the Hosmer-Lemeshow test
# (after copying the above function into R):

hosmerlem(disease, fitted(disease.logit))

However, when I ran these commands in R, I got the following error:

Error in model.frame.default(formula = cbind(1 - y, y) ~ cutyhat) :
  invalid type (list) for variable 'cbind(1 - y, y)'

Can anyone please guide me on how to run the Hosmer-Lemeshow test, and also how to obtain the other usual logistic regression quantities (log-likelihood, AIC, pseudo R, etc.)?

Thanking you all in advance,
Saggak
That test is too dependent on cutpoints and does not have adequate power. I recommend replacing it with the test described in
@ARTICLE{hos97com,
  author = {Hosmer, D. W. and Hosmer, T. and {le Cessie}, S. and Lemeshow, S.},
  year = 1997,
  title = {A comparison of goodness-of-fit tests for the logistic regression model},
  journal = {Statistics in Medicine},
  volume = 16,
  pages = {965-980},
  annote = {goodness-of-fit for binary logistic model; difficulty with
            Hosmer-Lemeshow statistic being dependent on how groups are
            defined; sum of squares test; cumulative sum test; invalidity
            of naive test based on deviance; goodness-of-link function;
            simulation setup}
}
which is implemented in the residuals.lrm function in the Design package.
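
As a rough illustration of that call, and of the other quantities asked about, here is a minimal sketch. It assumes the rms package (the successor to Design) is installed. The simulated data and the names d, x1, x2, fit.glm, fit.null and fit.lrm are made up for the example and are not the disease outbreak data. The pseudo R-squared shown is McFadden's, which is only one of several definitions.

```r
# Sketch only: simulated data stand in for the poster's data set.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 - 0.4 * x2))
d  <- data.frame(y, x1, x2)

# Base R: log-likelihood, AIC and McFadden's pseudo R^2 from glm fits.
fit.glm  <- glm(y ~ x1 + x2, family = binomial, data = d)
fit.null <- glm(y ~ 1, family = binomial, data = d)
ll       <- as.numeric(logLik(fit.glm))
aic      <- AIC(fit.glm)
mcfadden <- 1 - ll / as.numeric(logLik(fit.null))

# Global goodness-of-fit test through residuals.lrm.
# x = TRUE, y = TRUE store the design matrix and response in the fit,
# which the "gof" residual type needs.
if (requireNamespace("rms", quietly = TRUE)) {
  fit.lrm <- rms::lrm(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)
  gof     <- residuals(fit.lrm, type = "gof")
  print(gof)
}
```

The response here is a plain 0/1 vector. The "invalid type (list)" error quoted above is what one would expect if a whole data frame, rather than the response column, were passed as y to hosmerlem, although that is a guess about how the data were read in.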
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University