Score Test Function

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110612/f8212a09/attachment.pl>
The score test looks at the effect of adding extra columns to the model matrix.  The function glm.scoretest takes the fitted model object as the first argument and the extra column, or columns, as the second argument.  Your x2 argument has length only 3.  Is this really what you want?  I would have expected you need to specify a vector of length nrow(DF), [as in the help information for glm.scoretest itself].

Bill Venables.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Tyler Rinker
Sent: Sunday, 12 June 2011 2:46 PM
To: r-help at r-project.org
Subject: [R] Score Test Function

Greeting R Community,

I'm trying to learn Logistic Regression on my own and am using An Introduction to Logistic Regression Analysis and Reporting (Peng, C., Lee, K., & Ingersoll, G. ,2002). This article uses a Score Test Stat as a measure of overall fit for a logistic regression model.  The author calculates this statistic using SAS.  I am looking for an [R] function that can compute this stat and a p=value (please don't critique the stat itself).  As far as I can tell glm.scoretest() from library(statmod) is the only function for this but it does not give a pvalue and appears to not correspond with the author's example.  Some chat on the discussion board a while back indicated that this was a well known test Click here for that thread but difficult to calculate in [R].  I'm wondering if there is a way to do this fairly easily in [R] at this time?

I am working with the data set from the study and am looking to get close to the 9.5177 score test stat and .0086 p-value with 2 degrees freedom.
Below is the code for the data set and some descriptives so you can see the data set is the same as  from the study.  I highlighted my attempt to get a score test statistic and am curious if this is it (minus the p-value).

#BEGINNING OF CODE
id<-factor(1:189)
gender<-factor(c("Boy","Boy","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Girl","Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl",
"Boy","Boy","Girl","Girl","Girl","Boy","Boy","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Girl",
"Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Girl","Boy","Boy","Boy","Boy","Girl","Boy",
"Girl","Boy","Boy","Boy","Girl","Boy","Girl","Girl","Boy","Girl","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Girl",
"Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Boy","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Boy",
"Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Girl",
"Girl","Girl","Boy","Boy","Girl","Girl","Boy","Girl","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Boy","Girl","Boy","Boy","Girl","Boy","Boy","Boy",
"Boy","Girl","Boy","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Boy","Boy","Girl","Boy","Girl","Boy","Boy","Boy","Girl"))
reading.score<-c(91.0,77.5,52.5,54.0,53.5,62.0,59.0,51.5,61.5,56.5,47.5,75.0,47.5,53.5,50.0,50.0,49.0,59.0,60.0,60.0,
60.5,50.0,101.0,60.0,60.0,83.5,61.0,75.0,84.0,56.5,56.5,45.0,60.5,77.5,62.5,70.0,69.0,62.0,107.5,54.5,92.5,94.5,65.0,
80.0,45.0,45.0,66.0,66.0,57.5,42.5,60.0,64.0,65.0,47.5,57.5,55.0,55.0,76.5,51.5,59.5,59.5,59.5,55.0,70.0,66.5,84.5,
57.5,125.0,70.5,79.0,56.0,75.0,57.5,56.0,67.5,114.5,70.0,67.0,60.5,95.0,65.5,85.0,55.0,63.5,61.5,60.0,52.5,65.0,87.5,
62.5,66.5,67.0,117.5,47.5,67.5,67.5,77.0,73.5,73.5,68.5,55.0,92.0,55.0,55.0,60.0,120.5,56.0,84.5,60.0,85.0,93.0,60.0,
65.0,58.5,85.0,67.0,67.5,65.0,60.0,47.5,79.0,80.0,57.5,64.5,65.0,60.0,85.0,60.0,58.0,61.5,60.0,65.0,93.5,52.5,42.5,
75.0,48.5,64.0,66.0,82.5,52.5,45.5,57.5,65.0,46.0,75.0,100.0,77.5,51.5,62.5,44.5,51.0,56.0,58.5,69.0,65.0,60.0,65.0,
65.0,40.0,55.0,52.5,54.5,74.0,55.0,60.5,50.0,48.0,51.0,55.0,93.5,61.0,52.5,57.5,60.0,71.0,65.0,60.0,55.0,60.0,77.0,
52.5,95.0,50.0,47.5,50.0,47.0,71.0,65.0)
reading.recommendation<-as.factor(c(rep("no",130),rep("yes",59)))
DF<-data.frame(id,gender,reading.score,reading.recommendation)
head(DF)
#=================================================================================
#      DESCRIPTIVES
#=================================================================================
table(DF[2:4])  #BREAKDOWN OF SCORES BY GENDER AND REMEDIAL READING RECOMENDATIONS
table(DF[c(2)])  #TABLE OF GENDER
print(table(DF[c(2)])/sum(table(DF[c(4)]))*100,digits=4)#PERCENT GENDER BREAKDOWN
table(DF[c(4)])  #TABLE RECOMENDDED FOR REMEDIAL READING
print(table(DF[c(4)])/sum(table(DF[c(4)]))*100,digits=4)#Probability of Reccomended
table(DF[c(2,4)])  #TABLE OF GIRLS AND BOYS RECOMENDDED FOR REMEDIAL READING
print(prop.table(table(DF[c(2,4)]),1)*100,digits=4)#Probability of Reccomended
#=================================================================================
#        ANALYSIS
#=================================================================================
(mod1<-glm(reading.recommendation~reading.score+gender,family=binomial,data=DF))

library(statmod)
with(DF,glm.scoretest(mod1, c(0,2,3), dispersion=NULL))^2
#If I move the decimal over 2 to the right I get close to the 9.5177 from the study
(with(DF,glm.scoretest(mod1, c(0,2,3), dispersion=NULL))^2)*100 #is this it?
#END OF CODE

I am running R 2.13.0 in a windows 7 machine.

Peng, C., Lee, K., & Ingersoll, G. (2002). An Introduction to Logistic Regression Analysis and Reporting. The Journal of Educational Research, 96(1), 3-14. Heldref Publications. Retrieved from   
http://www.informaworld.com/openurl?genre=article&doi=10.1080/00220670209598786&magic=crossref 		 	   		  

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

The score test looks at the effect of adding extra columns to the model matrix.  The function glm.scoretest takes the fitted model object as the first argument and the extra column, or columns, as the second argument.  Your x2 argument has length only 3.  Is this really what you want?  I would have expected you need to specify a vector of length nrow(DF), [as in the help information for glm.scoretest itself].

glm.scoretest will only do single-df tests, so it's not going to help here. 

Notice that the test requested is a whole-model test, i.e. a comparison of the fitted model with an intercept-only model (AKA a null model). It is not a goodness of fit test (which is a good thing as those are often dubious with binary responses). In R-devel, we can do score tests for such model comparisons as follows:
mod2<-glm(reading.recommendation~1,family=binomial,data=DF)
anova(mod1,mod2,test="Rao")
Analysis of Deviance Table

Model 1: reading.recommendation ~ reading.score + gender
Model 2: reading.recommendation ~ 1
  Resid. Df Resid. Dev Df Deviance   Rao Pr(>Chi)   
1       186     224.64                              
2       188     234.67 -2  -10.033 -9.53 0.008523 **
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 

This is pretty close to the cited SAS result. I cannot tell where the .01 discrepancy creeps in, but the GLM algorithm is not 1-step convergent for the null model, even though the solution can be written down explicitly.  (I don't have SAS to hand, but if anyone does, it would be interesting to see if it still says 9.5177 with the same data). 

With the current R, the closest you get is the asymptotically equivalent LRT:
anova(mod1,mod2,test="Chisq")
Analysis of Deviance Table

Model 1: reading.recommendation ~ reading.score + gender
Model 2: reading.recommendation ~ 1
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
1       186     224.64                        
2       188     234.67 -2  -10.033 0.006626 **
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com