An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110612/f8212a09/attachment.pl>
Score Test Function
3 messages · Tyler Rinker, Bill Venables, Peter Dalgaard
The score test looks at the effect of adding extra columns to the model matrix. The function glm.scoretest takes the fitted model object as the first argument and the extra column, or columns, as the second argument. Your x2 argument has length only 3. Is this really what you want? I would have expected you need to specify a vector of length nrow(DF), [as in the help information for glm.scoretest itself].
Bill Venables.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Tyler Rinker
Sent: Sunday, 12 June 2011 2:46 PM
To: r-help at r-project.org
Subject: [R] Score Test Function
Greeting R Community,
I'm trying to learn Logistic Regression on my own and am using An Introduction to Logistic Regression Analysis and Reporting (Peng, C., Lee, K., & Ingersoll, G. ,2002). This article uses a Score Test Stat as a measure of overall fit for a logistic regression model. The author calculates this statistic using SAS. I am looking for an [R] function that can compute this stat and a p=value (please don't critique the stat itself). As far as I can tell glm.scoretest() from library(statmod) is the only function for this but it does not give a pvalue and appears to not correspond with the author's example. Some chat on the discussion board a while back indicated that this was a well known test Click here for that thread but difficult to calculate in [R]. I'm wondering if there is a way to do this fairly easily in [R] at this time?
I am working with the data set from the study and am looking to get close to the 9.5177 score test stat and .0086 p-value with 2 degrees freedom.
Below is the code for the data set and some descriptives so you can see the data set is the same as from the study. I highlighted my attempt to get a score test statistic and am curious if this is it (minus the p-value).
#BEGINNING OF CODE
id<-factor(1:189)
gender<-factor(c("Boy","Boy","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Girl","Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl",
"Boy","Boy","Girl","Girl","Girl","Boy","Boy","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Girl",
"Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Girl","Boy","Boy","Boy","Boy","Girl","Boy",
"Girl","Boy","Boy","Boy","Girl","Boy","Girl","Girl","Boy","Girl","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Girl",
"Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Boy","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Boy",
"Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Girl",
"Girl","Girl","Boy","Boy","Girl","Girl","Boy","Girl","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Boy","Girl","Boy","Boy","Girl","Boy","Boy","Boy",
"Boy","Girl","Boy","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Boy","Boy","Girl","Boy","Girl","Boy","Boy","Boy","Girl"))
reading.score<-c(91.0,77.5,52.5,54.0,53.5,62.0,59.0,51.5,61.5,56.5,47.5,75.0,47.5,53.5,50.0,50.0,49.0,59.0,60.0,60.0,
60.5,50.0,101.0,60.0,60.0,83.5,61.0,75.0,84.0,56.5,56.5,45.0,60.5,77.5,62.5,70.0,69.0,62.0,107.5,54.5,92.5,94.5,65.0,
80.0,45.0,45.0,66.0,66.0,57.5,42.5,60.0,64.0,65.0,47.5,57.5,55.0,55.0,76.5,51.5,59.5,59.5,59.5,55.0,70.0,66.5,84.5,
57.5,125.0,70.5,79.0,56.0,75.0,57.5,56.0,67.5,114.5,70.0,67.0,60.5,95.0,65.5,85.0,55.0,63.5,61.5,60.0,52.5,65.0,87.5,
62.5,66.5,67.0,117.5,47.5,67.5,67.5,77.0,73.5,73.5,68.5,55.0,92.0,55.0,55.0,60.0,120.5,56.0,84.5,60.0,85.0,93.0,60.0,
65.0,58.5,85.0,67.0,67.5,65.0,60.0,47.5,79.0,80.0,57.5,64.5,65.0,60.0,85.0,60.0,58.0,61.5,60.0,65.0,93.5,52.5,42.5,
75.0,48.5,64.0,66.0,82.5,52.5,45.5,57.5,65.0,46.0,75.0,100.0,77.5,51.5,62.5,44.5,51.0,56.0,58.5,69.0,65.0,60.0,65.0,
65.0,40.0,55.0,52.5,54.5,74.0,55.0,60.5,50.0,48.0,51.0,55.0,93.5,61.0,52.5,57.5,60.0,71.0,65.0,60.0,55.0,60.0,77.0,
52.5,95.0,50.0,47.5,50.0,47.0,71.0,65.0)
reading.recommendation<-as.factor(c(rep("no",130),rep("yes",59)))
DF<-data.frame(id,gender,reading.score,reading.recommendation)
head(DF)
#=================================================================================
# DESCRIPTIVES
#=================================================================================
table(DF[2:4]) #BREAKDOWN OF SCORES BY GENDER AND REMEDIAL READING RECOMENDATIONS
table(DF[c(2)]) #TABLE OF GENDER
print(table(DF[c(2)])/sum(table(DF[c(4)]))*100,digits=4)#PERCENT GENDER BREAKDOWN
table(DF[c(4)]) #TABLE RECOMENDDED FOR REMEDIAL READING
print(table(DF[c(4)])/sum(table(DF[c(4)]))*100,digits=4)#Probability of Reccomended
table(DF[c(2,4)]) #TABLE OF GIRLS AND BOYS RECOMENDDED FOR REMEDIAL READING
print(prop.table(table(DF[c(2,4)]),1)*100,digits=4)#Probability of Reccomended
#=================================================================================
# ANALYSIS
#=================================================================================
(mod1<-glm(reading.recommendation~reading.score+gender,family=binomial,data=DF))
library(statmod)
with(DF,glm.scoretest(mod1, c(0,2,3), dispersion=NULL))^2
#If I move the decimal over 2 to the right I get close to the 9.5177 from the study
(with(DF,glm.scoretest(mod1, c(0,2,3), dispersion=NULL))^2)*100 #is this it?
#END OF CODE
I am running R 2.13.0 in a windows 7 machine.
Peng, C., Lee, K., & Ingersoll, G. (2002). An Introduction to Logistic Regression Analysis and Reporting. The Journal of Educational Research, 96(1), 3-14. Heldref Publications. Retrieved from
http://www.informaworld.com/openurl?genre=article&doi=10.1080/00220670209598786&magic=crossref
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Jun 12, 2011, at 07:54 , <Bill.Venables at csiro.au> <Bill.Venables at csiro.au> wrote:
The score test looks at the effect of adding extra columns to the model matrix. The function glm.scoretest takes the fitted model object as the first argument and the extra column, or columns, as the second argument. Your x2 argument has length only 3. Is this really what you want? I would have expected you need to specify a vector of length nrow(DF), [as in the help information for glm.scoretest itself].
glm.scoretest will only do single-df tests, so it's not going to help here. Notice that the test requested is a whole-model test, i.e. a comparison of the fitted model with an intercept-only model (AKA a null model). It is not a goodness of fit test (which is a good thing as those are often dubious with binary responses). In R-devel, we can do score tests for such model comparisons as follows:
mod2<-glm(reading.recommendation~1,family=binomial,data=DF) anova(mod1,mod2,test="Rao")
Analysis of Deviance Table Model 1: reading.recommendation ~ reading.score + gender Model 2: reading.recommendation ~ 1 Resid. Df Resid. Dev Df Deviance Rao Pr(>Chi) 1 186 224.64 2 188 234.67 -2 -10.033 -9.53 0.008523 ** --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 This is pretty close to the cited SAS result. I cannot tell where the .01 discrepancy creeps in, but the GLM algorithm is not 1-step convergent for the null model, even though the solution can be written down explicitly. (I don't have SAS to hand, but if anyone does, it would be interesting to see if it still says 9.5177 with the same data). With the current R, the closest you get is the asymptotically equivalent LRT:
anova(mod1,mod2,test="Chisq")
Analysis of Deviance Table Model 1: reading.recommendation ~ reading.score + gender Model 2: reading.recommendation ~ 1 Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 186 224.64 2 188 234.67 -2 -10.033 0.006626 ** --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com