Calculation of BIC done by leaps-package
On 2010-12-26 06:47, Jan Henckens wrote:
Hi Folks, I've got a question concerning the calculation of the Schwarz-Criterion (BIC) done by summary.regsubsets() of the leaps-package: Using regsubsets() to perform subset-selection I receive an regsubsets object that can be summarized by summary.regsubsets(). After this operation the resulting summary contains a vector of BIC-values representing models of size i=1,...,K. My problem is that I can't reproduce the calculation of these BIC values. I already tried to use extractAIC(...,k=log(n)), AIC(...,k=log(n)) and manual calculation using the RSS-vector but none matches the calculation done by the summary-function. I already checked for constants that could be the reason for the differences but i found out, that the values vary apart of adding a constant term. The source code of the leaps-package states the package calculates the BIC this way: bicvec<-c(bicvec,(n1+ll$intercept)*log(vr)+i*log(n1+ll$intercept)) with: ## number of observations - Intercept: n1<-ll$nn-ll$intercept ## fraction of sum of squared residulas model i ## and sum of squared residuals null model, I ## just can't understand why the vector ll$ress ## is subscripted double
ll$ress is a one-column matrix
vr<-ll$ress[i,j]/ll$nullrss
## maximum number of variables
i
^^ This seems to match the calculation done by extractAIC but it doesn't!
Maybe anyone can tell me about the reason of the variation of the
BIC-values?
Best regards,
Jan Henckens
### Minimal Example:
require(leaps)
bridge<-
read.table("http://www.stat.tamu.edu/~sheather/book/docs/datasets/bridge.txt",
header=TRUE)
fmla.full<- formula(Time ~ .)
(lm.model<- summary(regsubsets(fmla.full,data=bridge,weights=NULL,
intercept=TRUE, method="forward")))
lm.model$bic
### The first two models constructed via lm():
extractAIC(lm(Time~Dwgs,data=bridge),k=log(nrow(bridge)))
extractAIC(lm(Time~Dwgs+Case,data=bridge),k=log(nrow(bridge)))
or see
http://www.henckens.de/min_example.R
Does this help: bic1 <- extractAIC(lm(Time~Dwgs,data=bridge),k=log(nrow(bridge))) bic2 <- extractAIC(lm(Time~Dwgs+Spans,data=bridge),k=log(nrow(bridge))) ## NOTE: predictors are 'Dwgs' and 'Spans', not 'Dwgs' and 'Case'. all.equal(bic2[2] - bic1[2], diff(lm.model$bic)[1]) #[1] TRUE ## or bic0.1 <- lm.model$bic[1] bic0.2 <- lm.model$bic[2] bic1 - bic0.1 #[1] 410.4472 bic2 - bic0.2 #[1] 410.4472 ## So, the values do differ from extractAIC by a constant. BTW: You probably want to omit 'Case'; I doubt that it's an intended predictor. Peter Ehlers