Logistic Regression - Variable Selection Methods With Prediction

7 messages · rajclinasia, Steve_Friedman at nps.gov, Steve Lianoglou +3 more

#
Hello,

I am pretty new to R; I have always used SAS and SAS products. My
target variable is binary ('Y' and 'N') and I have about 14 predictor
variables. My goal is to compare different variable selection methods
such as forward, backward, and all possible subsets. I am using the
misclassification rate to pick the winning method.

This is what I have as of now:

Reg <- glm(Graduation ~ ., DFtrain, family = binomial(link = "logit"))
step <- extractAIC(Reg, direction = "forward")
pred <- predict(Reg, DFtest, type = "response")
mis <- mean({pred > 0.5} != {DFtest[, "Graduation"] == "Y"})
This program actually works, but I wanted to check that I am
doing this right. Also, I am getting the same misclassification rate
for all the different methods.
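
A likely reason for the identical rates: extractAIC() only reports the equivalent degrees of freedom and AIC of an already fitted model, it does not perform any selection, and predict() is then called on the full model Reg each time. The sketch below shows one way forward and backward selection could be run with step() from base R and scored on held-out data. The simulated data frame, the predictor names x1 to x4, and the helper misclass() are assumptions made for illustration, not the poster's data or code.

```r
# Simulated stand-in for DFtrain / DFtest (hypothetical data).
set.seed(1)
n  <- 400
DF <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
DF$Graduation <- factor(ifelse(runif(n) < plogis(1.5 * DF$x1 - DF$x2), "Y", "N"))
DFtrain <- DF[1:300, ]
DFtest  <- DF[301:400, ]

# Full and intercept-only logistic models; "Y" is the second factor level,
# so fitted probabilities are P(Graduation == "Y").
full <- glm(Graduation ~ ., data = DFtrain, family = binomial(link = "logit"))
null <- glm(Graduation ~ 1, data = DFtrain, family = binomial(link = "logit"))

# Forward: start from the intercept-only model, grow towards the full model.
fwd <- step(null, scope = list(lower = formula(null), upper = formula(full)),
            direction = "forward", trace = 0)

# Backward: start from the full model and drop terms.
bwd <- step(full, direction = "backward", trace = 0)

# Hypothetical helper: misclassification rate of a fitted model on new data.
misclass <- function(fit, newdata) {
  pred <- predict(fit, newdata, type = "response")
  mean((pred > 0.5) != (newdata$Graduation == "Y"))
}

mis_fwd <- misclass(fwd, DFtest)
mis_bwd <- misclass(bwd, DFtest)
```

The point is that predictions come from the model returned by step() (fwd or bwd), not from the original full fit, so the rates can differ between methods whenever the selected terms differ.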

I also tried to use

Reg <- leaps(Graduation ~ ., DFtrain)
pred <- predict(Reg, DFtest, type = "response")
mis <- mean({pred > 0.5} != {DFtest[, "Graduation"] == "Y"})
#print(summary(mis))
which doesn't work,

and

Reg <- regsubsets(Graduation ~ ., DFtrain)
pred <- predict(Reg, DFtest, type = "response")
mis <- mean({pred > 0.5} != {DFtest[, "Graduation"] == "Y"})
#print(summary(mis))

The regsubsets call works, but the 'predict' function does not work with
it. Is there any other way to do predictions when using regsubsets?
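
Note that regsubsets() and leaps() fit least-squares linear models rather than logistic ones. One possible workaround, sketched below under stated assumptions, is to enumerate the predictor subsets directly in base R, fit each with glm(), and predict from the chosen fit. This is a brute-force illustration, not the leaps algorithm; the simulated data, the predictor names, and the choice of AIC as the selection criterion are all assumptions. With 14 predictors this means 2^14 - 1 = 16383 fits, which is slow but feasible.

```r
# Simulated stand-in for DFtrain / DFtest (hypothetical data).
set.seed(2)
n  <- 400
DF <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
DF$Graduation <- factor(ifelse(runif(n) < plogis(DF$x1 + DF$x3), "Y", "N"))
DFtrain <- DF[1:300, ]
DFtest  <- DF[301:400, ]

predictors <- setdiff(names(DFtrain), "Graduation")

# Every non-empty subset of predictors, as a list of character vectors.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one logistic regression per subset.
fits <- lapply(subsets, function(vars) {
  glm(reformulate(vars, response = "Graduation"),
      data = DFtrain, family = binomial(link = "logit"))
})

# Choose the subset with the lowest AIC, using training data only.
best <- fits[[which.min(sapply(fits, AIC))]]

# Ordinary glm objects support predict(), unlike regsubsets objects.
pred <- predict(best, DFtest, type = "response")
mis  <- mean((pred > 0.5) != (DFtest$Graduation == "Y"))
```

Because the winner is picked on the training set and scored once on DFtest, the test misclassification rate is not reused for selection.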

Any help is appreciated.

Thanks,
#
Can I at least get help with which package to use for logistic
regression with all possible models, and how to do prediction? I know I can
use regsubsets, but I am not sure whether it has any prediction functions to
go with it.

Thanks
On Oct 25, 6:54 pm, RAJ <dheerajathr... at gmail.com> wrote:
#
Try the glm package

Steve Friedman Ph. D.
Ecologist  / Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

Steve_Friedman at nps.gov
Office (305) 224 - 4282
Fax     (305) 224 - 4147
#
Hi,
On Wed, Oct 26, 2011 at 12:35 PM, RAJ <dheerajathreya at gmail.com> wrote:
Maybe you could try glmnet instead.

It doesn't give you "all possible" models, but rather the best one at
a given value for the penalty (lambda) parameter.

HTH,

-steve
#
Check glmulti package for all subset selection.

Weidong Gu
On Wed, Oct 26, 2011 at 12:35 PM, RAJ <dheerajathreya at gmail.com> wrote:
#
The likely reason you are not getting replies is that what you propose to do is considered a poor way of building models.

You need to get out of the "SAS Mindset".

I would suggest you obtain a copy of Frank Harrell's book:

  http://www.amazon.com/exec/obidos/ASIN/0387952322/

and then consider using his 'rms' package on CRAN to engage in model building strategies and validation.

Regards,

Marc Schwartz
On Oct 26, 2011, at 11:35 AM, RAJ wrote: