Logistic Regression: variable selection based on p value?
Puff - There are many strategies, ideas, and papers on this topic. Frank Harrell's book, "Regression Modeling Strategies", is a great introduction and points to many of the relevant references. I would highly recommend it.
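As a rough illustration of the trade-off in the quoted question, here is a minimal base-R sketch that fits a prespecified full model and a model reduced by a p < 0.05 filter, then compares their AUC on held-out data. Everything in it is an assumption made for illustration: the simulated data, the variable names x1 to x4, the effect sizes, the 50/50 split, and the small auc() helper. It is not a recommended selection procedure, just a way to see the two criteria side by side.

```r
# Simulated data: all names and effect sizes are made up for illustration.
set.seed(1)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
# x1 has a strong effect, x2 a weak one, x3 and x4 none.
d$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * d$x1 + 0.3 * d$x2))

train <- d[1:200, ]
test  <- d[201:400, ]

# AUC via the rank (Mann-Whitney) formula; ties get average ranks.
auc <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Prespecified full model: every candidate stays in, whatever its p value.
full  <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = train)
pvals <- coef(summary(full))[-1, "Pr(>|z|)"]

# Reduced model: keep only predictors with p < 0.05 in the full fit
# (falls back to an intercept-only model if nothing passes the filter).
keep    <- names(pvals)[pvals < 0.05]
reduced <- glm(reformulate(if (length(keep)) keep else "1", response = "y"),
               family = binomial, data = train)

# Compare discrimination on data not used for fitting or selection.
auc_full    <- auc(predict(full,    newdata = test, type = "response"), test$y)
auc_reduced <- auc(predict(reduced, newdata = test, type = "response"), test$y)

print(round(pvals, 3))
print(c(full = auc_full, reduced = auc_reduced))
```

The point of the held-out comparison is that both the p values and the apparent AUC computed on the fitting data are optimistic once they have been used to choose variables. A single split like this is noisy, so resampling (for example the bootstrap) would give a steadier picture.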
pufftissue pufftissue wrote:
Hi, When I use logistic regression, each variable has a p value associated with it. Do I only include the variables that have a statistically significant p value (<0.05), or are there situations when I should include variables even though their p values are high? I had heard that if a variable has a high p value but it's not the terminal variable, keep it; otherwise, take it out. I'm not sure whether that is right, or why it would be the case. What if my p values are terrible but this combination of variables yields the highest AUC and the best calibration? Which prevails in that case? Thank you!
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.