all subsets for glm

3 messages · Thomas Lumley, Harald von Waldow

#
Thanks Dieter. In case an exhaustive search (all subsets) remains
infeasible, I'll include a shrinkage method for sure. Looks like
glmpath could be useful here.

Best,
Harald
1 day later
#
If you actually want to find the best subsets, you can get a good 
approximation by using leaps on the weighted least squares fit that is the 
last iteration of the IWLS algorithm for fitting the glm.

Running regsubsets with a reasonably large value of nbest and then 
refitting the top models as glms afterwards will fairly reliably give the 
best glms.
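
A minimal sketch of that recipe, for illustration only: the simulated data, the variable names (x1 to x6), the binomial family, nbest = 5 and the use of AIC for the final ranking are all assumptions made here, not details from this thread. The working response z and working weights w are taken from the converged glm fit, so regsubsets searches the weighted least squares problem solved in the last IWLS iteration.

```r
library(leaps)

# Simulated data (assumed for illustration): 6 candidate predictors,
# only x1 and x3 actually drive the binary response.
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("x", 1:6)))
y <- rbinom(n, 1, plogis(0.5 + X[, 1] - 0.8 * X[, 3]))
dat <- data.frame(y = y, X)

# Fit the full glm; at convergence IWLS has solved a weighted
# least squares problem in the working response.
full <- glm(y ~ ., data = dat, family = binomial)

w <- full$weights                                   # working weights
z <- full$linear.predictors +
  residuals(full, type = "working")                 # working response

# All-subsets search on the weighted least squares approximation,
# keeping the 5 best subsets of each size.
rs <- regsubsets(x = X, y = z, weights = w, nbest = 5, nvmax = ncol(X))
which_mat <- summary(rs)$which[, -1, drop = FALSE]  # drop intercept column

# Refit every candidate subset as a real glm and rank by AIC.
aic <- apply(which_mat, 1, function(keep) {
  f <- reformulate(colnames(X)[keep], response = "y")
  AIC(glm(f, data = dat, family = binomial))
})
terms <- apply(which_mat, 1, function(keep) {
  paste(colnames(X)[keep], collapse = " + ")
})
ranking <- data.frame(model = terms, AIC = aic, row.names = NULL)
ranking <- ranking[order(ranking$AIC), ]
head(ranking)
```

The ordering from regsubsets is only approximate, because z and w come from the full model, which is why the candidates are refitted as glms before being compared.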

Whether this is better than lasso depends on what you are trying to do - 
IMO the only point of all-subsets regression is to get many best models 
rather than a single one, and lasso doesn't do at all well at that.

 	-thomas
On Sat, 4 Apr 2009, Harald von Waldow wrote:

            
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
#
Thanks, that sounds interesting. I am as yet clueless to the workings
of IWLS, so maybe this is nonsense: The result of running glm on the
full model (all variables) is a crass example of overfitting, i.e.
zero residuals, all R_i^2 close to 1, large coefficients. Would then
the "weighted least squares fit of the last iteration of IWLS" not be
pretty meaningless?
Yes, I am trying to get a number of best models, since the final model
selection shall be based on interpretability and expert knowledge. By
now I have bootstrapped the lasso (using glmpath) to generate such a
set, but the resulting models are very similar and I suspect there
is a larger variety of "best models".
 
Harald