stepwise model selection (of fixed effects only) using AIC?

6 messages · Steve Taylor, Diego Pujoni, Philippi, Tom +1 more

#
Thank you, Diego.  Yes, I have studied a little information theory, though my recollections are hazy.
Curious, then, that that's the default setting, and that the default anova() method does precisely that, comparing two models that differ only in the fixed effects included.

I'm aware of the objections, such as the danger of spurious relations.  But I cannot see why they should prevent step(glmer()) when step(glm()) has been a standard feature in R for many years.  The real reason seems to be that the methods in package:stats don't work with S4 objects.
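Nothing conceptually prevents stepwise AIC selection for a model class that supports the AIC() and update() generics (which merMod objects do). A minimal sketch of backward elimination written only in terms of those generics, illustrated here with glm() on made-up simulated data (the variable names are invented for the example):

```r
## Backward elimination by AIC using only AIC() and update() -- a sketch,
## not a replacement for step(); illustrated with glm() on noise data.
set.seed(1)
d <- data.frame(y = rnorm(100), x1 = rnorm(100),
                x2 = rnorm(100), x3 = rnorm(100))
fit <- glm(y ~ x1 + x2 + x3, data = d)

repeat {
  tl <- attr(terms(fit), "term.labels")
  if (length(tl) == 0) break
  ## AIC of each single-term deletion from the current model
  aics <- sapply(tl, function(tm)
    AIC(update(fit, as.formula(paste(". ~ . -", tm)))))
  if (min(aics) >= AIC(fit)) break   ## no deletion improves AIC: stop
  fit <- update(fit, as.formula(paste(". ~ . -", names(which.min(aics)))))
}
formula(fit)   ## the model the greedy search settles on
```

Since the data here are pure noise, whatever the loop retains is by construction spurious, which is exactly the objection raised downthread.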

With my sample size, I think the difference between AIC and AICc is negligible.
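For reference, the AICc small-sample correction is AICc = AIC + 2k(k+1)/(n - k - 1), with k the number of estimated parameters, so its size is easy to check directly:

```r
## Size of the AICc correction term: 2k(k+1)/(n - k - 1)
aicc_correction <- function(k, n) 2 * k * (k + 1) / (n - k - 1)
aicc_correction(k = 5, n = 500)   # about 0.12 AIC units -- negligible
aicc_correction(k = 5, n = 30)    # 2.5 -- no longer negligible
```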

cheers,
    Steve

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Diego Pujoni
Sent: Tuesday, 8 January 2013 2:04a
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] stepwise model selection (of fixed effects only) using AIC?

Hi Steve, have you heard of the Information-Theoretic Approach? It uses the
value of AIC (or AICc) to choose the best hypothesis among many a priori
hypotheses. In Anderson (2008), "Model Based Inference in the Life Sciences",
we see recommendations against stepwise selection (or fitting all possible
models), because it can easily lead to spurious relations. The author
recommends instead formulating several a priori hypotheses (models), using
knowledge of the system, and then using AICc to choose the best among them.
Another thing to pay attention to: you cannot compare different combinations
of fixed effects between models of class "mer" fitted with REML = TRUE.
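The REML point can be sketched with nlme (a recommended package that ships with R): models differing in their fixed effects must be refitted by maximum likelihood before their AICs or likelihood ratios are comparable, because the REML criterion depends on the fixed-effects design matrix. The Orthodont data set used here comes with nlme; the particular model is only an illustration:

```r
library(nlme)   ## Orthodont is a data set shipped with nlme
fm1 <- lme(distance ~ age + Sex, data = Orthodont,
           random = ~ 1 | Subject, method = "ML")
fm2 <- update(fm1, fixed. = distance ~ age)  ## drop the fixed effect Sex
anova(fm1, fm2)   ## valid comparison: both models fitted by ML
## With method = "REML" the same anova() comparison would be invalid,
## since the two REML criteria are computed on different fixed-effect bases.
```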

A hug
#
Re stepwise or other variable selection approaches with lm()
(but the same issues arise more generally, including with
multi-level models), the function bsnVaryNvar() that is in more
recent versions of our DAAG package may be of some interest.
Just try running 

bsnVaryNvar(method='forward')   ## forward selection
bsnVaryNvar(method='backward')  ## backward selection
bsnVaryNvar()                   ## exhaustive selection

The default is to select the 'best' 3 variables from a number of
predictors that is varied between 3 and 50, with data in which
the predictors are independent Gaussian noise, as is the outcome
variable.  When the best 3 variables are selected out of a number
that is in the region of 15 to 20 or so, the averaging method used
by our function is likely to give an 'average' notional p-value that
has dropped below 0.05.  There are, of course, ways to account
for the selection bias, but they are non-trivial.
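The phenomenon is easy to reproduce in base R alone. This sketch (not the bsnVaryNvar() code itself) selects the single best of 20 pure-noise predictors by correlation and then reads off its naive p-value; rejection at the 0.05 level happens far more often than 5% of the time:

```r
## Selection bias from noise: pick the best of p noise predictors, then
## take its unadjusted p-value at face value. A sketch of the idea only.
set.seed(42)
pick_best_p <- function(n = 50, p = 20) {
  y <- rnorm(n)
  X <- matrix(rnorm(n * p), n, p)
  best <- which.max(abs(cor(X, y)))              ## 'best' predictor
  summary(lm(y ~ X[, best]))$coefficients[2, 4]  ## its naive p-value
}
mean(replicate(500, pick_best_p()) < 0.05)  ## well above the nominal 0.05
```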

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 08/01/2013, at 10:31 AM, Diego Pujoni <diegopujoni at gmail.com> wrote: