stepwise model selection (of fixed effects only) using AIC?

6 messages · Steve Taylor, Diego Pujoni, Philippi, Tom +1 more

#
Thank you, Diego.  Yes, I have studied a little information theory, though my recollections are hazy.
Curious, then, that that's the default setting, and that the default anova() method does precisely that, comparing two models that differ only in the fixed effects included.

I'm aware of the objections, such as the danger of spurious relations.  But I cannot see why they should prevent step(glmer()) when step(glm()) has been a standard feature in R for many years.  The real reason seems to be that the methods in package:stats don't work with S4 objects.
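Nothing conceptually prevents stepwise AIC selection for a model class that supports the AIC() and update() generics (which merMod objects do). A minimal sketch of backward elimination written only in terms of those generics, illustrated here with glm() on made-up simulated data (the variable names are invented for the example):

```r
## Backward elimination by AIC using only AIC() and update() -- a sketch,
## not a replacement for step(); illustrated with glm() on noise data.
set.seed(1)
d <- data.frame(y = rnorm(100), x1 = rnorm(100),
                x2 = rnorm(100), x3 = rnorm(100))
fit <- glm(y ~ x1 + x2 + x3, data = d)

repeat {
  tl <- attr(terms(fit), "term.labels")
  if (length(tl) == 0) break
  ## AIC of each single-term deletion from the current model
  aics <- sapply(tl, function(tm)
    AIC(update(fit, as.formula(paste(". ~ . -", tm)))))
  if (min(aics) >= AIC(fit)) break   ## no deletion improves AIC: stop
  fit <- update(fit, as.formula(paste(". ~ . -", names(which.min(aics)))))
}
formula(fit)   ## the model the greedy search settles on
```

Since the data here are pure noise, whatever the loop retains is by construction spurious, which is exactly the objection raised downthread.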

With my sample size, I think the difference between AIC and AICc is negligible.
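For reference, the AICc small-sample correction is AICc = AIC + 2k(k+1)/(n - k - 1), with k the number of estimated parameters, so its size is easy to check directly:

```r
## Size of the AICc correction term: 2k(k+1)/(n - k - 1)
aicc_correction <- function(k, n) 2 * k * (k + 1) / (n - k - 1)
aicc_correction(k = 5, n = 500)   # about 0.12 AIC units -- negligible
aicc_correction(k = 5, n = 30)    # 2.5 -- no longer negligible
```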

cheers,
    Steve

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Diego Pujoni
Sent: Tuesday, 8 January 2013 2:04a
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] stepwise model selection (of fixed effects only) using AIC?

Hi Steve, have you heard of the Information-Theoretic Approach? It uses the
value of AIC (or AICc) to choose the best hypothesis among many a priori
hypotheses. In Anderson (2008), "Model Based Inference in the Life Sciences",
we see recommendations against stepwise selection (or fitting all possible
models), because it can easily lead to spurious relations. The author
recommends instead formulating several a priori hypotheses (models), using
knowledge of the system, and then using AICc to choose the best among them.
Another thing to pay attention to: you cannot compare different combinations
of fixed effects between models of class "mer" fitted with REML = TRUE.
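The REML point can be sketched with nlme (a recommended package that ships with R): models differing in their fixed effects must be refitted by maximum likelihood before their AICs or likelihood ratios are comparable, because the REML criterion depends on the fixed-effects design matrix. The Orthodont data set used here comes with nlme; the particular model is only an illustration:

```r
library(nlme)   ## Orthodont is a data set shipped with nlme
fm1 <- lme(distance ~ age + Sex, data = Orthodont,
           random = ~ 1 | Subject, method = "ML")
fm2 <- update(fm1, fixed. = distance ~ age)  ## drop the fixed effect Sex
anova(fm1, fm2)   ## valid comparison: both models fitted by ML
## With method = "REML" the same anova() comparison would be invalid,
## since the two REML criteria are computed on different fixed-effect bases.
```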

A hug
#
Re stepwise or other variable selection approaches with lm()
(but the same issues arise more generally, including with
multi-level models), the function bsnVaryNvar() that is in more
recent versions of our DAAG package may be of some interest.
Just try running 

bsnVaryNvar(method='forward')   ## forward selection
bsnVaryNvar(method='backward')  ## backward selection
bsnVaryNvar()                   ## exhaustive selection

The default is to select the 'best' 3 variables from a number of
predictors that is varied between 3 and 50, with data in which
the predictors are independent Gaussian noise, as is the outcome
variable.  When the best 3 variables are selected out of a number
that is in the region of 15 to 20 or so, the averaging method used
by our function is likely to give an 'average' notional p-value that
has dropped below 0.05.  There are, of course, ways to account
for the selection bias, but they are non-trivial.
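The phenomenon is easy to reproduce in base R alone. This sketch (not the bsnVaryNvar() code itself) selects the single best of 20 pure-noise predictors by correlation and then reads off its naive p-value; rejection at the 0.05 level happens far more often than 5% of the time:

```r
## Selection bias from noise: pick the best of p noise predictors, then
## take its unadjusted p-value at face value. A sketch of the idea only.
set.seed(42)
pick_best_p <- function(n = 50, p = 20) {
  y <- rnorm(n)
  X <- matrix(rnorm(n * p), n, p)
  best <- which.max(abs(cor(X, y)))              ## 'best' predictor
  summary(lm(y ~ X[, best]))$coefficients[2, 4]  ## its naive p-value
}
mean(replicate(500, pick_best_p()) < 0.05)  ## well above the nominal 0.05
```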

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 08/01/2013, at 10:31 AM, Diego Pujoni <diegopujoni at gmail.com> wrote: