stepwise model selection (of fixed effects only) using AIC?
6 messages · Steve Taylor, Diego Pujoni, Philippi, Tom +1 more
Obrigado, Diego. Yes, I have studied a little bit of information theory, though my recollections are hazy.
> you can not compare combinations of fixed effects of class "mer" with REML = TRUE.
Curious then, that that's the default value, and that the default anova() does precisely that by comparing two models differing only in the fixed effects included.
I'm aware of the objections, such as the danger of spurious relations. But I cannot see why they prevent step(glmer()) when step(glm()) has been a standard feature in R for many years. The real reason seems to be the fact that methods in package:stats don't work with S4 objects.
With my sample size, I think the difference between AIC and AICc is negligible.
cheers,
Steve
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Diego Pujoni
Sent: Tuesday, 8 January 2013 2:04a
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] stepwise model selection (of fixed effects only) using AIC?
Hi Steve, have you heard about the Information-Theoretic Approach? It uses the
value of AIC (or AICc) to choose the best hypothesis among several a priori
hypotheses. In Anderson (2008), "Model Based Inference in the Life Sciences",
we see recommendations against stepwise selection (or fitting all possible
models) because this can easily lead to spurious relations. The author
recommends creating several a priori hypotheses (models) using knowledge about
the system and then using AICc to look for the best of them. Another thing you
have to pay attention to is that you can not compare combinations of fixed
effects of class "mer" with REML = TRUE.
A hug
Diego PJ
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
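The REML point above can be illustrated with a minimal sketch. It assumes a current lme4 release (where fitted models have class "merMod" rather than the "mer" class mentioned in the thread) and uses the sleepstudy data shipped with lme4, which is not the data discussed here. Models that differ only in their fixed effects are fitted with REML = FALSE so that their AIC values are comparable.

```r
library(lme4)  # assumption: a current lme4 release is installed

# Two models differing only in their fixed effects, fitted by ML (REML = FALSE)
m0 <- lmer(Reaction ~ 1    + (1 | Subject), data = sleepstudy, REML = FALSE)
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

# AIC table for the two ML fits; lower is better
aic_tab <- AIC(m0, m1)
print(aic_tab)

# Likelihood-ratio comparison of the same two fits
print(anova(m0, m1))
```

In current lme4, anova() applied to REML fits refits them with ML before comparing unless refit = FALSE is given, which is consistent with the observation that lmer() and anova() have different defaults.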
Re stepwise or other variable selection approaches with lm() [but the same issues arise more generally, including with multi-level models], the function bsnVaryNvar() that is in more recent versions of our DAAG package may be of some interest. Just try running

  bsnVaryNvar(method='forward')
  bsnVaryNvar(method='backward')
  bsnVaryNvar()   ## Exhaustive selection

The default is to select the 'best' 3 variables from a number of predictors that is varied between 3 and 50, with data in which the predictors are independent gaussian noise, as is the outcome variable. When the best 3 variables are selected out of a number that is in the region of 15 to 20 or so, the averaging method used by our function is likely to give an 'average' notional p-value that has dropped below 0.05.

There are of course ways to account for the selection bias, but they are non-trivial.

John Maindonald
email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473
fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
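The selection effect described above can be sketched in base R without DAAG. This is not bsnVaryNvar() itself: the helper min_p_after_selection() is hypothetical, the selection step is a crude pick of the predictors most correlated with the outcome (not forward, backward or exhaustive search), and the sample size, seed and number of replicates are arbitrary assumptions.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Smallest coefficient p-value after picking the 'best' k of p noise predictors
min_p_after_selection <- function(n = 100, p = 20, k = 3) {
  x <- matrix(rnorm(n * p), n, p)   # predictors: independent gaussian noise
  y <- rnorm(n)                     # outcome: noise, unrelated to x
  best <- order(abs(cor(x, y)), decreasing = TRUE)[1:k]  # crude selection step
  fit <- lm(y ~ x[, best])
  min(summary(fit)$coefficients[-1, "Pr(>|t|)"])
}

p_selected <- replicate(200, min_p_after_selection(p = 20))  # best 3 of 20
p_fixed    <- replicate(200, min_p_after_selection(p = 3))   # 3 of 3: no real choice

# Share of runs in which some retained predictor looks 'significant' at 0.05
mean(p_selected < 0.05)
mean(p_fixed < 0.05)
```

Comparing the two shares shows how much of the apparent significance comes from the selection step alone, since every predictor is pure noise in both cases.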
On 08/01/2013, at 10:31 AM, Diego Pujoni <diegopujoni at gmail.com> wrote:
The default for lmer is REML=TRUE, but the anova default is REML=FALSE; see http://tolstoy.newcastle.edu.au/R/e6/help/09/04/11789.html

The author advises against not only step(glmer()), but any type of stepwise selection, including step(glm()). For the author, the model has to come before the data (a priori hypothesis), and the data have to bring evidence to accept or reject this a priori hypothesis. Searching for the model that best fits the data is considered "data dredging" by the author and does not agree with the "philosophy" of AIC (the existence of an infinite-dimensional real model).

Please note, this is not my opinion, but the author's. I'm still studying whether I agree with it or not. But as I see in the papers, this kind of analysis is becoming more and more widely used.

A hug
--
Diego PJ
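The a priori approach described in this thread can be sketched in base R as follows. The candidate set, the use of the built-in mtcars data and the helper aicc() are assumptions made for illustration. Plain lm() fits are used to keep the example self-contained; for mixed models the appropriate sample size in the AICc correction is less clear-cut.

```r
# AICc = AIC + 2k(k + 1) / (n - k - 1), with k estimated parameters, n observations
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

# Hypothetical a priori candidate set, written down before looking at the data
candidates <- list(
  weight_only    = lm(mpg ~ wt,      data = mtcars),
  weight_power   = lm(mpg ~ wt + hp, data = mtcars),
  weight_gearbox = lm(mpg ~ wt + am, data = mtcars)
)

tab <- data.frame(
  AIC  = sapply(candidates, AIC),
  AICc = sapply(candidates, aicc)
)
tab$delta  <- tab$AICc - min(tab$AICc)                         # distance from best model
tab$weight <- exp(-tab$delta / 2) / sum(exp(-tab$delta / 2))   # Akaike weights
tab[order(tab$delta), ]
```

The AIC and AICc columns side by side also show how large the small-sample correction is for a given n and k, which is one way to check whether the difference is negligible for a particular data set.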