AIC / BIC vs P-Values / MAM
9 messages · Chris Mcowen, Ben Bolker, Chris Howden
On 10-08-04 10:55 AM, Chris Mcowen wrote:
Dear List, I was after some advice on model selection,
OK, you asked for it ...
I am using AIC model selection rather than p-value based stepwise regression, as I feel it is more robust (Burnham & Anderson, 2002). However, there seems to be a huge difference in my results.
In my opinion, model selection via AIC shares most of the disadvantages of p-value based model selection. All-subsets selection is slightly better than stepwise approaches because it is less susceptible to getting stuck in some weird local branch, but whether you select models via p-value or AIC *should* be based on whether you are trying to test hypotheses or make predictions, and you should seriously question whether you should be doing model selection in the first place. You should *not* select a model and then make inferences about the 'significance' of what remains in the model ... AIC is great but it's not a panacea.
Now -- on to the "p vs AIC" question.
The factors with the lowest p-values, and therefore retained in the MAM when I did an exploratory stepwise regression, do not appear in the model with the lowest AIC value -- do the two approaches generally not match? The factors retained by the MAM are theoretically what I would expect, so I am a bit surprised that the model with the lowest AIC doesn't contain them. I have ranked the AIC models with Akaike weights, but still the top-ranked models don't incorporate the traits I would expect / that were retained in the MAM.

LOWEST AIC MODEL

model43 <- lmer(threatornot ~ 1 + (1 | order/family) + geophyte + seasonality + pollendispersal + woodyness, family = binomial)
model43
Generalized linear mixed model fit by the Laplace approximation
Formula: threatornot ~ 1 + (1 | order/family) + geophyte + seasonality + pollendispersal + woodyness
AIC BIC logLik deviance
1395 1430 -690.6 1381
Random effects:
Groups Name Variance Std.Dev.
family:order (Intercept) 0.37447 0.61194
order (Intercept) 0.00000 0.00000
Number of obs: 1116, groups: family:order, 43; order, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.40234 0.43237 0.931 0.35208
geophyte2 0.06453 0.19616 0.329 0.74218
seasonality2 -1.06900 0.34241 -3.122 0.00180 **
pollendispersal2 0.64474 0.31089 2.074 0.03809 *
woodyness2 0.47599 0.25646 1.856 0.06346 .
BEST STEPWISE MAM
Generalized linear mixed model fit by the Laplace approximation
Formula: threatornot ~ 1 + (1 | order/family) + breedingsystem * fruit + woodyness
AIC BIC logLik deviance
1409 1454 -695.3 1391
Random effects:
Groups Name Variance Std.Dev.
family:order (Intercept) 0.52475 0.7244
order (Intercept) 0.00000 0.0000
Number of obs: 1116, groups: family:order, 43; order, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.1290 0.4909 -2.300 0.0215 *
breedingsystem2 0.8123 0.4756 1.708 0.0876 .
breedingsystem3 0.9449 0.5246 1.801 0.0717 .
fruit2 1.3885 0.6221 2.232 0.0256 *
woodyness2 0.5484 0.2627 2.088 0.0368 *
breedingsystem2:fruit2 -1.6218 0.6577 -2.466 0.0137 *
breedingsystem3:fruit2 -1.6645 0.7449 -2.235 0.0255 *
The breedingsystem * fruit interaction should, based on theory, be important -- so why is it in the MAM but not in the model with the lowest AIC?
I am not sure if it is because I did not set out my candidate models correctly: I fitted a different model for every combination of traits ((2^7) - 1 = 127 models), as I was unsure which models would be important. I was given the data, I didn't collect it, so I have to work with what I have.
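As an arithmetic check on the two printouts above: AIC is just -2*logLik + 2k, where k counts the estimated parameters. The parameter counts below are inferred from the printouts (5 and 7 fixed-effect coefficients respectively, plus the two random-effect intercept variances); Python is used here purely for the arithmetic, since the models themselves are R fits.

```python
# Reproduce the reported AICs from the printed log-likelihoods.
# AIC = -2*logLik + 2*k, where k = number of estimated parameters.
def aic(loglik, k):
    return -2 * loglik + 2 * k

# model43: 5 fixed-effect coefficients + 2 random-effect variances -> k = 7
print(round(aic(-690.6, 7)))  # 1395, matching the printout
# stepwise MAM: 7 fixed-effect coefficients + 2 variances -> k = 9
print(round(aic(-695.3, 9)))  # 1409
```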
My best guess as to what's going on here is that you have a good deal of correlation among your factors (in this case, with discrete factors, that means that some combinations of factors are under/overrepresented in the data set), which means that quite different combinations of factors can fit/explain the data approximately equally well. It's really hard to say without going through the data in detail.

My advice would be to (a) read [or skim] Frank Harrell's book on Regression Modeling Strategies, particularly about the dangers of model reduction; (b) if you're interested in **testing hypotheses about which factors are important**, simply fit the full model and base your inference on the estimates and confidence intervals from the full model.
good luck,
Ben Bolker
Hi Ben, that is great, thanks.
whether you select models via p-value or AIC *should* be based on whether you are trying to test hypotheses or make predictions
I have 7 factors, of which 5 have been shown, theoretically and empirically, to have an impact on my response variable. The other two are somewhat wild shots, but I have a hunch they are important too. The problem is that there are no clear analytical patterns among the variables; they don't fit into neat boxed themes (size, shape, etc.), if you will, which makes forming hypotheses about how they interact hard. Therefore forming a subset of models to test is very difficult, so my approach has been to use all combinations of factors to generate the candidate models. I am worried that this approach is taking me down the data-dredging / model-simplification route I am trying to avoid. Is it bad practice to use all combinations? As long as I rank them by Akaike weight and use model averaging techniques, isn't this OK?
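For reference, the Akaike weights mentioned here are computed from the delta-AIC values as w_i = exp(-delta_i/2) / sum_j exp(-delta_j/2). A minimal sketch with made-up AIC values (Python used only for the arithmetic):

```python
import math

# Akaike weights for a candidate set (AIC values are invented).
aics = [1395.0, 1397.0, 1400.0, 1409.0]

deltas  = [a - min(aics) for a in aics]       # delta-AIC for each model
rel     = [math.exp(-d / 2) for d in deltas]  # relative likelihoods
weights = [r / sum(rel) for r in rel]         # normalise to sum to 1

print([round(w, 3) for w in weights])  # [0.689, 0.254, 0.057, 0.001]
```

Note how quickly the weight decays: a model only 2 AIC units behind the best still carries about a quarter of the best model's weight.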
My best guess as to what's going on here is that you have a good deal of correlation among your factors
I tested this with Pearson's r, and only one combination showed up as having a strong correlation -- is this not sufficient?
some combinations of factors are under/overrepresented in the data set)
That is certainly the case, but I can't do much about that -- is it not sufficient to rely on Pearson's values, as mentioned above?
simply fit the full model and base your inference on the estimates and confidence intervals from the full model
I want to be able to predict the threat status (the response variable) for species I only have traits (factors) for; this approach would not really let me do that, would it? Thanks again for your time, Chris
On 10-08-04 01:13 PM, Chris Mcowen wrote:
Is it bad practice to use all combinations? As long as I rank them by Akaike weight and use model averaging techniques, isn't this OK?
If you are *really* trying to predict (rather than test hypotheses), and you really use model averaging, then I would be fine with this approach -- but then you wouldn't be spending any time worrying about which models were weighted how strongly (although I do admit that wondering why p-values and AIC gave different rankings is worth thinking about -- I'm just not sure there's a short answer without looking through all of the data). You should take a look at the AICcmodavg and MuMIn packages on CRAN -- one or the other may (?) be able to handle lmer fits.
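Whichever package is used, the model-averaged prediction it produces is, at its core, just a weighted mean of the per-model predictions, with Akaike weights as the weights. A sketch with invented numbers (Python for the arithmetic only; in R, the packages above provide this machinery directly):

```python
import math

# Model-averaged prediction: weight each candidate model's prediction
# by its Akaike weight. All numbers here are invented for illustration.
aics  = [1395.0, 1397.5, 1399.0]   # candidate-model AICs
preds = [0.62, 0.55, 0.70]         # each model's predicted P(threatened)

rel     = [math.exp(-(a - min(aics)) / 2) for a in aics]
weights = [r / sum(rel) for r in rel]

averaged = sum(w * p for w, p in zip(weights, preds))
print(round(averaged, 3))  # 0.614
```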
My best guess as to what's going on here is that you have a good deal of correlation among your factors
I tested this with Pearson's R and only one combination showed up as having a strong correlation, is this not sufficient?
Often but not necessarily. Zuur et al. have a recent paper in Methods in Ecology and Evolution you might want to look at.
some combinations of factors are under/overrepresented in the data set)
That is certainly the case, but I can't do much about that -- is it not sufficient to rely on Pearson's values, as mentioned above?
simply fit the full model and base your inference on the estimates and confidence intervals from the full model
I want to be able to predict the threat status (the response variable) for species I only have traits (factors) for; this approach would not really let me do that, would it?
I don't quite understand. Ben
If you are *really* trying to predict (rather than test hypotheses), and you really use model averaging, then I would be fine with this approach -- but then you wouldn't be spending any time worrying about which models were weighted how strongly
My approach was to rank the models according to delta-AIC = AIC(model of interest) - AICmin (the AIC of the best model), and then only use model averaging on the set of models where delta-AIC was 0-2 (Burnham & Anderson, 2002).
I don't quite understand.
Sorry, I was trying to say that I then need to think of a way of validating the goodness of fit, as I want to use my training data to predict my test data, and I have never used a model to predict unknown values. But I am sure I will come to it if I read around! Thanks for all your help, it is greatly appreciated.
On Wed, Aug 4, 2010 at 3:31 PM, Chris Mcowen <chrismcowen at gmail.com> wrote:
My approach was to rank the models according to delta-AIC = AIC(model of interest) - AICmin (the AIC of the best model), and then only use model averaging on the set of models where delta-AIC was 0-2 (Burnham & Anderson, 2002).
Do they really recommend dropping all models beyond delta-AIC = 2?? If you're going to drop anything, I would say a cut-off of 10 or so would be more practical. Just as a (slightly extreme) example, suppose you had three models with delta-AIC = 0 (the best), 3, and 3. Then the AIC weight of the top model would only be 1/(1 + 2*exp(-1.5)), approximately 0.69 -- by dropping the other models you'd be throwing out about 30% of the model weight ...
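Ben's figure is easy to verify:

```python
import math

# Three models with delta-AIC = 0 (best), 3, 3:
# weight of the top model if all three are kept.
w_top = 1 / (1 + 2 * math.exp(-1.5))
print(round(w_top, 2))  # 0.69
```

So a 0-2 cut-off can indeed discard a substantial fraction of the total model weight.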
Hi Chris,

If you want good predictive ability, which is exactly what you do want when using a model for prediction, then why not use predictive ability as the model selection criterion? This can be done by calculating the predictive error of various models on your test data set and using that as the selection criterion. Maybe use AIC to decide which models to bother testing, but use predictive ability as the final test. I usually also look at the minimum and maximum errors, and the error distribution in general.

When it comes to hypothesis testing, I sometimes fit a series of simple models, one for each predictor. This allows me to test each one's "sole" correlation/association. It works very well when there is a lot of correlation amongst predictors, which is when a full model will not work as well and can give very misleading results. If there are any known covariates then I might fit them also, so I can test the hypothesis predictor's effect in conjunction with the covariates.

Chris Howden
Founding Partner, Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP development, Data Analysis, Modelling, and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878
chris at trickysolutions.com.au
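The predictive-error criterion suggested here can be sketched as follows. For a binary response like threatornot, the misclassification rate at a 0.5 cutoff is one simple choice of error measure; the test data and model predictions below are invented for illustration.

```python
# Score candidate models by out-of-sample misclassification error
# (invented test data: 1 = threatened, 0 = not threatened).
y_test  = [1, 0, 1, 1, 0, 0, 1, 0]
preds_a = [0.8, 0.3, 0.4, 0.9, 0.2, 0.4, 0.7, 0.1]  # hypothetical model A
preds_b = [0.6, 0.7, 0.4, 0.8, 0.3, 0.6, 0.5, 0.2]  # hypothetical model B

def misclassification(y, p, cutoff=0.5):
    """Fraction of cases whose thresholded prediction is wrong."""
    wrong = sum((prob >= cutoff) != bool(obs) for obs, prob in zip(y, p))
    return wrong / len(y)

print(misclassification(y_test, preds_a))  # 0.125
print(misclassification(y_test, preds_b))  # 0.375
```

Under this criterion the model with the lower held-out error (here, model A) would be preferred, regardless of its AIC rank on the training data.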