AIC / BIC vs P-Values / MAM
If you are *really* trying to predict (rather than test hypotheses), and you really use model averaging, then I would be fine with this approach -- but then you wouldn't be spending any time worrying about which models were weighted how strongly
My approach was to rank the models by their relative AIC difference, i.e. AIC (model of interest) - AICmin (AIC value of the minimum-AIC model), and then only use model averaging on the set of models where that difference was 0-2 (Burnham & Anderson, 2002).
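For concreteness, the ranking described above could be sketched roughly as follows. This is only an illustration: the model names and AIC values are made up, and renormalising the Akaike weights within the 0-2 set is an assumption about how the averaging would be set up, not something stated in the thread.

```python
import math

def delta_aic(aic_by_model):
    """AIC difference of each model relative to the minimum-AIC model."""
    aic_min = min(aic_by_model.values())
    return {name: aic - aic_min for name, aic in aic_by_model.items()}

def akaike_weights(deltas):
    """Akaike weights: exp(-delta/2), normalised to sum to 1 over the given set."""
    rel = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}

# Hypothetical AIC values for four candidate models (not from the thread).
aic_values = {"m1": 210.4, "m2": 211.1, "m3": 213.9, "m4": 220.0}

deltas = delta_aic(aic_values)

# Keep only the models whose AIC difference falls in the 0-2 range.
top_set = {name: d for name, d in deltas.items() if d <= 2.0}

# Weights renormalised within the retained set (an assumption, see above).
weights = akaike_weights(top_set)
```

With these made-up numbers only m1 and m2 survive the cutoff, and m1 gets the larger weight because its AIC is lower.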
I don't quite understand.
Sorry, I was trying to say that I then need to think of a way of validating the goodness of fit, as I want to use my training data to predict my test data, and I have never used a model to predict unknown values. But I am sure I will come to it if I read around! Thanks for all your help, it is greatly appreciated.
On 4 Aug 2010, at 20:09, Ben Bolker wrote:
On 10-08-04 01:13 PM, Chris Mcowen wrote:
Hi Ben, that is great, thanks.
whether you select models via p-value or AIC *should* be based on whether you are trying to test hypotheses or make predictions
I have 7 factors, of which 5 have been shown, theoretically and empirically, to have an impact on my response variable. The other two are somewhat wild shots, but I have a hunch they are important too. The problem is that the variables show no clear analytical patterns: they don't fit into neatly boxed themes (size, shape, etc.), which makes forming hypotheses about how they interact hard. Forming a subset of models to test is therefore very difficult, so my approach has been to use all combinations of factors to generate the candidate models. I am worried that this approach is taking me down the data dredging/model simplification route I am trying to avoid. Is it bad practice to use all combinations? As long as I rank them by Akaike weight and use model averaging techniques, isn't this OK?
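To give a sense of the size of the "all combinations" candidate set, here is a minimal sketch that enumerates every main-effects subset of 7 factors. The factor names f1..f7 and the response name y are placeholders, and the formula strings are only illustrative; interactions are deliberately left out, and including them would make the set far larger.

```python
from itertools import combinations

# Placeholder names for the 7 factors; the real trait names are not given.
factors = ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]

def all_subsets(terms):
    """Every subset of the terms, from the empty (intercept-only) set to the full set."""
    return [combo
            for k in range(len(terms) + 1)
            for combo in combinations(terms, k)]

candidates = all_subsets(factors)

# Illustrative formula strings, main effects only; "y" is a hypothetical response name.
formulas = ["y ~ " + (" + ".join(combo) if combo else "1") for combo in candidates]

print(len(candidates))
```

For 7 factors this gives 2^7 = 128 candidate models, counting the intercept-only model.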
If you are *really* trying to predict (rather than test hypotheses), and you really use model averaging, then I would be fine with this approach -- but then you wouldn't be spending any time worrying about which models were weighted how strongly (although I do admit that wondering why p-values and AIC gave different rankings is worth thinking about -- I'm just not sure there's a short answer without looking through all of the data). You should take a look at the AICcmodavg and MuMIn packages on CRAN -- one or the other may (?) be able to handle lmer fits.
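As a rough sketch of what model averaging means for prediction: each model's prediction for a new observation is combined using its Akaike weight. The predictions and weights below are invented numbers, and this is not a description of how AICcmodavg or MuMIn implement averaging; in particular, whether to average on the response scale or the link scale is a choice this sketch does not address.

```python
def model_averaged_prediction(predictions, weights):
    """Weighted mean of per-model predictions, using the given model weights."""
    total = sum(weights[name] for name in predictions)
    return sum(weights[name] * predictions[name] for name in predictions) / total

# Hypothetical predictions for one new species from two models (invented values).
preds = {"m1": 0.30, "m2": 0.50}

# Hypothetical Akaike weights for the same two models.
w = {"m1": 0.6, "m2": 0.4}

avg = model_averaged_prediction(preds, w)
```

The averaged value always lies between the individual model predictions, pulled toward the better-supported model.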
My best guess as to what's going on here is that you have a good deal of correlation among your factors
I tested this with Pearson's r, and only one combination showed a strong correlation. Is this not sufficient?
Often, but not necessarily. Zuur et al. have a recent paper in Methods in Ecology and Evolution you might want to look at.
some combinations of factors are under/overrepresented in the data set)
That is certainly the case, but I can't do much about that. Is it not sufficient just to rely on the Pearson's values mentioned above?
simply fit the full model and base your inference on the estimates and confidence intervals from the full model
I want to be able to predict the threat status (the response variable) for species I only have traits (factors) for. This approach would not really let me do that, would it?
I don't quite understand.
Ben