AIC / BIC vs P-Values / MAM
Hi Chris, There are many methods (Boyce index, maxKappa, etc.) to evaluate the predictions of a model when it is applied to a test dataset. More information is given in Hirzel et al. (2006), Ecological Modelling. Furthermore, have a look at the PresenceAbsence package (http://rss.acs.unt.edu/Rdoc/library/PresenceAbsence/html/PresenceAbsence.package.html). It contains many of these evaluators, though not the Boyce index. Kind regards, Maarten
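For concreteness, here is a minimal sketch of the kind of evaluation Maarten describes, using the PresenceAbsence package. The observed and predicted values below are simulated placeholders, and the calls assume the package's usual three-column data frame (plot ID, observed 0/1, predicted probability):

library(PresenceAbsence)

## Placeholder test-set data: observed presence/absence and the
## probabilities a fitted model predicts for the same plots.
set.seed(1)
obs  <- rbinom(100, 1, 0.4)
pred <- plogis(rnorm(100, mean = 2 * obs - 1))

eval_dat <- data.frame(plotID = 1:100, observed = obs, predicted = pred)

## Threshold-dependent and threshold-independent accuracy measures
## (PCC, sensitivity, specificity, Kappa, AUC) at a 0.5 cut-off.
presence.absence.accuracy(eval_dat, threshold = 0.5)

## Thresholds that optimise various criteria, including MaxKappa.
optimal.thresholds(eval_dat)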
Chris Mcowen wrote:
Hi Chris and Ben, This is exactly what I intended to do: I took 20 percent of my data set and left it out of the data I used to build the model, so I could test on it later. I am relatively new to models in general and my PhD supervisors are both ecology/conservation based. I was therefore wondering if you could offer some advice on the best method of evaluating the predictive ability of a model, both how to actually generate the predictions and then how to check the confidence in them. If a full workflow is too much to ask, then a few steps I can build upon would be gratefully received. Thanks again for your help, Chris
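In R, a hold-out split like the one Chris describes might look like the following sketch; the data frame dat, the response presence, and the predictors x1 and x2 are all hypothetical stand-ins for his real data:

## Simulated stand-in data (hypothetical names).
set.seed(42)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$presence <- rbinom(200, 1, plogis(dat$x1 - 0.5 * dat$x2))

## Hold 20% out for testing; build models only on the remaining 80%.
test_ix <- sample(nrow(dat), size = round(0.2 * nrow(dat)))
train   <- dat[-test_ix, ]   # used to fit the candidate models
test    <- dat[ test_ix, ]   # touched only when evaluating predictions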
On 5 Aug 2010, at 02:12, Chris Howden <chris at trickysolutions.com.au> wrote:
Hi Ben,
You're absolutely right.
Which was why I said you should test the model's predictive ability on the "test data set". I probably should have made it clearer that the "test data set" isn't used when building the model. And I agree that cross-validation is best, if you have the time and code that does it.
It's also why I said that using AIC to decide which models to actually bother testing would be a good idea.
At least that's the approach I usually use, i.e.
1. Create the models and initially evaluate which are best using AIC, comparing each model's log-likelihood to the null model and other applicable models, and applying some common sense.
2. Then I evaluate the predictive ability of the best few models on a "test data set" which wasn't used to create them (a sketch of this two-step approach follows below).
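A minimal sketch of that two-step approach, assuming binomial GLMs and reusing the hypothetical train/test split and variable names from the earlier example:

## Step 1: fit candidate models on the training data and compare AIC,
## with the null (intercept-only) model as a baseline.
m0 <- glm(presence ~ 1,       family = binomial, data = train)
m1 <- glm(presence ~ x1,      family = binomial, data = train)
m2 <- glm(presence ~ x1 + x2, family = binomial, data = train)
AIC(m0, m1, m2)

## Step 2: evaluate the best few models on the held-out test set.
p2 <- predict(m2, newdata = test, type = "response")
table(observed = test$presence, predicted = as.numeric(p2 > 0.5))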
Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP development, Data Analysis, Modelling, and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878
chris at trickysolutions.com.au
From: bbolker at gmail.com [mailto:bbolker at gmail.com]
Sent: Thursday, 5 August 2010 10:17 AM
To: Chris Howden
Cc: Chris Mcowen; r-sig-ecology at r-project.org
Subject: Re: Re: [R-sig-eco] AIC / BIC vs P-Values / MAM
On Aug 4, 2010 8:13pm, Chris Howden <chris at trickysolutions.com.au> wrote:
Hi Chris,
If you want good predictive ability, which is exactly what you do want when
using a model for prediction, then why not use its predictive ability as a
model selection criterion?
Because this will typically lead to overfitting the data, i.e. getting a great
fit to the 'training' set but then doing miserably on future data? Unless you do
something like split the data set into a training and a validation set, or
use cross-validation (which is a more sophisticated version of the same idea),
just finding the model with the best predictive capability on a specified
data set will *not* give you a good model in general. That's why approaches
such as AIC, corrected R^2, and so forth, include a penalty for model
complexity.
Unless I'm missing something really obvious, in which case I apologize.
Ben Bolker
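As Ben notes, AIC penalises complexity (AIC = 2k - 2 log L, with k the number of estimated parameters), whereas raw training-set fit does not. Below is a hedged sketch of the cross-validation alternative he mentions, using cv.glm() from the boot package on one of the hypothetical candidate models from the earlier examples:

library(boot)

## Misclassification cost for a binary response (as in the cv.glm examples):
## a predicted probability above 0.5 counts as a predicted presence.
cost_fun <- function(obs, pred) mean(abs(obs - pred) > 0.5)

## 10-fold cross-validation of a candidate binomial GLM on the training data.
m2     <- glm(presence ~ x1 + x2, family = binomial, data = train)
cv_fit <- cv.glm(train, m2, cost = cost_fun, K = 10)
cv_fit$delta[1]   # cross-validated misclassification rate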