AIC / BIC vs P-Values / MAM
Hi Chris, There are many methods (Boyce index, maxKappa, etc.) to evaluate the predictions of a model when it is applied to a test dataset. More information is given in Hirzel et al. (2006), Ecological Modelling. Furthermore, have a look at the PresenceAbsence package (http://rss.acs.unt.edu/Rdoc/library/PresenceAbsence/html/PresenceAbsence.package.html). It contains many of these evaluators, though not the Boyce index. Kind regards, Maarten
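For concreteness, here is a minimal sketch of the kind of evaluation Maarten describes, using the PresenceAbsence package. The observed and predicted values below are simulated placeholders, and the calls assume the package's usual three-column data frame (plot ID, observed 0/1, predicted probability):

library(PresenceAbsence)

## Placeholder test-set data: observed presence/absence and the
## probabilities a fitted model predicts for the same plots.
set.seed(1)
obs  <- rbinom(100, 1, 0.4)
pred <- plogis(rnorm(100, mean = 2 * obs - 1))

eval_dat <- data.frame(plotID = 1:100, observed = obs, predicted = pred)

## Threshold-dependent and threshold-independent accuracy measures
## (PCC, sensitivity, specificity, Kappa, AUC) at a 0.5 cut-off.
presence.absence.accuracy(eval_dat, threshold = 0.5)

## Thresholds that optimise various criteria, including MaxKappa.
optimal.thresholds(eval_dat)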
Chris Mcowen wrote:
Hi Chris and Ben, This is exactly what I intended to do: I took 20 percent of my data set and left it out of the data I used to build the model, so I could test on it later. I am relatively new to models in general and my PhD supervisors are both ecology/conservation based. I was therefore wondering if you could offer some advice on the best method of evaluating the predictive ability of a model, both how to actually generate the predictions and then how to check the confidence in them. If a full workflow is too much to ask, then a few steps I can build upon would be gratefully received. Thanks again for your help, Chris
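In R, a hold-out split like the one Chris describes might look like the following sketch; the data frame dat, the response presence, and the predictors x1 and x2 are all hypothetical stand-ins for his real data:

## Simulated stand-in data (hypothetical names).
set.seed(42)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$presence <- rbinom(200, 1, plogis(dat$x1 - 0.5 * dat$x2))

## Hold 20% out for testing; build models only on the remaining 80%.
test_ix <- sample(nrow(dat), size = round(0.2 * nrow(dat)))
train   <- dat[-test_ix, ]   # used to fit the candidate models
test    <- dat[ test_ix, ]   # touched only when evaluating predictions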
On 5 Aug 2010, at 02:12, Chris Howden <chris at trickysolutions.com.au> wrote:
Hi Ben,
You're absolutely right.
Which was why I said you should test the model's predictive ability on the "test data set". I probably should have made it clearer that the "test data set" isn't used when building the model. And I agree that cross-validation is best, if you have the time and code that does it.
It's also why I said that using AIC to decide which models to actually bother testing would be a good idea.
At least that's the approach I usually use, i.e.
1. Create the models and initially evaluate which are best using AIC, comparing each model's log-likelihood to the null model and other applicable models, and applying some common sense.
2. Then I evaluate the predictive ability of the best few models on a "test data set" which wasn't used to create them (a sketch of this two-step approach follows below).
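A minimal sketch of that two-step approach, assuming binomial GLMs and reusing the hypothetical train/test split and variable names from the earlier example:

## Step 1: fit candidate models on the training data and compare AIC,
## with the null (intercept-only) model as a baseline.
m0 <- glm(presence ~ 1,       family = binomial, data = train)
m1 <- glm(presence ~ x1,      family = binomial, data = train)
m2 <- glm(presence ~ x1 + x2, family = binomial, data = train)
AIC(m0, m1, m2)

## Step 2: evaluate the best few models on the held-out test set.
p2 <- predict(m2, newdata = test, type = "response")
table(observed = test$presence, predicted = as.numeric(p2 > 0.5))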
Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP development, Data Analysis, Modelling, and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878
chris at trickysolutions.com.au
From: bbolker at gmail.com [mailto:bbolker at gmail.com]
Sent: Thursday, 5 August 2010 10:17 AM
To: Chris Howden
Cc: Chris Mcowen; r-sig-ecology at r-project.org
Subject: Re: Re: [R-sig-eco] AIC / BIC vs P-Values / MAM
On Aug 4, 2010 8:13pm, Chris Howden <chris at trickysolutions.com.au> wrote:
Hi Chris,
If you want good predictive ability, which is exactly what you do want when
using a model for prediction, then why not use its predictive ability as a
model selection criterion?
Because this will typically lead to overfitting the data, i.e. getting a great
fit to the 'training' set but then doing miserably on future data? Unless you do
something like split the data set into a training and a validation set, or
use cross-validation (which is a more sophisticated version of the same idea),
just finding the model with the best predictive capability on a specified
data set will *not* give you a good model in general. That's why approaches
such as AIC, corrected R^2, and so forth, include a penalty for model
complexity.
Unless I'm missing something really obvious, in which case I apologize.
Ben Bolker
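As Ben notes, AIC penalises complexity (AIC = 2k - 2 log L, with k the number of estimated parameters), whereas raw training-set fit does not. Below is a hedged sketch of the cross-validation alternative he mentions, using cv.glm() from the boot package on one of the hypothetical candidate models from the earlier examples:

library(boot)

## Misclassification cost for a binary response (as in the cv.glm examples):
## a predicted probability above 0.5 counts as a predicted presence.
cost_fun <- function(obs, pred) mean(abs(obs - pred) > 0.5)

## 10-fold cross-validation of a candidate binomial GLM on the training data.
m2     <- glm(presence ~ x1 + x2, family = binomial, data = train)
cv_fit <- cv.glm(train, m2, cost = cost_fun, K = 10)
cv_fit$delta[1]   # cross-validated misclassification rate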