Message-ID: <49189353.10003@ufl.edu>
Date: 2008-11-10T20:02:27Z
From: Ben Bolker
Subject: Question About Syntax For Complex ANOVA Design
In-Reply-To: <f8e6ff050811101048v134e526fhb8c44841eff7318@mail.gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

hadley wickham wrote:
> On Mon, Nov 10, 2008 at 9:22 AM, Mike Dunbar <mdu at ceh.ac.uk> wrote:
>> (apologies - I should have written coast * MBL not ML)
>>
>> I'm not sure of my ground here, but surely you do lose something -
>> you wouldn't retain coast:MBL if it's not significant, as you lose
>> degrees of freedom, and this gets worse the more terms and the more
>> interactions you consider.
> 
> But if you drop the term you are effectively spending your degrees of
> freedom twice - once to estimate the effect that you drop, and then
> again in the new model.  Another way to see the problem is to think
> about the null distribution of the p-values - if you only include
> significant p values in your model, the standard null hypothesis is
> clearly not appropriate.
> 
> I think there's a good discussion of this in Frank Harrell's
> Regression Modeling Strategies, but unfortunately I don't have a copy
> on hand to point you to the exact location.
> 
> Hadley

  See e.g. sections 4.2 through 4.4 (pp. 56-60).  The discussion
above does not mean that overfitted models are good, or that there
is no penalty for overspecifying models (otherwise one would
always throw everything into the model), but that data-driven
model selection has some very fundamental problems ...

  cheers
   Ben Bolker

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkkYk1MACgkQc5UpGjwzenOcvgCePr2fJx+GfV++s6Q14pQe/Ryj
vf8An2Gxc3SCzsCHj7x53yOXAx/NZng4
=Os6f
-----END PGP SIGNATURE-----
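
The point about the null distribution of the p-values can be illustrated with a small simulation. The sketch below is not from the thread; it is a minimal illustration under stated assumptions: the response is pure noise, there are five unrelated candidate terms, and each term is tested on its own with a Fisher z approximation to the correlation test rather than inside a full ANOVA. The function names (p_value, simulate) and all parameter values are hypothetical choices for the example.

```python
import math
import random
from statistics import NormalDist


def p_value(x, y):
    """Two-sided p-value for zero correlation between x and y.

    Uses Fisher's z transformation, a large-sample approximation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_sims=2000, n_obs=50, n_terms=5, alpha=0.05, seed=1):
    """Compare a pre-specified term with a data-selected term under the null.

    Returns two rejection rates:
      - for one term chosen before looking at the data
      - for the term with the smallest p-value, chosen after looking
    """
    rng = random.Random(seed)
    prespecified = 0
    selected = 0
    for _ in range(n_sims):
        # The response is pure noise: no candidate term has a real effect.
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        pvals = []
        for _ in range(n_terms):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]
            pvals.append(p_value(x, y))
        prespecified += pvals[0] < alpha   # term fixed in advance
        selected += min(pvals) < alpha     # "keep whatever is significant"
    return prespecified / n_sims, selected / n_sims


if __name__ == "__main__":
    fixed_rate, picked_rate = simulate()
    print("pre-specified term rejected:", fixed_rate)
    print("data-selected term rejected:", picked_rate)
```

With a term fixed in advance, the rejection rate should sit near the nominal 0.05. When the term is picked because it looked best, the rate should be several times higher (if the five tests were exactly independent it would be 1 - 0.95**5, about 0.23), which is why the usual null distribution no longer applies to a model built from its own significant terms.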