Question About Syntax For Complex ANOVA Design - R-SIG-ecology

Mon, Nov 10, 2008 10:48 AM

On Mon, Nov 10, 2008 at 9:22 AM, Mike Dunbar <mdu at ceh.ac.uk> wrote:

But if you drop the term you are effectively spending your degrees of
freedom twice: once to estimate the effect that you drop, and then
again in the new model. Another way to see the problem is to think
about the null distribution of the p-values: if you only keep terms
with significant p-values in your model, the standard null
distribution is clearly not appropriate.
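
To make the point about the null distribution concrete, here is a minimal
simulation sketch in Python. It is not from the original thread and is
deliberately much simpler than the ANOVA design under discussion. It assumes
five independent candidate terms that are all pure noise, each tested with a
two-sided z-test with known unit variance. The names and settings
(`z_test_p_value`, `simulate`, `n_terms`, `n_obs`) are made up for
illustration. It compares how often a term chosen in advance comes out
significant with how often the best-looking term, chosen after seeing the
data, does.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, sd=1.0):
    """Two-sided p-value for H0: mean == 0, assuming a known standard deviation."""
    n = len(sample)
    z = fmean(sample) / (sd / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate(n_sims=2000, n_terms=5, n_obs=20, alpha=0.05, seed=1):
    """Return rejection rates for a pre-specified term and a data-selected term.

    Every term is pure noise, so the null hypothesis is true for all of them.
    """
    rng = random.Random(seed)
    prespecified_hits = 0
    selected_hits = 0
    for _ in range(n_sims):
        p_values = [
            z_test_p_value([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
            for _ in range(n_terms)
        ]
        # Term fixed before looking at the data: always the first one.
        prespecified_hits += int(p_values[0] < alpha)
        # Term picked because it looked best: the smallest p-value.
        selected_hits += int(min(p_values) < alpha)
    return prespecified_hits / n_sims, selected_hits / n_sims


prespecified_rate, selected_rate = simulate()
print(f"pre-specified term significant: {prespecified_rate:.3f}")
print(f"data-selected term significant: {selected_rate:.3f}")
```

Under these assumptions the pre-specified term should be significant in
roughly 5% of simulations, while the selected term should be significant in
roughly 1 - 0.95^5, about 23%, of them, so its p-value cannot be read
against the usual null distribution.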

I think there's a good discussion of this in Frank Harrell's
Regression Modeling Strategies, but unfortunately I don't have a copy
on hand to point you to the exact location.

Hadley

http://had.co.nz/

Ben Bolker

Mon, Nov 10, 2008 12:02 PM

hadley wickham wrote:

you wouldn't retain coast:MBL if it's not significant, as you lose
degrees of freedom,

and this gets worse the more terms and the more interactions you consider.

See e.g. sections 4.2 through 4.4 (pp. 56-60).  The discussion
above does not mean that overfitted models are good, or that there
isn't a penalty to overspecifying models (or otherwise one would
always throw everything into the models), but that data-driven
model selection has some very fundamental problems ...

  cheers
   Ben Bolker

Hadley Wickham

Mon, Nov 10, 2008 2:57 PM

On Mon, Nov 10, 2008 at 2:02 PM, Ben Bolker <bolker at ufl.edu> wrote:

But of course, not using data when selecting models has some pretty
fundamental problems too! ;)

Hadley

http://had.co.nz/