Question About Syntax For Complex ANOVA Design
hadley wickham wrote:
On Mon, Nov 10, 2008 at 9:22 AM, Mike Dunbar <mdu at ceh.ac.uk> wrote:
(apologies - I should have written coast * MBL not ML)
I'm not sure of my ground here, but surely you do lose something - you wouldn't retain coast:MBL if it's not significant, as you lose degrees of freedom, and this gets worse the more terms and the more interactions you consider.
But if you drop the term you are effectively spending your degrees of freedom twice - once to estimate the effect that you drop, and then again in the new model. Another way to see the problem is to think about the null distribution of the p-values - if you only include terms with significant p-values in your model, the standard null distribution is clearly no longer appropriate. I think there's a good discussion of this in Frank Harrell's Regression Modeling Strategies, but unfortunately I don't have a copy on hand to point you to the exact location.
Hadley
See e.g. sections 4.2 through 4.4 (pp. 56-60). The discussion above does not mean that overfitted models are good, or that there isn't a penalty to overspecifying models (or otherwise one would always throw everything into the models), but that data-driven model selection has some very fundamental problems ...
cheers
Ben Bolker
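For anyone who wants to see the point about the null distribution of the p-values numerically, here is a rough simulation sketch. It is not from Harrell's book or from anyone's analysis in this thread, and it makes strong simplifying assumptions: no term has any real effect, the design is orthogonal, and the error variance is known, so each term's test statistic is an independent standard normal draw and refitting after dropping terms would not change it. The function names and the choice of 10 candidate terms are made up for illustration. It uses only the Python standard library.

```python
import random
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate(n_datasets=5000, n_terms=10, alpha=0.05, seed=1):
    """Simulate term selection by p-value when every true effect is zero.

    Assumption (illustrative only): orthogonal design and known error
    variance, so each term's statistic is an independent N(0, 1) draw.
    """
    rng = random.Random(seed)
    all_p = []       # p-values of every candidate term
    kept_p = []      # p-values of the terms that survive selection
    n_nonempty = 0   # datasets in which at least one term is retained
    for _ in range(n_datasets):
        p_values = [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(n_terms)]
        survivors = [p for p in p_values if p < alpha]
        all_p.extend(p_values)
        kept_p.extend(survivors)
        n_nonempty += bool(survivors)
    return all_p, kept_p, n_nonempty / n_datasets


all_p, kept_p, share_nonempty = simulate()

# Share of "significant" terms before and after selecting on significance.
share_sig_before = sum(p < 0.05 for p in all_p) / len(all_p)
share_sig_after = sum(p < 0.05 for p in kept_p) / len(kept_p)

print("significant among all candidate terms:", round(share_sig_before, 3))
print("significant among retained terms:     ", round(share_sig_after, 3))
print("datasets retaining at least one term: ", round(share_nonempty, 3))
```

Before selection the p-values behave as the usual null says they should, with about 5% below 0.05. After selection every retained p-value is below 0.05 by construction, and with 10 candidate terms about 40% of pure-noise datasets (1 - 0.95^10) end up with a final model containing a "significant" term. Reading the final model's p-values against the standard null is therefore misleading, which is the problem described above.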