p-values vs likelihood ratios
On Mon, Feb 21, 2011 at 9:24 AM, Ben Bolker <bbolker at gmail.com> wrote:
> I don't see why you're using AIC differences here.
My understanding is that taking the difference of the values returned by AIC() is equivalent to computing the likelihood ratio and then applying the AIC correction to account for the different number of parameters in each model (then log-transforming at the end). My original exposure to likelihood ratios (and their AIC/BIC correction) comes from Glover & Dixon (2004, http://www.psych.ualberta.ca/~pdixon/Home/Preprints/EasyLRms.pdf), who describe the raw likelihood ratio as inappropriately favoring the model with more parameters, because more complex models can fit noise more precisely than less complex ones; hence the application of some correction to account for the differential complexity of the models being compared. I wonder, however, whether cross-validation might be a less controversial way of achieving a fair comparison of two models that differ in parameter number: fit the models to one subset of the data, then compute the likelihoods on another subset. I'll play around with this idea and report back any interesting findings...
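The AIC-difference/likelihood-ratio equivalence can be checked numerically. A minimal sketch, using made-up log-likelihoods and parameter counts (the numbers are hypothetical, and AIC is taken as 2k - 2 log L):

```python
import math

# Hypothetical fitted-model summaries: log-likelihood and parameter count.
logL_full, k_full = -120.3, 5        # more complex model
logL_reduced, k_reduced = -123.1, 3  # simpler model

aic_full = 2 * k_full - 2 * logL_full
aic_reduced = 2 * k_reduced - 2 * logL_reduced
delta_aic = aic_reduced - aic_full   # positive values favor the full model

# Raw likelihood ratio (full vs reduced) ignores parameter count:
raw_lr = math.exp(logL_full - logL_reduced)

# Exponentiating half the AIC difference gives a likelihood ratio
# penalized for the extra parameters:
corrected_lr = math.exp(delta_aic / 2)

# Same thing as multiplying the raw ratio by exp(-(extra parameters)):
assert math.isclose(corrected_lr, raw_lr * math.exp(-(k_full - k_reduced)))
print(delta_aic, raw_lr, corrected_lr)
```

So the ΔAIC favors the full model less strongly than the raw likelihood ratio does, which is exactly the complexity correction Glover & Dixon describe.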
> If one is really trying to test for "evidence of an effect" I see nothing wrong with a p-value stated on the basis of the null distribution of deviance differences between a full and a reduced model -- it's figuring out that distribution that is the hard part. If I were doing this in a Bayesian framework I would look at the credible interval of the parameters (although doing this for multi-parameter effects is harder, which is why some MCMC-based "p values" have been concocted on this list and elsewhere).
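One standard way to "figure out that distribution" when the asymptotic chi-square reference is in doubt is a parametric bootstrap: simulate from the reduced model, refit both models to each simulated dataset, and compare the observed deviance difference to the simulated ones. A minimal sketch with a toy mean-only vs. mean-plus-slope Gaussian regression (all data and effect sizes are invented for illustration):

```python
import math
import random

random.seed(1)

def deviance(y, x, with_slope):
    """Gaussian deviance up to a constant: n * log(RSS / n), from an OLS fit."""
    n = len(y)
    if with_slope:
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        b = sxy / sxx
        a = my - b * mx
        rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    else:
        my = sum(y) / n
        rss = sum((yi - my) ** 2 for yi in y)
    return n * math.log(rss / n)

n = 50
x = [i / n for i in range(n)]

# "Observed" data with a modest (made-up) true slope:
y_obs = [0.5 * xi + random.gauss(0, 1) for xi in x]
d_obs = deviance(y_obs, x, False) - deviance(y_obs, x, True)

# Null distribution of the deviance difference, simulating from the
# reduced (slope-free) model and refitting both models each time:
null_diffs = []
for _ in range(2000):
    y_sim = [random.gauss(0, 1) for _ in range(n)]
    null_diffs.append(deviance(y_sim, x, False) - deviance(y_sim, x, True))

p_value = sum(d >= d_obs for d in null_diffs) / len(null_diffs)
print(d_obs, p_value)
```

The same recipe carries over to mixed models (simulate from the reduced fit, refit both, tabulate deviance differences); the expensive part is simply the repeated refitting.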
We'll possibly have to simply agree to disagree on the general utility of p-values for cumulative science (as opposed to one-off decision making). I do, however, agree that Bayesian credible intervals have a role in cumulative science insofar as they permit relative evaluation of models that differ not in the presence of an effect but in its specific magnitude, as may be encountered in more advanced/fleshed-out areas of inquiry. Otherwise, in areas where the simple existence of an effect is of theoretical interest, computing credible intervals on effects seems like overkill and has (from my anti-p perspective) a dangerously easy connection to null-hypothesis significance testing.