same old question - lme4 and p-values
On Sun, Apr 6, 2008 at 9:05 PM, David Henderson
<dnadave at revolution-computing.com> wrote:
Hi John:
> For all practical purposes, a CI is just the Bayesian credible
> interval that one gets with some suitable "non-informative prior".
> Why not then be specific about the prior, and go with the Bayesian
> credible interval? (There is an issue whether such a prior can
> always be found. Am I right in judging this of no practical consequence?)
What? Could you explain this a little more? There is nothing Bayesian about a classical CI (i.e. not a Bayesian credible set or highest posterior density interval, or whatever terminology you prefer). The interpretation is completely different, and the assumptions used in deriving the interval are also different. Even though the interval obtained with a noninformative prior can be numerically similar to a classical CI, they are not the same entity.
Now, while I agree with the arguments about p-values and their validity, there is one aspect missing from this discussion. When creating a general-use package like lme4, we are trying to create software that enables statisticians and researchers to perform the statistical analyses they need and interpret the results in ways that HELP them get published. While I admire Doug for "drawing a line in the sand" in regard to the use of p-values in published research, this is counter to HELPING researchers publish their results. There has to be a better way to further your point in the community than FORCING your point upon them. Education of the next generation of researchers and journal editors is admittedly slow, but it is a much more community-friendly way of getting your point used in practice.
Perhaps I should clarify. The summary of a fitted lmer model does not provide p-values because I don't know how to calculate them in an acceptable way, not because I am philosophically opposed to them. The estimates and their approximate standard errors can readily be calculated, as can their ratio. The problem is determining the appropriate reference distribution for that ratio from which to calculate a p-value. In fixed-effects models (under the "usual" assumptions) that ratio has a t distribution with a known number of degrees of freedom. For mixed models it is not clear exactly what distribution it has, except in certain cases of completely balanced data sets (i.e. the sort of data sets that occur in textbooks).

At one time I used a t distribution with an upper bound on the degrees of freedom, but I was persuaded that providing p-values that could be strongly "anti-conservative" is worse than not providing any. That decision not to provide p-values is particularly inconvenient to many users who are not especially interested in statistical niceties but do need to satisfy editors or referees who want to see p-values. I know that is a real problem.

My earlier comment about having created a monster that now turns on us, which touched off this line of discussion, was more about the fact that we try to take complex analyses and reduce the conclusions from them to a single number, the p-value. We can provide considerable information about the models that are fit to the experimenter's data, but without p-values the experimenter may be unable to publish the results.

The approach that I feel is most likely to be successful in summarizing these models is first to obtain the REML or ML estimates of the parameters, then to run a Markov chain Monte Carlo sampler to assess the variability in the parameters (or, if you prefer, the variability in the parameter estimators).
(Note: I am not advocating using MCMC to obtain the estimates; I suggest MCMC for assessing the variability.) The current version of the mcmcsamp function suffers from the practical problem that it gets stuck at near-zero values of variance components. There are some approaches to dealing with that. Over the weekend I thought I had a devastatingly simple way of handling such cases, until I reflected on it a bit more and realized that it would require a division by zero. Other than that, it was a good idea.
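A minimal sketch of the workflow described above, using the sleepstudy data shipped with lme4. Note that the mcmcsamp interface shown here is the one that existed in lme4 at the time of this thread and has since been removed from the package, so treat this as illustrative rather than as current practice:

```r
## Fit by REML first; use MCMC only to assess variability, not to estimate.
library(lme4)

## Any lmer fit would do; sleepstudy is a data set bundled with lme4.
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

## The summary reports estimates, standard errors, and their ratio
## (a t value) -- but, deliberately, no p-values.
summary(fm)

## Draw MCMC samples around the REML estimates and summarize the
## parameter variability with highest-posterior-density intervals.
## (mcmcsamp and the HPDinterval method for its result were part of
## lme4 at the time of writing; both are gone from modern lme4.)
samp <- mcmcsamp(fm, n = 1000)
HPDinterval(samp)
```

In current lme4 versions, confint(fm, method = "boot") or a fully Bayesian fit serve a similar purpose.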
Just my $0.02...

Dave H

--
David Henderson, Ph.D.
Director of Community
REvolution Computing
1100 Dexter Avenue North, Suite 250
206-577-4778 x3203
DNADave at Revolution-Computing.Com
http://www.revolution-computing.com
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models