2008/7/15 Simon Blomberg <s.blomberg1 at uq.edu.au>:
On Tue, 2008-07-15 at 13:43 +0800, Julie Marsh wrote:
Given that I am using 2 different tests for two different hypotheses I
still would have expected these p-values to be more similar.
Well, as Pinheiro and Bates say in their book (worth reading!), the LRT
for mixed effects models is anti-conservative. So your LRT p-value is
almost certainly too small. The posterior p-value might be more
accurate, if you accept the usual caveats re: priors and convergence
etc. Also, when calculating p-values by hand using pchisq, you should
probably use pchisq(..., lower.tail=FALSE) instead of 1-pchisq(...),
which is inaccurate. The log.p option might also be useful if you really
need to compare very small probabilities. And why were you using pchisq
with 0 df, which is always 1 for any positive quantile? I don't
understand that at all.
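To see why lower.tail=FALSE matters: a minimal sketch with a hypothetical
LRT statistic large enough to expose the cancellation (the value 120 is
just an illustration, not from Julie's data):

```r
stat <- 120  # hypothetical, large LRT statistic

# 1 - pchisq(...) suffers catastrophic cancellation: pchisq(120, 1) is
# closer to 1 than machine epsilon, so the subtraction returns exactly 0.
p_bad  <- 1 - pchisq(stat, df = 1)

# Computing the upper tail directly keeps full precision (~6.6e-28 here).
p_good <- pchisq(stat, df = 1, lower.tail = FALSE)

# log.p = TRUE is safer still when comparing tiny probabilities.
logp   <- pchisq(stat, df = 1, lower.tail = FALSE, log.p = TRUE)
```

So the naive version reports p = 0 while the accurate upper-tail value is
nonzero, which matters whenever you compare or combine small p-values.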
Regarding the chi-square mixture distribution: that is appropriate when
testing a single variance component. There the ordinary test with one df
is conservative, because the null value lies on the boundary of the
parameter space. But Julie is not testing a variance component, so the
mixture is not appropriate here.
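For completeness, the boundary correction for a single variance component
is usually taken as a 50:50 mixture of a chi-squared with 0 df and 1 df,
which halves the naive p-value. A sketch with a made-up LRT statistic
(3.2 is purely illustrative):

```r
lrt <- 3.2  # hypothetical observed LRT statistic for one variance component

# Naive reference distribution: chi-squared with 1 df (conservative on
# the boundary of the parameter space).
p_naive <- pchisq(lrt, df = 1, lower.tail = FALSE)

# 50:50 mixture of chi^2_0 and chi^2_1: the chi^2_0 component is a point
# mass at zero, so its upper-tail probability is 0 for any lrt > 0 and
# the mixture p-value reduces to half the naive one.
p_mix <- 0.5 * pchisq(lrt, df = 1, lower.tail = FALSE)
```

Again, this only applies to boundary tests of a variance component, not
to the fixed-effect comparison Julie is making.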
I can think of three reasons why the p-values Julie obtains differ
between the likelihood ratio test and the posterior sampling.
1) The extensive use of control parameters suggests that convergence may
be an issue. If one or both models have not converged, the likelihood
ratio test is obviously based on wrong likelihoods and will be
misleading. I suppose the MCMC sampling will also be unreliable in that
situation. The need for control parameters could also indicate problems
in the data structure, such as severe imbalance. Perhaps there is too
little information on some parameters, making convergence hard to
achieve?