
The two chisq.test p values differ when the contingency table is transposed! (PR#3486)

3 messages · Duncan Murdoch, Kurt Hornik

#
The problem is in ctest/R/chisq.test.R, where the p-value is
calculated as 

            STATISTIC <- sum((x - E) ^ 2 / E)
            PARAMETER <- NA
            PVAL <- sum(tmp$results >= STATISTIC) / B

Here tmp$results is a collection of simulated chi-square values, but
because of different rounding, the statistics corresponding to tables
equal to the observed table are slightly smaller than the value
calculated in STATISTIC, and effectively the p-value is calculated as

             PVAL <- sum(tmp$results > STATISTIC) / B

instead.
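
To illustrate the rounding point, here is a minimal sketch. The table is
hypothetical (it is not the one from the bug report), and chisq_stat is
just a helper name for this example. It computes the same statistic for a
table and its transpose, so the same terms are summed in a different
order; whether the two results agree exactly depends on the platform.

```r
# Hypothetical 2x3 table of counts; NOT the table from the bug report.
x <- matrix(c(3, 1, 2, 4, 2, 5), nrow = 2)

# Pearson chi-square statistic, computed as in the snippet above.
chisq_stat <- function(tab) {
  E <- outer(rowSums(tab), colSums(tab)) / sum(tab)  # expected counts
  sum((tab - E)^2 / E)
}

s1 <- chisq_stat(x)
s2 <- chisq_stat(t(x))   # same terms, summed in a different order

print(s1 == s2)                    # exact comparison; may be TRUE or FALSE
print(isTRUE(all.equal(s1, s2)))   # comparison up to a small tolerance
print(s1 - s2)                     # any difference is at rounding-error level
```

If the exact comparison can fail for mathematically equal values, then
">=" against simulated values has the same weakness.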

What's the appropriate fix here? 

PVAL <- sum(tmp$results > STATISTIC - .Machine$double.eps^0.5) / B

works on this example, but is there something better?
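
For reference, a rough sketch of how the tolerance version compares with
the strict one. Everything here is an assumption made for illustration:
the table is hypothetical, the simulated statistics come from r2dtable()
rather than from the code in chisq.test.R, and rev() is used only to force
a different summation order than the one used for STATISTIC.

```r
set.seed(1)

# Hypothetical 2x3 table of counts; NOT the table from the bug report.
x <- matrix(c(3, 1, 2, 4, 2, 5), nrow = 2)
E <- outer(rowSums(x), colSums(x)) / sum(x)   # expected counts
STATISTIC <- sum((x - E)^2 / E)

# Simulate B tables with the same margins and compute their statistics,
# deliberately summing the terms in reverse order.
B <- 2000
sims <- r2dtable(B, rowSums(x), colSums(x))
results <- vapply(sims,
                  function(tab) sum(rev((tab - E)^2 / E)),
                  numeric(1))

tol <- .Machine$double.eps^0.5

p_strict <- sum(results >= STATISTIC) / B      # sensitive to rounding
p_tol    <- sum(results > STATISTIC - tol) / B # treats near-ties as ties

print(c(strict = p_strict, tolerant = p_tol))
```

The tolerant version can never give a smaller p-value than the strict
one; the two differ only when simulated values fall within tol below
STATISTIC.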

Duncan Murdoch
6 days later
#
Argh.  Very interesting ...

I think it works to use

            STATISTIC <- sum(sort((x - E) ^ 2 / E, decreasing = TRUE))

instead: this starts by summing the big values, and hence, if at all,
slightly 'underestimates' the real value, which is fine for the
comparisons.
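
A small sketch of the two summation orders side by side, using a
hypothetical table (not the one from the bug report). On many platforms
the two sums may well be identical for a table this small, so no
particular output is claimed; the point is only that they are allowed to
differ in the last bits.

```r
# Hypothetical 2x3 table of counts; NOT the table from the bug report.
x <- matrix(c(3, 1, 2, 4, 2, 5), nrow = 2)
E <- outer(rowSums(x), colSums(x)) / sum(x)   # expected counts

terms <- (x - E)^2 / E

s_plain  <- sum(terms)                           # column-major order
s_sorted <- sum(sort(terms, decreasing = TRUE))  # largest terms first

print(s_plain - s_sorted)   # zero or a rounding-level difference
```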

Fix committed to r-devel.  Thanks for looking into this.

-k
#
On Thu, 21 Aug 2003 20:16:24 +0200, you wrote:

            
Looks good here.

Duncan