The two chisq.test p values differ when the contingency table is transposed! (PR#3486) - R-devel

Duncan Murdoch · 2003-08-15T16:41:49Z

>Date: Wed, 16 Jul 2003 01:27:25 +0200 (MET DST) >From: shitao@ucla.edu >> x > [,1] [,2] >[1,] 149 151 >[2,] 1 8 >> c2x > for(i in (1:20)){c2x simulate.p.value=T,B=100000)$p.value)} >> c2tx > for(i in (1:20)){c2tx + B=100000)$p.value)} >> cbind(c2x,c2tx) > c2x c2tx > [1,] 0.03711 0.01683 > [2,]

Duncan Murdoch

Fri, Aug 15, 2003 9:41 AM #

The problem is in ctest/R/chisq.test.R, where the p-value is
calculated as 

            STATISTIC <- sum((x - E) ^ 2 / E)
            PARAMETER <- NA
            PVAL <- sum(tmp$results >= STATISTIC) / B

Here tmp$results is a collection of simulated chisquare values, but
because of different rounding, the statistics corresponding to tables
equal to the observed table are slightly smaller than the value
calculated in STATISTIC, and effectively the p-value is calcuated as

             PVAL <- sum(tmp$results > STATISTIC) / B

instead.

What's the appropriate fix here? 

PVAL <- sum(tmp$results > STATISTIC - .Machine$double.eps^0.5) / B

works on this example, but is there something better?

Duncan Murdoch

Kurt Hornik

Thu, Aug 21, 2003 1:20 PM #

Argh.  Very interesting ...

I think it works to use

            STATISTIC <- sum(sort((x - E) ^ 2 / E, decreasing = TRUE))

instead: this starts by summing the big values, and hence if at all
slightly 'underestimates' the real value, which is fine for the
comparisons.

Fix committed to r-devel.  Thanks for looking into this.

-k

Duncan Murdoch

Thu, Aug 21, 2003 5:52 PM #

On Thu, 21 Aug 2003 20:16:24 +0200, you wrote:

Looks good here.

Duncan