numerical issues in chisq.test(simulate=TRUE) (PR#8224)
Yes, only the middle matrix was problematic. Others were included to show what the value should (approx) be. Sorry that I didn't mention I was using 32 bit. There are of course very easy fixes to this, just wasn't sure what was the "best" approach in this situation (i.e. wasn't sure if the usual tolerance of sqrt(.Machine$double.eps) was too large. Thanks much, Doug
On Thu, 20 Oct 2005, Martin Maechler wrote:
Thank you, Douglas (and Simone) for the bug report.
"Simone" == Simone Giannerini <sgiannerini at gmail.com>
on Thu, 20 Oct 2005 10:10:01 +0200 writes:
Simone> Hi,
Simone> I obtain the same result under Win. XP SP2 on AMD 64 3700+
Simone> platform i386-pc-mingw32
Simone> arch i386
Simone> os mingw32
Simone> system i386, mingw32
Simone> status
Simone> major 2
Simone> minor 2.0
Simone> year 2005
Simone> month 10
Simone> day 06
Simone> svn rev 35749
Simone> language R
>> m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value
Simone> [1] 0.3598201
>> m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value
Simone> [1] 0.0004997501
>> m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value
Simone> [1] 0.3403298 So, it's only the middle matrix giving problems for you. It doesn't for me on a AMD 64-bit platform, even for 1000 simulations:
m <- cbind(1:0, c(7,16)) summary(p <- replicate(1000, chisq.test(m, sim=TRUE)$p.value))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3043 0.3268 0.3338 0.3337 0.3405 0.3663
but it does show the problem on 32-bit one.
A fix is easy and will be in R-patched (and R-devel of
course) soon.
Martin Maechler, ETH Zurich
Simone> Ciao
Simone> Simone
Simone> On 10/20/05, dgrove at fhcrc.org <dgrove at fhcrc.org> wrote:
>> Hi,
>>
>> This report deals with p-values coming from chisq.test using
>> the simulate.p=TRUE option. The issue is numerical accuracy
>> and was brought up in previous bug reports 3486 and 3896.
>> The bug was considered fixed but apparently was only mostly
>> fixed. Just the typical problem of two values that are
>> mathematically equal not ending up numerically equivalent.
>>
>> Consider this series of three 2x2 tables:
>>
>> [1,] 1 7
>> [2,] 0 15
>>
>> [1,] 1 7
>> [2,] 0 16
>>
>> [1,] 1 7
>> [2,] 0 17
>>
>>
>> The pvals returned from chisq.test(m, sim=TRUE)$p.value are
>> 0.3543228, 0.0004997501 and 0.3273363 respectively.
>>
>> The 2nd seems a bit unlikely, huh?
>>
>> I checked into it and the value I'm getting for the statistic
>> (calculated in R code) is 4*.Machine$double.eps less than the
>> value (which should be equal) that is returned from the C-code
>> that does the simulation.
>>
>>
>> Code for creating/testing the three matrices shown above:
>> m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value
>> m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value
>> m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value
>>
>>
>> Running SuSE9.3 on a AMD Athlon4000+
>>
>>
>> > version
>> platform i686-pc-linux-gnu
>> arch i686
>> os linux-gnu
>> system i686, linux-gnu
>> status Patched
>> major 2
>> minor 1.1
>> year 2005
>> month 07
>> day 29
>> language R
>>
>>
>> Thanks,
>> Doug
>>
>>
>> Douglas Grove
>> Statistical Research Associate
>> Fred Hutchinson Cancer Research Center
>> Seattle WA 98109