'simulate.p.value' for goodness of fit
I'm afraid I can't follow your examples, but you seem to me to be mixing contingency table tests and goodness of fit tests in a somewhat incoherent fashion.

Note that your ``x2()'' function does a contingency table test, and not a goodness of fit test. Note also that in chisq.test(), if ``x'' is a matrix, then the ``p'' argument is ignored.

For a goodness of fit test, the sampling in chisq.test() ***is*** multinomial. It uses the sample() function, as a quick glance at the code would have told you.

I haven't the time to plough through your code and figure out what you're driving at, but I think that part of your problem could be the degrees of freedom. Under the contingency table test the degrees of freedom are 1; under the goodness of fit test the degrees of freedom are 3. (The vector of probabilities is *known* under the g.o.f. test, not estimated.)

cheers,

Rolf Turner
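The degrees-of-freedom point above can be checked directly in R. The snippet below is a minimal sketch of the two code paths in chisq.test(); the 0.1/0.2/0.3/0.4 probability vector is purely an illustrative stand-in for a known null, not anything from the thread:

```r
x <- matrix(c(1, 2, 3, 4), ncol = 2)

# Passed as a matrix: contingency table test, df = (2-1)*(2-1) = 1,
# and any 'p' argument would be ignored.
ct <- chisq.test(x, correct = FALSE)
ct$parameter          # df = 1

# Passed as a vector with known cell probabilities: goodness-of-fit
# test, df = 4 - 1 = 3, because 'p' is taken as known, not estimated.
p <- c(0.1, 0.2, 0.3, 0.4)   # hypothetical known probabilities
gof <- chisq.test(as.vector(x), p = p)
gof$parameter         # df = 3
```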
On 18/01/2008, at 7:59 AM, Bob wrote:
R Help on 'chisq.test' states that
"if 'simulate.p.value' is 'TRUE', the p-value is computed by Monte
Carlo simulation with 'B' replicates.
In the contingency table case this is done by random sampling
from the set of all contingency tables with given marginals, and
works only if the marginals are positive...
In the goodness-of-fit case this is done by random sampling from
the discrete distribution specified by 'p', each sample being of
size 'n = sum(x)'."
The last paragraph suggests that in the goodness-of-fit case, if p gives the expected probability for each cell, this random sampling is multinomial.

Unfortunately, as the following examples reveal, the sampling model is neither multinomial nor hypergeometric - at least when it is applied to a 4-fold table.

This is rather sad, as some people assume that R's chisq.test function can perform a Monte Carlo test of X-squared employing a multinomial model - in other words, assuming that your data are a single random sample.
### Example 1.
x=matrix(c(1,2,3,4),nc=2)
# To begin with, let us apply the large-sample approximations
chisq.test(x,correct=TRUE)$p.value
[1] 0.6726038
Warning message:
Chi-squared approximation may be incorrect in: chisq.test(x, correct = TRUE)
chisq.test(x,correct=FALSE)$p.value
[1] 0.7781597
Warning message:
Chi-squared approximation may be incorrect in: chisq.test(x, correct = FALSE)
# So let us apply a 2-tailed test of O.R.=1, using a hypergeometric model
fisher.test(x)$p.value
[1] 1
# This should also apply a hypergeometric model
chisq.test(x,simulate.p.value=TRUE,B=500000)$p.value
[1] 1
# Now we work out the expected probability for each cell
p=outer(c(sum(x[1,]),sum(x[2,])),c(sum(x[,1]),sum(x[,2])))/sum(x)^2
# But this applies a hypergeometric model, presumably because p is not scalar
chisq.test(x,p=p,simulate.p.value=TRUE,B=500000)$p.value
[1] 1
# This seems to do something different,
# at any rate it is much slower, and needs more memory
chisq.test(x[1:4],p=p[1:4],simulate.p.value=TRUE,B=10000)$p.value
[1] 1
# Which would appear to be using the same model as above
# Now let us apply an X2 test using a multinomial model
# (The code for this x2.test function is in Appendix 1, below.)
x2.test(x,R=200000)
with cc P = 0.7316812 conventional-P = 0.8838786 mid-P = 0.8423058
# All of these P-values are higher than those given by the Chi-squared approximation,
# but they certainly do not equal 1.
# But is this an artefact of our very small sample?

### Example 2.
# Let us try a larger sample
x=matrix(c(56,35,23,42),nc=2)
# First we apply the asymptotic model
chisq.test(x,correct=TRUE)$p.value
[1] 0.002222425
chisq.test(x,correct=FALSE)$p.value
[1] 0.001276595
# Now for the hypergeometric (fixed margin totals) model
fisher.test(x)$p.value
[1] 0.001931078
chisq.test(x,simulate.p.value=TRUE,B=500000)$p.value
[1] 0.001913996
p=outer(c(sum(x[1,]),sum(x[2,])),c(sum(x[,1]),sum(x[,2])))/sum(x)^2
chisq.test(x,p=p,simulate.p.value=TRUE,B=500000)$p.value
[1] 0.001891996
# Next comes what we had hoped to be a multinomial test
chisq.test(x[1:4],p=p[1:4],simulate.p.value=TRUE,B=10000)$p.value
[1] 0.01639836
# This is obviously not the same hypergeometric model as used for a
# chi-squared test.
# The P-value is about 10x that of the approximate tests (above)
# or the exact tests (below).
x2.test(x,R=200000)
with cc P = 0.002059990 conventional-P = 0.001184994 mid-P = 0.001172494
# Whatever that chi-squared test model IS, it is certainly not multinomial!
# Could it possibly be Poisson and, if so, why???
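One hedged way to see where the roughly tenfold difference could come from, consistent with the degrees-of-freedom point in the reply above: under the goodness-of-fit null the cell probabilities are treated as known, so the simulated statistics behave approximately like chi-squared on 3 df rather than 1. The statistic below is recomputed from Example 2's table rather than taken on trust:

```r
# Recompute the uncorrected X-squared for Example 2's table
x <- matrix(c(56, 35, 23, 42), ncol = 2)
q <- unname(chisq.test(x, correct = FALSE)$statistic)   # ~10.38

# Tail areas under the two candidate reference distributions
pchisq(q, df = 1, lower.tail = FALSE)  # ~0.0013, matches the asymptotic test
pchisq(q, df = 3, lower.tail = FALSE)  # ~0.016, close to the simulated g.o.f. p-value
```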
######## Appendix 1:
# We have used these functions to do a 2x2 multinomial test of X2:
# Pearson's X2 statistic for a 2x2 table (passed as a length-4 vector
# or matrix), with optional continuity correction
x2=function(y,cc=FALSE){
  y=y*1.;n=sum(y);C=cc*n/2            # C is the continuity correction term
  a=y[1];b=y[2];c=y[3];d=y[4]
  ab=a+b;cd=c+d;ac=a+c;bd=b+d         # margin totals
  D=ab*cd*ac*bd
  if(D==0)x2=NA else x2=n*(abs(a*d-b*c)-C)^2/D
  x2}

# Monte Carlo test of X2 under a multinomial model: R tables are drawn
# from rmultinom() with cell probabilities estimated from the observed
# margins, and the observed statistic is ranked among the simulated ones
x2.test=function(x,R=5000){
  n=sum(x)
  p=outer(c(sum(x[1,]),sum(x[2,])),c(sum(x[,1]),sum(x[,2])))/n/n
  Q=sort(apply(rmultinom(R,n,p),2,x2))
  q=x2(x,cc=TRUE)
  pl=rank(c(q,Q),ties.method='max')[1]/(length(Q)+1)
  pe=sum(c(q,Q)==q)/(length(Q)+1);pu=1-pl+pe
  cat('with cc P = ',pu,'\n')
  q=x2(x)
  pl=rank(c(q,Q),ties.method='max')[1]/(length(Q)+1)
  pe=sum(c(q,Q)==q)/(length(Q)+1);pu=1-pl+pe
  cat('conventional-P = ',pu,'\n')
  cat('mid-P = ',pu-pe/2,'\n')}
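For comparison, a sketch of how a similar multinomial Monte Carlo can be obtained from chisq.test() itself, by flattening the table to a vector so that the goodness-of-fit path is taken. The caveat from the reply above applies: 'p' is then treated as known rather than estimated, so the reference distribution differs from the conditional, fixed-margins one. The seed is hypothetical, used only to make the simulation reproducible:

```r
x <- matrix(c(56, 35, 23, 42), ncol = 2)
n <- sum(x)
# Cell probabilities estimated from the margins (as in x2.test above)
p <- outer(rowSums(x), colSums(x)) / n^2
set.seed(1)  # hypothetical seed, for reproducibility only
out <- chisq.test(as.vector(x), p = as.vector(p),
                  simulate.p.value = TRUE, B = 10000)
out$statistic  # same X-squared as the uncorrected table test (~10.38)
out$p.value    # simulated multinomial p-value, near 0.016
```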
Bob
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.