Skip to content

Getting all possible contingency tables

13 messages · John Kane, Bert Gunter, Ben Bolker +4 more

#
Hello all,

Let say I have 2-way contingency table:

Tab <- matrix(c(8, 10, 12, 6), nr = 2)

and the Chi-squared test could not reject the independence:

 > chisq.test(Tab)

         Pearson's Chi-squared test with Yates' continuity correction

data:  Tab
X-squared = 1.0125, df = 1, p-value = 0.3143


However I want to get all possible contingency tables under this 
independence scenario (one of them would obviously be the given table 
as, we could not reject the independence), and for each such table I 
want to calculate the Ch-sq statistic.

Can somebody help me how to generate all such tables?

Thanks and regards,
#
Are you basically asking for all possible permutations of the table?  If so see ?permn in the combinat package.

John Kane
Kingston ON Canada
____________________________________________________________
FREE ONLINE PHOTOSHARING - Share your photos online with your friends and family!
Visit http://www.inbox.com/photosharing to find out more!
#
Thanks John for your reply. However still not clear how I should proceed.

My goal is to generate all possible contingency tables. Basically I want 
to see the distribution of Chi-squared Statistic under independence (NULL).

So I was thinking if I can generate all possible permutation of integer 
numbers having sum equal to (8 + 10 + 12 + 6) = 36. Is there any R 
function to do that?

Thanks and regards,
On 01-12-2012 18:39, John Kane wrote:
#
R can usually do almost anything but if the function exists I am not aware of it.  Sorry. I did misunderstand what you were doing.

John Kane
Kingston ON Canada
____________________________________________________________
FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop!
#
Thanks Bert for your reply.

I am trying to understand/visualize the Sample Chi-Squared Statistic's 
Null distribution. Therefore I need all possible Contingency tables 
under Independence case.

What could be better way to visualize that?

Thanks and regards,
On 01 December 2012 20:03:00, Bert Gunter wrote:
#
You will need to be clear about whether you are conditioning on
the marginal totals as well as on the overall total. As stated,
you are only asking for the overall total (36) to be fixed.

In that case, one possible (and by no means unique) approach
would be to:

[A]: Choose any four "random integers" a,b,c,d that add up to 36
(even there, you still have to make a choice about what distribution
to adopt for the "random integers").
[B]: Place the results into the 2x2 matrix; then evaluate chi-squared.
[C]: Repeat until you have enough cases.

Example (using equiprobable multinomial to generate 4 "random integers")

  Tab <- matrix(rmultinom(1,36,c(1,1,1,1)/4), nrow=2)

A more specific choice would be to fix the row and column probablilties
(but not the sample row and column totals), e.g.:

  P.row1 <- 0.25 ; P.row2 <- 1 - P.row1
  P.col1 <- 0.50 ; P.col2 <- 1 - P.col1

Then, adopting the hypothesis of independence between rows and columns:

  P11 <- P.row1*P.col1
  P21 <- P.row2*P.col1
  P12 <- P.row1*P.col2
  P22 <- P.row2*P.col2

  Tab <- matrix(rmultinom(1,36,c(P11,P21,P12,P22)), nrow=2)

On the other hand, if you want to also fix the sample row margins
and column margins, then your sampled table needs to be generated
using a hypergeometric distribution, (which is the basis of the
fisher.test mentioned by Bert Gunter). Since explaining how to
do this is a bit mjore complicated than the above, please first
confirm what constraints (e.g. only total count; row-margins
only; col-margins only; both row- & col-margins) you wich to impose.

Hoping this helps,
Ted.
On 01-Dec-2012 14:28:24 Christofer Bogaso wrote:
-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 01-Dec-2012  Time: 16:00:16
This message was sent by XFMail
#
I think R *can* do this (thanks to Robin Hankin):

  library(partitions)
  str(parts(36))
 'partition' int [1:36, 1:17977] 36 0 0 0 0 0 0 0 0 0 ...

I'm not quite clear on how you're going to take these results
and turn them into possible tables, but I guess you do ...
You might also be interested in the simulate.p.value option to
chisq.test and the randomizer (which preserves row and column
totals): from the code of chisq.test,

           tmp <- .Call(C_chisq_sim, sr, sc, B, E)
#
On Dec 1, 2012, at 7:28 AM, Christofer Bogaso wrote:

            
In your quest for enlightenment, consider looking at the example in:

?r2dtable
#
Thanks Ted (and other) for your suggestion. Here I have implemented 
following:

Tab <- matrix(c(8, 10, 12, 6), nr = 2)

Simu_Number <- 50000
Tab_Simulate <- vector("list", length = Simu_Number)
for (i in 1:Simu_Number) {
         Tab_Simulate[[i]] <- matrix(rmultinom(1, sum(Tab), rep(0.25, 
4)), nrow = 2)   ## All Cells have equal probability
     }
Sample_ChiSq <- sapply(Tab_Simulate, function(x) {
                                 Statistic <-
sum((chisq.test(as.table(x))$observed - 
chisq.test(as.table(x))$expected)^2/chisq.test(as.table(x))$expected)
                                 return(Statistic)
                             })

length(Sample_ChiSq[Sample_ChiSq < qchisq(0.95, 1)])/Simu_Number

hist(Sample_ChiSq, freq = FALSE)
lines(dchisq(seq(min(Sample_ChiSq), max(Sample_ChiSq), by = 0.5), 1))


However I think I am making some serious mistake as histogram did not 
match the density curve.

Can somebody help me where I am making mistake?

Thanks and regards,
On 01-12-2012 21:45, (Ted Harding) wrote:
#
On 02-Dec-2012 14:17:20 Christofer Bogaso wrote:
The reasons for the mis-match are:

A: that you have put the curve in the wrong place, by not
supplying x-coordinates to lines(), so that it plots its
points at x = 1,2,3,4,...

B: that you need to multiply the plotted density by the width
of the histogram cells, so as to match the density curve to the
discrete density of the histogram. It will also then look better
when the chis-squared curve is plotted at the mid-points of the cells.

Hence, try something like:

hist(Sample_ChiSq, freq = FALSE, breaks=0.5*(0:40))
x0 <- 0.25+0.5*(0:20)
lines(x0,dchisq(x0,1))

Hoping this helps,
Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 02-Dec-2012  Time: 15:02:45
This message was sent by XFMail
#
Thanks Ted for your correction. I was depressed thinking that I did not 
understand the theory. However now it comes as just a plotting mistake!

Thanks,
On 02 December 2012 20:47:48, (Ted Harding) wrote: