
p-values

6 messages · Peter Ho, Spencer Graves, Torsten Hothorn

#
Hi R-users,

I am trying to repeat an example from Rayner and Best, "A contingency 
table approach to nonparametric testing" (Chapter 7, ice cream example).

In their book they calculate Durbin's statistic, D1, a dispersion 
statistic, D2, and a residual. P-values for each statistic are 
calculated from a chi-square distribution, along with Monte Carlo p-values.

I have found similar p-values based on the chi-square distribution by 
using:

 > pchisq(12, df= 6, lower.tail=F)
[1] 0.0619688
 > pchisq(5.1, df= 6, lower.tail=F)
[1] 0.5310529

Is there a way to calculate the equivalent Monte Carlo p-values?

The values were 0.02 and 0.138 respectively.

The use of approximate chi-square probabilities for Durbin's test 
is considered not good enough according to van der Laan (The American 
Statistician 1988, 42, 165-166).


Peter
--------------------------------
ESTG-IPVC
2 days later
#
Hi, Peter:

	  Please see my reply of a few minutes ago, subject: exact 
goodness-of-fit test.  I don't know Rayner and Best, but the same 
method, I think, should apply.  spencer graves
Peter Ho wrote:

1 day later
#
Spencer,


Thank you for referring me to your other email on the exact goodness-of-fit 
test. However, I'm not entirely sure that what you describe applies to 
my case. I'm not a statistician, and it would help me if you could 
explain what you meant in a little more detail. Perhaps I should first 
explain the problem more fully.

I am looking for a way to calculate exact p-values by Monte Carlo 
simulation for Durbin's test. Durbin's test statistic is similar to 
Friedman's statistic, but covers the case of balanced incomplete 
block designs. I have found a function written by Felipe de Mendiburu 
for calculating Durbin's statistic, which gives the chi-squared p-value. 
I have also read an article by Torsten Hothorn, "On Exact Rank Tests 
in R" (R News 1(1), 11-12), in which he shows how to calculate Monte 
Carlo p-values using pperm. In that article he gives:

R> pperm(W, ranks, length(x))

He compares his method to that of StatXact, which is the program Rayner 
and Best suggest using. Is there a way to do this, for example, for the 
Friedman test?

A paper by Joachim Röhmel discusses "The permutation distribution of 
the Friedman test" (Computational Statistics & Data Analysis 1997, 26: 
83-99). This seems to be along the lines of what I need, although I am 
not quite sure. Has anyone tried to recode his APL program for R?

I have tried a number of things, all unsuccessful. Searching through 
previous postings has not been very successful either. It seems that 
pperm is the way to go, but I would need help from someone on this.

Any hints on how to continue would be much appreciated.


Peter
Spencer Graves wrote:

2 days later
#
pperm seems reasonable, though I have not looked at the details.

	  We should be careful about terminology, however.  So-called "exact 
p-values" are generally p-values computed from a distribution over a 
finite set of possible outcomes, with constraints imposed to make the 
outcome space finite.  For example, Fisher's exact test for a 2x2 table 
assumes the marginals are fixed.
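[Base R's fisher.test() makes this conditioning concrete: the p-value is a sum over the finite set of 2x2 tables that share the observed margins. The counts below are made up purely for illustration.]

```r
## Fisher's exact test for a hypothetical 2x2 table; the p-value sums the
## hypergeometric probabilities of all tables sharing the observed margins.
tab <- matrix(c(8, 2, 1, 5), nrow = 2,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))
fisher.test(tab)$p.value   # conditional ("exact") p-value
```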

	  I don't remember the details now, but I believe there is literature 
claiming that this may not be the best thing to do when, for example, 
it is reasonable to assume that the number in each cell is Poisson. 
In such cases, you may lose statistical power by conditioning on the 
marginals.  I hope someone else will enlighten us both, because I'm not 
current on the literature in this area.

	  The situation with "exact tests" and "exact p-values" is not nearly 
as bad as with so-called "exact confidence intervals", which promise to 
deliver at least the indicated coverage probability.  With discrete 
distributions, it is known that 'Approximate is better than "exact" for 
interval estimation of binomial proportions', as noted in the article of 
this title by A. Agresti and B. A. Coull (1998), American Statistician 
52: 119-126.  (For more on this particular issue, see Brown, Cai and 
DasGupta (2003), "Interval Estimation in Exponential Families", Statistica 
Sinica 13: 19-49.)
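[The binomial contrast is easy to see in base R: binom.test() returns the Clopper-Pearson "exact" interval, while prop.test(..., correct = FALSE) returns the shorter Wilson score interval that Agresti and Coull favour. The counts 4 out of 20 are arbitrary.]

```r
## "Exact" (Clopper-Pearson) vs approximate (Wilson score) 95% intervals
## for 4 successes in 20 trials; the "exact" interval is wider (conservative).
x <- 4; n <- 20
exact  <- binom.test(x, n)$conf.int                  # Clopper-Pearson
wilson <- prop.test(x, n, correct = FALSE)$conf.int  # Wilson score
diff(exact) > diff(wilson)                           # TRUE: "exact" is longer
```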

	  If this does not answer your question adequately, may I suggest you 
try the posting guide.  People report having found answers to difficult 
questions in the process of preparing a question following that guide, 
and when they do post a question, they are much more likely to get a 
useful reply.

	  spencer graves
Peter Ho wrote:

#
Spencer,

Here is an example from Rayner and Best (2001) and the script sent by 
Felipe.  This can be done as follows using the function durbin.grupos() 
in the attached file:

 > ###Ice cream example from Rayner and Best 2001 . Chapter 7
 > judge <- rep(c(1:7),rep(3,7))
 > variety <- c(1,2,4,2,3,5,3,4,6,4,5,7,1,5,6,2,6,7,1,3,7)
 > cream <- c(2,3,1,3,1,2,2,1,3,1,2,3,3,1,2,3,1,2,3,1,2)
 > durbin.grupos(judge,variety,cream,k=3,r=3,alpha=0.01)

Prueba de Durbin
..............
Chi Cuadrado :  12
Gl.          :  6
P-valor      :  0.0619688
..............
Comparación de tratamientos

Alpha        :  0.01
Gl.          :  8
t-Student    :  3.355387
Diferencia minima
para la diferencia entre suma de rangos =  4.109493

Grupos, Tratamientos y la Suma de sus rangos
a        2       9
ab       1       8
abc      7       7
abc      6       6
abc      5       5
 bc      3       4
  c      4       3
  trat prom   M
1    2    9   a
2    1    8  ab
3    7    7 abc
4    6    6 abc
5    5    5 abc
6    3    4  bc
7    4    3   c
 >                              

You can see that the p-value is the same with

 > pchisq(12, df= 6, lower.tail=F)
[1] 0.0619688

I am hoping that someone, maybe Torsten, might be able to suggest how I 
can incorporate Monte Carlo p-values using pperm().  The statistical 
issues are beyond my comprehension, and I assume that Rayner and Best's 
suggestion to use Monte Carlo p-values instead of chi-square p-values 
is correct. In the above example the Monte Carlo p-value is 0.02. This 
difference matters: with the Monte Carlo p-value the null hypothesis is 
rejected at the 5% level, whereas the chi-square p-value (0.062) is not 
significant.
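[The Monte Carlo p-value can also be sketched in base R by permuting the ranks independently within each judge and recomputing the statistic. The durbin() function below is a transcription of the standard textbook formula, not Felipe's durbin.grupos(); the variable names are invented for this sketch.]

```r
## Monte Carlo p-value for Durbin's D1: under H0, rankings are exchangeable
## within each judge (block), so permute ranks within judges and recompute.
set.seed(123)
judge   <- rep(1:7, each = 3)
variety <- c(1,2,4,2,3,5,3,4,6,4,5,7,1,5,6,2,6,7,1,3,7)
rk      <- c(2,3,1,3,1,2,2,1,3,1,2,3,3,1,2,3,1,2,3,1,2)
nt <- 7; k <- 3; r <- 3          # treatments, block size, replications
durbin <- function(x) {          # chi-square-type statistic from rank sums
  R <- tapply(x, variety, sum)
  12 * (nt - 1) / (r * nt * (k^2 - 1)) * sum((R - r * (k + 1) / 2)^2)
}
D.obs <- durbin(rk)              # 12, matching Rayner and Best
B <- 9999
D.sim <- replicate(B, durbin(ave(rk, judge, FUN = sample)))
p.mc  <- (sum(D.sim >= D.obs) + 1) / (B + 1)   # close to 0.02
```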

I hope this example might help. Thanks again for your answer and also to 
Felipe for sending the function for Durbin's test.



Peter
Spencer Graves wrote:

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Durbin.r
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20050811/17206e5f/Durbin.pl
4 days later
#
On Thu, 4 Aug 2005, Peter Ho wrote:

Hi Peter,

if I understand the example correctly, the main interest is testing
independence of the judges' rankings and the ice cream brand, where the
judges are interpreted as `blocks', using a chi^2-type statistic based on
the rank sums for each ice cream. In R:

ice <- data.frame(judge = factor(rep(c(1:7),rep(3,7))),
                  variety = factor(c(1,2,4,2,3,5,3,4,6,4,5,7,1,5,6,2,6,7,1,3,7)),
                  rank = c(2,3,1,3,1,2,2,1,3,1,2,3,3,1,2,3,1,2,3,1,2))
library("coin")
it <- independence_test(rank ~ variety | judge, data = ice, teststat = "quadtype")
it

        Asymptotic General Independence Test

data:  rank by
         groups 1, 2, 3, 4, 5, 6, 7
         stratified by judge
T = 12, df = 6, p-value = 0.06197

So without having checked the theory exactly, this looks like
Durbin's D1 statistic with an _asymptotic conditional p-value_ (please have a
look at coin's vignette, which explains what happens here).

The Monte-Carlo p-value can now be computed by 99,999 replications:

pvalue(independence_test(rank ~ variety | judge, data = ice,
       teststat = "quadtype", distribution = approximate(B = 99999)))

[1] 0.01778018
99 percent confidence interval:
 0.01672170 0.01888482

which seems to be a little bit smaller than 0.02.
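[For anyone rerunning this with a recent version of coin, the argument names have since changed: teststat = "quadtype" became "quadratic", and approximate() takes nresample rather than B. A sketch assuming a post-2005 coin release is installed:]

```r
## The ice cream example rewritten for a current coin release; only the
## argument names differ from the 2005 calls above.
library("coin")
ice <- data.frame(judge = factor(rep(1:7, each = 3)),
                  variety = factor(c(1,2,4,2,3,5,3,4,6,4,5,7,1,5,6,2,6,7,1,3,7)),
                  rank = c(2,3,1,3,1,2,2,1,3,1,2,3,3,1,2,3,1,2,3,1,2))
it <- independence_test(rank ~ variety | judge, data = ice,
                        teststat = "quadratic")
statistic(it)   # 12
pvalue(it)      # asymptotic, about 0.062
set.seed(29)
pvalue(independence_test(rank ~ variety | judge, data = ice,
                         teststat = "quadratic",
                         distribution = approximate(nresample = 99999)))
```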

Hope that helps,

Torsten