The three routines in R that calculate the Wilcoxon signed-rank test give different p-values... which is correct?

Peter Dalgaard · 2011-04-13T11:12:54Z

On Apr 13, 2011, at 01:57, Michael G Rupert wrote:

> I have a question concerning the Wilcoxon signed-rank test, and
> specifically, which R subroutine I should use for my particular dataset.
> There are three different commands in R (that I'm aware of) that calculate
> the Wilcoxon signed-rank test; wilcox.test, wilcox.exact, and
> wilcoxsign_test. When I run the three commands on the same dataset, I get
> different p-values. I'm hoping that someone can give me guidance on the
> strengt


Well, there are two versions of zero-handling, and for each of these, you can have exact p values or asymptotic p values with or without continuity correction, so that's 6 possibilities already.

> First, let me describe the dataset I am working with. The project I am
> working on collected water samples from groups/networks of about 30 water
> wells and analyzed them for nitrate, major ions, and other chemical
> constituents. We revisited those same wells about 10 years later and
> analyzed the water samples for the same chemical constituents. I now have
> a paired dataset, and the question I would like to answer is whether there
> was a "significant" change in concentrations of those chemical
> constituents (such as nitrate or chloride). Concentrations measured in
> water from some wells have increased, some have decreased, and some have
> stayed the same over the ten-year time period. In water from some wells,
> the concentrations were below the laboratory detection limits, so those
> concentrations are "tied" at the reporting level. The following is an
> example of the data I am evaluating.
>
> x <- c(13.60, 9.10, 22.01, 9.08, 1.97, 2.81, 0.66, 0.97, 0.21, 2.23, 0.08,
> 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06,
> 0.06, 0.06, 3.44, 15.18, 5.25, 4.27, 17.81)
> y <- c( 4.32, 3.39, 16.36, 7.10, 0.08, 2.02, 0.19, 0.59, 0.06, 2.15, 0.06,
> 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06,
> 0.06, 0.06, 4.02, 16.13, 7.30, 7.98, 24.37)
>
> The nonparametric Wilcoxon signed-rank test seems to be the most
> appropriate test for these data. There are two different methods to
> calculate the signed-rank test. The first is by Wilcoxon (1945), who
> discards any tied data and then calculates the signed ranks. The second
> method incorporates tied values in the ranking procedure (see J.W. Pratt,
> 1959, Remarks on zeros and ties in the Wilcoxon signed rank procedure:
> Journal of the American Statistical Association, Vol. 54, No. 287, pp.
> 655-667). There are two commands in R that calculate the original method
> by Wilcoxon (that I know of), wilcox.test and wilcoxsign_test (make sure
> to include the argument "zero.method = c("Wilcoxon")"). There are two
> other commands in R that incorporate ties in the signed-rank test,
> wilcox.exact and wilcoxsign_test (make sure to include the
> argument "zero.method = c("Pratt")").
>
> Here's my problem. I get different p-values from each of the 4 signed-rank
> tests in R, and I don't know which one to believe. wilcox.test and
> wilcoxsign_test(zero.method = c("Wilcoxon")) calculate the standard
> Wilcoxon signed-rank test. Even though they are not designed to deal with
> tied data, they should at least calculate the same p-value, but they do
> not.

They do if you turn off the continuity correction in wilcox.test:

Wilcoxon signed rank test

data:  x and y 
V = 39, p-value = 0.05061
alternative hypothesis: true location shift is not equal to 0

So one applies the continuity correction and the other does not.
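The command that produced the output above is not preserved in this copy of the message. As a minimal sketch, assume a hypothetical vector of paired differences `d` (two zeros plus values whose signed ranks match the -3, -4, -5, 6:11 seen later in the thread; this is an illustration, not the poster's data). With the standard `exact` and `correct` arguments of `wilcox.test`, the uncorrected normal approximation reproduces V = 39 and p = 0.05061, and the by-hand calculation shows where the number comes from:

```r
# Hypothetical paired differences (an assumption, not the poster's data):
# two zeros plus values whose Pratt ranks are -3, -4, -5, 6, ..., 11.
d <- c(0, 0, -3, -4, -5, 6:11)

# wilcox.test() drops the zeros (Wilcoxon's zero handling); with
# exact = FALSE it uses the normal approximation.
no_cc   <- wilcox.test(d, exact = FALSE, correct = FALSE)
with_cc <- wilcox.test(d, exact = FALSE, correct = TRUE)

# The same uncorrected p-value by hand, from the nine nonzero differences.
nz <- d[d != 0]
n  <- length(nz)
V  <- sum(rank(abs(nz))[nz > 0])            # sum of the positive ranks
z  <- (V - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_by_hand <- 2 * pnorm(-abs(z))

print(c(V = V, no_cc = no_cc$p.value, with_cc = with_cc$p.value,
        by_hand = p_by_hand))
```

The continuity-corrected p-value is the larger of the two, which is one source of the disagreement between routines.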

They still handle zeros differently. wilcox.exact does not handle the Pratt ranking.
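To make the two zero-handling conventions concrete, here is a base-R sketch applied to the `x` and `y` posted in the thread. It only computes the positive rank sums under each convention; it does not call `wilcox.exact` or `wilcoxsign_test`, and the variable names are illustrative.

```r
# Data as posted in the thread (14 pairs are tied at the 0.06 reporting level).
x <- c(13.60, 9.10, 22.01, 9.08, 1.97, 2.81, 0.66, 0.97, 0.21, 2.23, 0.08,
       rep(0.06, 14), 3.44, 15.18, 5.25, 4.27, 17.81)
y <- c(4.32, 3.39, 16.36, 7.10, 0.08, 2.02, 0.19, 0.59, 0.06, 2.15, 0.06,
       rep(0.06, 14), 4.02, 16.13, 7.30, 7.98, 24.37)
d <- x - y

# Wilcoxon (1945): discard zero differences, then rank the remaining |d|.
nz <- d[d != 0]
v_wilcoxon <- sum(rank(abs(nz))[nz > 0])

# Pratt (1959): rank all |d| including the zeros, then drop the zeros' ranks.
r_all   <- rank(abs(d))
v_pratt <- sum(r_all[d > 0])

print(c(zeros = sum(d == 0), wilcoxon = v_wilcoxon, pratt = v_pratt))
```

With 14 zero differences, every nonzero rank is shifted up by 14 under Pratt's scheme, so the positive rank sum moves from 84 to 238 and the null distribution it is compared against changes as well.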

To get exact p values for Pratt ranks, try

1-sample Permutation Test

data:  c(-3, -4, -5, 6:11) 
T = 51, p-value = 0.08984
alternative hypothesis: true mu is not equal to 0 


... and for the asymptotic counterpart:

Asymptotic 1-sample Permutation Test

data:  c(-3, -4, -5, 6:11) 
T = 51, p-value = 0.08144
alternative hypothesis: true mu is not equal to 0
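The commands behind the two outputs above are missing from this copy of the message. The following base-R sketch is not the original code; it enumerates the sign patterns directly and, under that assumption, reproduces both the exact and the asymptotic p-value for the signed Pratt ranks shown in the output.

```r
# Signed Pratt ranks taken from the output above.
s      <- c(-3, -4, -5, 6:11)
scores <- abs(s)
t_obs  <- sum(s[s > 0])                     # observed statistic, T = 51

# Exact null distribution: each score is positive or negative with
# probability 1/2, so enumerate all 2^9 sign patterns.
patterns <- as.matrix(expand.grid(rep(list(0:1), length(scores))))
t_null   <- as.vector(patterns %*% scores)

# Two-sided p-value: the null distribution is symmetric about sum(scores) / 2.
centre  <- sum(scores) / 2
p_exact <- mean(abs(t_null - centre) >= abs(t_obs - centre))

# Normal approximation without continuity correction.
z       <- (t_obs - centre) / sqrt(sum(scores^2) / 4)
p_asymp <- 2 * pnorm(-abs(z))

print(c(T = t_obs, exact = p_exact, asymptotic = p_asymp))
```

The exact value is 46/512, which rounds to the 0.08984 reported above.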

Not found. Apparently, you _constructed_ a data set to get the same set of ranks.

Apparently, the Pratt paper predates the convention that a p value is the probability of observing "the test statistic or more extreme" and he switches back and forth between "less than" and "less than or equal" (to a negative rank sum of 6 and 12 resp.). Also, his p-values are one-sided.

Using modern technology, it is pretty easy to generate the enumerations that Pratt is referring to:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
 1  1  1  2  2  3  4  5  6  8  9 10 12 13 15 17 18 19 21 21 22 23 23 
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 
23 23 22 21 21 19 18 17 15 13 12 10  9  8  6  5  4  3  2  2  1  1  1

 0  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
 1  1  1  1  1  2  2  3  3  4  4  5  6  7  7  8 10 10 11 12 13 13 15 
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 
15 16 16 17 17 18 17 17 18 17 17 16 16 15 15 13 13 12 11 10 10  8  7 
48 49 50 51 52 53 54 55 56 57 58 59 60 63 
 7  6  5  4  4  3  3  2  2  1  1  1  1  1
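The code that generated the two tables is not included in this copy. One way to obtain the same counts, sketched in base R with hypothetical helper names `rank_sum_counts` and `tail_prob`, is to enumerate every subset of the ranks and tabulate the sums: ranks 1..9 when the zeros are dropped, ranks 3..11 when the two zeros are ranked first. The tail probabilities at the end show the "less than" versus "less than or equal" distinction for negative rank sums of 6 and 12.

```r
# Count how many subsets of `scores` give each possible rank sum.
rank_sum_counts <- function(scores) {
  patterns <- as.matrix(expand.grid(rep(list(0:1), length(scores))))
  table(as.vector(patterns %*% scores))
}

tab_wilcoxon <- rank_sum_counts(1:9)    # zeros dropped: ranks 1..9
tab_pratt    <- rank_sum_counts(3:11)   # zeros ranked first: ranks 3..11

print(tab_wilcoxon)
print(tab_pratt)

# One-sided tail probability P(sum < k) or P(sum <= k) from a count table.
tail_prob <- function(tab, k, strict) {
  sums <- as.numeric(names(tab))
  keep <- if (strict) sums < k else sums <= k
  sum(tab[keep]) / sum(tab)
}

print(c(wilcoxon_lt = tail_prob(tab_wilcoxon, 6, TRUE),
        wilcoxon_le = tail_prob(tab_wilcoxon, 6, FALSE),
        pratt_lt    = tail_prob(tab_pratt, 12, TRUE),
        pratt_le    = tail_prob(tab_pratt, 12, FALSE)))
```

Each table sums to 2^9 = 512 sign patterns, so doubling the "less than or equal" tail for the Pratt ranks gives the two-sided exact p-value of 46/512.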
