
Mann-Whitney & Wilcoxon Rank Sum

3 messages · Jim BRINDLE, Brian Ripley, Peter Dalgaard

#
Hello,

I am hoping someone could shed some light on the Wilcoxon Rank Sum Test 
for me. According to the statistics references I have looked through, the 
Mann-Whitney U-test and the Wilcoxon Rank Sum Test are statistically 
equivalent. When using the following dataset:

m <- c(2.0863,2.1340,2.1008,1.9565,2.0413,NA,NA)
f <- c(1.8938,1.9709,1.8613,2.0836,1.9485,2.0630,1.9143)

and the wilcox.test command as below:

wilcox.test(m, f, paired = FALSE, alternative = "two.sided")

I get a test statistic (W) of 30. When I perform this test by hand 
using the methodology laid out in Ch. 6 of Ott & Longnecker, I get a 
value of 45. Any insight into the algorithm R is using, or good 
reference(s) on this issue in general, would be much appreciated.

Thanks....
#
On Mon, 16 May 2005, Jim BRINDLE wrote:

            
Yes, but not numerically: they differ by a constant (constant with respect 
to the data values, but a function of the sample size).
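Written out, with R_x the sum of the (1-based) ranks of the first sample in the pooled data, n_x and n_y the two sample sizes, and U the Mann-Whitney count, the constant separating the two statistics depends only on n_x. The notation here is illustrative, and the numbers plugged in are those from the question:

```latex
U = R_x - \frac{n_x (n_x + 1)}{2}, \qquad 0 \le U \le n_x n_y,
\qquad \text{e.g. } U = 45 - \frac{5 \cdot 6}{2} = 30 \text{ for } n_x = 5,\ n_y = 7.
```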
I don't know that book, but the R help page does have references. Also, 
?pwilcox says

      This distribution is obtained as follows.  Let 'x' and 'y' be two
      random, independent samples of size 'm' and 'n'. Then the Wilcoxon
      rank sum statistic is the number of all pairs '(x[i], y[j])' for
      which 'y[j]' is not greater than 'x[i]'.  This statistic takes
      values between '0' and 'm * n', and its mean and variance are 'm *
      n / 2' and 'm * n * (m + n + 1) / 12', respectively.
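A minimal sketch that applies this pair-counting definition directly to the data from the question. The NAs in `m` are dropped first, as `wilcox.test` does. The names `x`, `y` and `W.pairs` are illustrative only.

```r
m <- c(2.0863, 2.1340, 2.1008, 1.9565, 2.0413, NA, NA)
f <- c(1.8938, 1.9709, 1.8613, 2.0836, 1.9485, 2.0630, 1.9143)

x <- m[!is.na(m)]   # first sample with NAs removed (size 5)
y <- f              # second sample (size 7)

# Count the pairs (x[i], y[j]) with y[j] <= x[i];
# outer() builds the 5 x 7 matrix of comparisons x[i] >= y[j]
W.pairs <- sum(outer(x, y, ">="))
W.pairs  # 30

# Null mean and variance from the formulas quoted above
n.x <- length(x); n.y <- length(y)
c(mean = n.x * n.y / 2, var = n.x * n.y * (n.x + n.y + 1) / 12)
# mean 17.5, variance 455/12 (about 37.92)
```

The pair count agrees with the W of 30 that `wilcox.test` reports for these data.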

Your samples have lengths 5 (after removing NAs) and 7, and there are no 
ties. The R code is readable, and computes the statistic essentially as

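	n.x <- length(x)                          # size of the first sample
	r <- rank(c(x, y))                        # ranks in the pooled sample
	sum(r[seq_along(x)]) - n.x * (n.x + 1)/2  # rank sum of x minus n.x(n.x+1)/2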

I guess your reference just uses the first term.
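Applying the rank-sum form to the data from the question shows where both numbers come from: the first term alone is the 45 obtained by hand, and subtracting n.x * (n.x + 1)/2 = 15 gives R's W = 30. A minimal check, assuming the NAs are simply dropped:

```r
m <- c(2.0863, 2.1340, 2.1008, 1.9565, 2.0413, NA, NA)
f <- c(1.8938, 1.9709, 1.8613, 2.0836, 1.9485, 2.0630, 1.9143)
x <- m[!is.na(m)]
y <- f

r <- rank(c(x, y))                     # 1-based ranks in the pooled sample
n.x <- length(x)
rank.sum <- sum(r[seq_along(x)])       # classical Wilcoxon rank sum: 45
W <- rank.sum - n.x * (n.x + 1) / 2    # shifted version used by R: 30

# Compare with the statistic reported by wilcox.test()
W.R <- unname(wilcox.test(x, y)$statistic)
c(rank.sum = rank.sum, W = W, wilcox.test = W.R)
```

Because the subtracted term depends only on n.x, the two statistics carry the same information and lead to the same p-value; only the scale differs.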

Another way of looking at this is whether ranks start at 0 or at 1 (as in 
rank()): R's definition is the rank sum with 0-based ranks.
#
Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
Er, no... With 0-based ranks you would only be subtracting n.x. The 
definition is such that the minimum value of the statistic is zero (the 
second term is equal to sum(1:n.x)).
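To see the difference numerically, here is a small sketch comparing the two corrections on the question's data: 0-based ranks lower the rank sum by only n.x, whereas R subtracts sum(1:n.x), the smallest possible rank sum. The last step checks that R's statistic reaches 0 when every x lies below every y; the vectors `lo` and `hi` are made-up illustrations, not data from the thread.

```r
m <- c(2.0863, 2.1340, 2.1008, 1.9565, 2.0413, NA, NA)
f <- c(1.8938, 1.9709, 1.8613, 2.0836, 1.9485, 2.0630, 1.9143)
x <- m[!is.na(m)]
y <- f
n.x <- length(x)

r <- rank(c(x, y))
rank.sum   <- sum(r[seq_along(x)])       # 1-based ranks: 45
zero.based <- sum(r[seq_along(x)] - 1)   # 0-based ranks: 45 - n.x = 40
W          <- rank.sum - sum(1:n.x)      # R's definition: 45 - 15 = 30

# sum(1:n.x) is the smallest possible rank sum, so W can be 0
lo <- c(1, 2, 3)      # hypothetical sample lying entirely below ...
hi <- c(4, 5, 6, 7)   # ... this hypothetical sample
W.min <- unname(wilcox.test(lo, hi)$statistic)   # 0
```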