Mann-Whitney & Wilcoxon Rank Sum - R-help

Sun, May 15, 2005 9:37 PM

Hello,

I am hoping someone could shed some light on the Wilcoxon Rank Sum Test 
for me.  According to the statistics references I have looked through, the 
Mann-Whitney U-test and the Wilcoxon Rank Sum Test are statistically 
equivalent.  When using the following dataset:

m <- c(2.0863,2.1340,2.1008,1.9565,2.0413,NA,NA)
f <- c(1.8938,1.9709,1.8613,2.0836,1.9485,2.0630,1.9143)

and the wilcox.test command as below:

wilcox.test(m,f, paired = FALSE, alternative = c("two.sided"))

I get a test statistic (W) of 30.  When I perform this test by hand, 
using the methodology laid out in Ch. 6 of Ott & Longnecker, I get a 
value of 45.  Any insight or good reference(s) as to the algorithm R is 
using, or on this issue in general, would be most appreciated.

Thanks....

Brian Ripley

Sun, May 15, 2005 10:52 PM

On Mon, 16 May 2005, Jim BRINDLE wrote:

Yes, the two tests are equivalent, but not numerically: the statistics 
differ by a constant (constant in the data, that is: it is a function of 
the sample size).

I don't know that book but the R help page does have references.  Also, 
?pwilcox says

      This distribution is obtained as follows.  Let 'x' and 'y' be two
      random, independent samples of size 'm' and 'n'. Then the Wilcoxon
      rank sum statistic is the number of all pairs '(x[i], y[j])' for
      which 'y[j]' is not greater than 'x[i]'.  This statistic takes
      values between '0' and 'm * n', and its mean and variance are 'm *
      n / 2' and 'm * n * (m + n + 1) / 12', respectively.
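
The pair-counting definition quoted above can be checked directly on the data from the original post. This is only a minimal sketch: the names pair_count, max_value and mean_value are made up for illustration, and the NAs have been dropped from the first sample by hand.

```r
# Data from the original post; the two NAs are dropped from the first sample.
m <- c(2.0863, 2.1340, 2.1008, 1.9565, 2.0413)
f <- c(1.8938, 1.9709, 1.8613, 2.0836, 1.9485, 2.0630, 1.9143)

# Count the pairs (m[i], f[j]) in which f[j] is not greater than m[i].
# outer() builds the 5 x 7 logical matrix of all pairwise comparisons.
pair_count <- sum(outer(m, f, ">="))

# Upper bound and mean of the statistic, as quoted from ?pwilcox.
max_value  <- length(m) * length(f)
mean_value <- max_value / 2

pair_count   # 30, the W reported by wilcox.test
max_value    # 35
mean_value   # 17.5
```

With no ties in the data, "not greater than" and "strictly less than" give the same count.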

Your samples have length 5 (after removing NAs) and 7 and no ties.  The R 
code is readable; the statistic is computed as essentially

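 	n.x <- length(x)      # size of the first sample, NAs already removed
 	r <- rank(c(x, y))    # joint ranks of both samples, starting at 1
 	sum(r[seq_along(x)]) - n.x * (n.x + 1)/2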

I guess your reference just uses the first term.
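
To see both numbers from the original post side by side, the two terms can be computed separately. A rough sketch, assuming the NAs are dropped before ranking as wilcox.test does for unpaired samples; rank_sum, W and W_r are names chosen here for illustration.

```r
# Data exactly as posted, including the NAs in the first sample.
m <- c(2.0863, 2.1340, 2.1008, 1.9565, 2.0413, NA, NA)
f <- c(1.8938, 1.9709, 1.8613, 2.0836, 1.9485, 2.0630, 1.9143)

# Drop missing values before ranking.
x <- m[!is.na(m)]
y <- f[!is.na(f)]
n.x <- length(x)

# Joint ranks of the pooled data; the first n.x entries belong to x.
r <- rank(c(x, y))

rank_sum <- sum(r[seq_along(x)])           # first term only: 45
W        <- rank_sum - n.x * (n.x + 1) / 2 # minus 15: 30

# The statistic reported by R itself, with the "W" name stripped.
W_r <- unname(wilcox.test(m, f)$statistic)
```

Here rank_sum matches the hand-computed 45 and W matches the 30 reported by wilcox.test.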

Another way of looking at this is whether ranks start at 0 or at 1 (as in 
rank()): R's definition is the rank sum with 0-based ranks.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Peter Dalgaard

Sun, May 15, 2005 11:34 PM

Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:

Er, no... Then you'd be subtracting n.x. The definition is such that
the minimum value of the statistic becomes zero (2nd term is equal to
sum(1:n.x)).
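
The distinction can be illustrated on the posted data. This is a sketch only: zero_based, x_low and W_min are hypothetical names, and x_low is an artificial sample shifted so that every value lies below the second sample.

```r
# Data from the original post, NAs already removed from the first sample.
x <- c(2.0863, 2.1340, 2.1008, 1.9565, 2.0413)
y <- c(1.8938, 1.9709, 1.8613, 2.0836, 1.9485, 2.0630, 1.9143)
n.x <- length(x)
r <- rank(c(x, y))

# R's statistic: rank sum of x minus sum(1:n.x).
W <- sum(r[seq_along(x)]) - sum(1:n.x)    # 45 - 15 = 30

# Rank sum with 0-based ranks: this subtracts only n.x.
zero_based <- sum(r[seq_along(x)] - 1)    # 45 - 5 = 40

# Extreme case: shift x below every y, so x takes ranks 1..n.x.
x_low <- x - 10
r_low <- rank(c(x_low, y))
W_min <- sum(r_low[seq_along(x_low)]) - sum(1:n.x)   # 15 - 15 = 0
```

The 0-based rank sum gives 40 rather than 30, while subtracting sum(1:n.x) brings the smallest attainable value to zero.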

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907