Hello,
I am hoping someone could shed some light on the Wilcoxon Rank Sum Test
for me. According to the stats references I have looked through, the
Mann-Whitney U-test and the Wilcoxon Rank Sum Test are statistically
equivalent. When using the
following dataset:
m <- c(2.0863,2.1340,2.1008,1.9565,2.0413,NA,NA)
f <- c(1.8938,1.9709,1.8613,2.0836,1.9485,2.0630,1.9143)
and the wilcox.test command as below:
wilcox.test(m,f, paired = FALSE, alternative = c("two.sided"))
I get a test statistic (W) of 30. When I perform this test by hand
utilizing the methodology laid out in Ch. 6 of Ott & Longnecker I get a
value of 45. Any insight or good reference(s) as to the algorithm R is
using or this issue in general would be most appreciated.
Thanks....
Mann-Whitney & Wilcoxon Rank Sum
3 messages · Jim BRINDLE, Brian Ripley, Peter Dalgaard
On Mon, 16 May 2005, Jim BRINDLE wrote:
Hello, I am hoping someone could shed some light on the Wilcoxon Rank Sum Test for me. According to the stats references I have looked through, the Mann-Whitney U-test and the Wilcoxon Rank Sum Test are statistically equivalent.
Yes, but not numerically: they differ by a constant (constant with respect to the data values, that is; it is a function of the sample size).
When using the
following dataset:
m <- c(2.0863,2.1340,2.1008,1.9565,2.0413,NA,NA)
f <- c(1.8938,1.9709,1.8613,2.0836,1.9485,2.0630,1.9143)
and the wilcox.test command as below:
wilcox.test(m,f, paired = FALSE, alternative = c("two.sided"))
I get a test statistic (W) of 30. When I perform this test by hand
utilizing the methodology laid out in Ch. 6 of Ott & Longnecker I get a
value of 45. Any insight or good reference(s) as to the algorithm R is
using or this issue in general would be most appreciated.
I don't know that book but the R help page does have references. Also,
?pwilcox says
This distribution is obtained as follows. Let 'x' and 'y' be two
random, independent samples of size 'm' and 'n'. Then the Wilcoxon
rank sum statistic is the number of all pairs '(x[i], y[j])' for
which 'y[j]' is not greater than 'x[i]'. This statistic takes
values between '0' and 'm * n', and its mean and variance are 'm *
n / 2' and 'm * n * (m + n + 1) / 12', respectively.
Your samples have length 5 (after removing NAs) and 7 and no ties. The R
code is readable by
getAnywhere("wilcox.test.default")
as essentially

r <- rank(c(x,y))
sum(r[seq(along = x)]) - n.x * (n.x + 1)/2

I guess your reference just uses the first term. Another way of looking at this is whether ranks start at 0 or at 1 (as in rank()): R's definition is the rank sum with 0-based ranks.
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
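The two numbers in the question (45 by hand, 30 from R) can be reconciled with a short check. The following is only a sketch using the data from the original post: it computes the plain rank sum of the first sample, subtracts n.x * (n.x + 1)/2 as in the code quoted above, and counts pairs as described in the ?pwilcox excerpt. The names x, y, rank_sum, W_ranks, W_pairs and W_builtin are illustrative and are not taken from the R sources.

```r
# Data from the original post; the NAs are dropped before ranking
m <- c(2.0863, 2.1340, 2.1008, 1.9565, 2.0413, NA, NA)
f <- c(1.8938, 1.9709, 1.8613, 2.0836, 1.9485, 2.0630, 1.9143)
x <- m[!is.na(m)]
y <- f[!is.na(f)]
n.x <- length(x)

# Ranks in the pooled sample (1-based, as returned by rank())
r <- rank(c(x, y))

# Plain rank sum of the first sample (the hand-computed value)
rank_sum <- sum(r[seq_along(x)])

# Rank sum minus n.x * (n.x + 1) / 2
W_ranks <- rank_sum - n.x * (n.x + 1) / 2

# Number of pairs (x[i], y[j]) with y[j] not greater than x[i]
W_pairs <- sum(outer(x, y, ">="))

# Statistic reported by wilcox.test itself
W_builtin <- unname(wilcox.test(m, f, paired = FALSE)$statistic)

print(c(rank_sum = rank_sum, W_ranks = W_ranks,
        W_pairs = W_pairs, W_builtin = W_builtin))
```

With these data the rank sum is 45 and the other three values are 30, so the hand calculation and R's W differ by exactly 5 * 6 / 2 = 15.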
Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
as essentially

r <- rank(c(x,y))
sum(r[seq(along = x)]) - n.x * (n.x + 1)/2

I guess your reference just uses the first term. Another way of looking at this is whether ranks start at 0 or at 1 (as in rank()): R's definition is the rank sum with 0-based ranks.
Er, no... Then you'd be subtracting n.x. The definition is such that the minimum value of the statistic becomes zero (2nd term is equal to sum(1:n.x)).
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
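The correction above can be illustrated numerically. This is a minimal sketch on the data from the original post (NAs already removed). It compares a rank sum with 0-based ranks, which subtracts 1 per observation and hence n.x in total, against subtracting sum(1:n.x), and then checks that the statistic is zero when every x lies below every y. The shifted sample x_low is a made-up example, not part of the thread.

```r
x <- c(2.0863, 2.1340, 2.1008, 1.9565, 2.0413)
y <- c(1.8938, 1.9709, 1.8613, 2.0836, 1.9485, 2.0630, 1.9143)
n.x <- length(x)

r <- rank(c(x, y))
rank_sum <- sum(r[seq_along(x)])

# 0-based ranks: each rank is reduced by 1, so the total drops by n.x
zero_based <- sum(r[seq_along(x)] - 1)

# R's statistic: subtract the smallest possible rank sum, sum(1:n.x)
W <- rank_sum - sum(1:n.x)

# Hypothetical extreme case: shift x so that it lies entirely below y
x_low <- x - 10
r_low <- rank(c(x_low, y))
W_min <- sum(r_low[seq_along(x_low)]) - sum(1:n.x)

print(c(rank_sum = rank_sum, zero_based = zero_based, W = W, W_min = W_min))
```

Here the 0-based rank sum is 40, not the 30 reported by wilcox.test, while subtracting sum(1:n.x) = 15 gives 30 and yields 0 in the extreme case.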