Wilcoxon-Mann-Whitney U value: outcomes from different stat packages

3 messages · Massimo Bressan, Peter Dalgaard

#
Given this example

#start code

a<-c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
     760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)

b<-c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
     3220,490,20790,290,740,5350,940,3910,0,640,850,260)

wilcox.test(a, b, paired=FALSE)

#sum of ranks for first sample
sum.rank.a <- sum(rank(c(a,b))[seq_along(a)]) #sum of ranks assigned to the group a
W1 <- sum.rank.a - (length(a)*(length(a)+1)) / 2
W1

U1 <- length(a)*length(b)/2-W1
U1

#sum of ranks for second sample
sum.rank.b <- sum(rank(c(a,b))[length(a) + seq_along(b)]) #sum of ranks assigned to the group b
W2 <- sum.rank.b - (length(b)*(length(b)+1)) / 2
W2

U2 <- length(a)*length(b)/2-W2
U2

#end code

And given the fact that:

- the Note in R's help page for wilcox.test clearly states: "The literature is not
unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney
tests. The two most common definitions correspond to the sum of the ranks of
the first sample with the minimum value subtracted or not. R subtracts [...],
giving a value which is larger by m(m+1)/2 for a first sample of size m"

- as the result of the same test performed with other stat packages (namely
STATISTICA and PAST) I've got a U value of 200.5, the same as W2 in my script,
with the same p-value

What can I conclude about the STATISTICA and PAST packages? Are they
giving W2 (see my script) instead of U?

A crucial point is that the variant of the algorithm used for computation by
the packages is very rarely indicated in the output or documented in the
help facility and the manuals.
See also this link (which I found after a long meandering on the web) about the
comparison of "Wilcoxon-Mann-Whitney" U test outcomes from different stat
packages:
http://www.jstor.org/discover/10.2307/2685616?uid=3738296&uid=2129&uid=2&uid=70&uid=4&sid=47699045750617 

Has any of you faced the same type of issue? Or am I completely wrong?

maxbre

--
View this message in context: http://r.789695.n4.nabble.com/Wilcoxon-Mann-Whitney-U-value-outcomes-from-different-stat-packages-tp4631703.html
Sent from the R help mailing list archive at Nabble.com.
#
On May 29, 2012, at 17:55 , maxbre wrote:

            
NB: You are quoting like the Devil reads the Bible: The bit in [...] is "and S-PLUS does not". So R's value is _smaller_ by m(m+1)/2.

Most likely. Or, equivalently, they are basing U on the 2nd group instead of the first. This varies between software, as do the conventions for which way you subtract in a two-sample t test. Some textbooks say that you use the _smallest_ group, and tabulate critical regions only for those cases, to save paper.
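
The two points in this reply can be checked directly on the data from the original post. The following is only a minimal sketch, not what STATISTICA or PAST actually do internally (that is not documented here). The names m, n, R_a, R_b, U_a, U_b, W_ab and W_ba are illustrative and do not appear in the original script. It assumes that wilcox.test() reports the rank sum of its first argument minus m(m+1)/2, as the Note on its help page describes.

```r
# Data from the original post
a <- c(0, 70, 50, 100, 70, 650, 1300, 6900, 1780, 4930, 1120, 700, 190, 940,
       760, 100, 300, 36270, 5610, 249680, 1760, 4040, 164890, 17230, 75140,
       1870, 22380, 5890, 2430)
b <- c(0, 0, 10, 30, 50, 440, 1000, 140, 70, 90, 60, 60, 20, 90, 180, 30, 90,
       3220, 490, 20790, 290, 740, 5350, 940, 3910, 0, 640, 850, 260)

m <- length(a)
n <- length(b)
r <- rank(c(a, b))               # joint ranks; tied values get averaged ranks

# Plain rank sums (the convention that does NOT subtract the minimum)
R_a <- sum(r[seq_len(m)])
R_b <- sum(r[m + seq_len(n)])

# Rank sums with the minimum possible value subtracted (R's convention)
U_a <- R_a - m * (m + 1) / 2     # statistic based on the first group
U_b <- R_b - n * (n + 1) / 2     # statistic based on the second group

# Statistic reported by wilcox.test for each ordering of the groups;
# warnings about ties and exact p-values are suppressed
W_ab <- unname(suppressWarnings(wilcox.test(a, b)$statistic))
W_ba <- unname(suppressWarnings(wilcox.test(b, a)$statistic))

# R's W matches the subtracted version, so it is smaller than the
# plain rank sum by m(m+1)/2
stopifnot(isTRUE(all.equal(W_ab, U_a)),
          isTRUE(all.equal(R_a - W_ab, m * (m + 1) / 2)))

# Swapping the groups gives the complementary statistic: the two add up to m*n
stopifnot(isTRUE(all.equal(W_ba, U_b)),
          isTRUE(all.equal(U_a + U_b, m * n)))

c(W_ab = W_ab, W_ba = W_ba, m_times_n = m * n)
```

Because the two statistics always add up to m*n, a package that bases U on the second group (or always reports the smaller of the two) gives a different number from wilcox.test(a, b) with the same p-value. Note also that m*n/2 - W, as computed for U1 and U2 in the script above, is the distance of W from its expected value under the null hypothesis, not the complementary statistic m*n - W.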

  
    