Skip to content

spearman rank correlation problem

3 messages · William T Morgan, Martin Maechler, Robert Baer

#
Hello R gurus,

I want to calculate the Spearman rho between two ranked lists. I am
getting results with cor.test that differ in comparison to my own
spearman function:

  > my.spearman
  function(l1, l2) {
    if(length(l1) != length(l2)) stop("lists must have same length")
    r1 <- rank(l1)
    r2 <- rank(l2)
    dsq <- sapply(r1-r2,function(x) x^2)
    1 - ((6 * sum(dsq)) / (length(l1) * (length(l1)^2 - 1)))
  }

Perhaps I'm doing something wrong in that code, but it's a pretty
straightforward calculation, so it's hard to see what, especially with
rank() handling the ties correctly. One example difference:

  > a
   [1]  0.112761940  0.130260949 -0.010567817 -0.411906701  0.004588443
   [6] -0.034337846 -0.148082981 -0.243724351  0.186690390  0.408983820
  > b
   [1]  8 13 14 15  5  7  8  2 19 19
  > cor.test(a,b,method="spearman")

  	Spearman's rank correlation rho

  data:  a and b 
  S = 85, p-value = 0.1544
  alternative hypothesis: true rho is not equal to 0 
  sample estimates:
        rho 
  0.4878139 

  Warning message: 
  p-values may be incorrect due to ties in: cor.test.default(a, b, method = "spearman") 
  > my.spearman(a,b)
  [1] 0.4909091

Which, as you can see, isn't quite the same. And also:

  > c
   [1] 0 0 0 0 0 0 0 0 0 0
  > cor.test(a,c,method="spearman")

  	Spearman's rank correlation rho

  data:  a and c 
  S = NA, p-value = NA
  alternative hypothesis: true rho is not equal to 0 
  sample estimates:
  rho 
   NA 

  Warning message: 
  The standard deviation is zero in: cor(x, y, na.method, method == "kendall") 
  > my.spearman(a,c)
  [1] 0.5

Any suggestions as to what I'm doing wrong?

Thanks in advance,
#
William> Hello R gurus,
    William> I want to calculate the Spearman rho between two ranked lists. I am
    William> getting results with cor.test that differ in comparison to my own
    William> spearman function:

    >> my.spearman
    William> function(l1, l2) {
    William>   if(length(l1) != length(l2)) stop("lists must have same length")
    William>   r1 <- rank(l1)
    William>   r2 <- rank(l2)
    William>   dsq <- sapply(r1-r2,function(x) x^2)
    William>   1 - ((6 * sum(dsq)) / (length(l1) * (length(l1)^2 - 1)))
    William> }

    William> Perhaps I'm doing something wrong in that code, but it's a pretty
    William> straightforward calculation, so it's hard to see what, especially with
    William> rank() handling the ties correctly. 

Well, the "ties" in your example are really the "problem".
The formula you use,  
    1 - 6 S(d^2) / (n^3 - n)    ( d = r1 - r2 ; r{12} := rank(x{12}) )
is only equal to the more natural definition,  
cor(r1, r2),  in the situation when there are no ties
[plus in a few "lucky" situations with ties].

cor.test() and now  cor(*, method = "spearman")  in R have always used
the correlation of the ranks.
It seems that this needs to be documented, since you are right,
the "1 - 6 S / (..)"  formula is also in use as *definition* for
Spearman's rank correlation.

Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
#
Since this topic is here...

Although I seen the above formula in a couple of books, but I have never
seen a derivation.  Does anyone have a reference where it is derived
(apologies to math and statistics types for whom this must seem child's
play)?  Examining the derivation is probably the best way to understand the
effect of ties when using William Morgan's function for Spearman.

Rob Baer