Skip to content

R's Spearman

2 messages · Mendiburu, Felipe (CIP), Frank E Harrell Jr

#
Dear Ray,

The R's Spearman calculated by R is correct for ties or nonties, which is not correct is the probability for the case of ties. I send to you formulates it for the correlation with ties, that is equal to R. 

Regards,

Felipe de Mendiburu
Statistician


# Spearman correlation "rs" with ties or no ties
rs<-function(x,y) {
d<-rank(x)-rank(y)
tx<-as.numeric(table(x))
ty<-as.numeric(table(y))
Lx<-sum((tx^3-tx)/12)
Ly<-sum((ty^3-ty)/12)
N<-length(x)
SX2<- (N^3-N)/12 - Lx
SY2<- (N^3-N)/12 - Ly
rs<- (SX2+SY2-sum(d^2))/(2*sqrt(SX2*SY2))
return(rs)
}

# Aplicacion
[1] 0.2319084
[1] 0.2319084



-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Raymond Wan
Sent: Monday, May 28, 2007 10:29 PM
To: r-help at stat.math.ethz.ch
Subject: [R] R's Spearman



Hi all,

I am trying to figure out the formula used by R's Spearman rho (using 
cor(method="spearman")) because I can't seem to get the same value as by 
calculating "by hand".  Perhaps I'm using "cor" wrong, but I don't know 
where.  Basically, I am running these commands:

 > y=read.table(file="tmp",header=TRUE,sep="\t")
 > y
   IQ Hours
1 106     7
2  86     0
3  97    20
4 113    12
5 120    12
6 110    17
 > cor(y[1],y[2],method="spearman")
       Hours
IQ 0.2319084

[it's an abbreviated example of one I took from Wikipedia].  I 
calculated by hand (apologies if the table looks strange when pasted 
into e-mail):

      IQ    Hours    rank(IQ)  rank(hours)    diff    diff^2
1    106    7        3         2                 1    1
2     86    0        1         1                 0    0
3     97    20       2         6                -4    16
4    113    12       5         3.5             1.5    2.25
5    120    12       6         3.5             2.5    6.25
6    110    17       4         5                -1    1
                                                      26.5
                           
                                              rho=    0.242857

where rho = (1 - ((6 * 26.5) / 6 * (6^2 - 1))).  I kept modifying the 
table and realized that the difference in result comes from ties.  i.e., 
if I remove the tie in rows 4 and 5, I get the same result from both cor 
and calculating by hand.  Perhaps I'm handling ties wrong...does anyone 
know how R does it or perhaps I need to change how I'm using it?

Thank you!

Ray

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Mendiburu, Felipe (CIP) wrote:
Just use midranks for ties (as with rank()) and compute the ordinary 
correlation on those (see also the spearman2 and rcorr functions in 
Hmisc package).  No need to use complex formulas.  And the t 
approximation for p-values works pretty well.

Frank Harrell