cor(..., method="spearman") or cor(..., method="kendall") (PR#6641)

2 messages · JRitter@hhh.umn.edu, Peter Dalgaard

#
Dear R maintainers,

R is great.  Now that I have that out of the way, I believe I have
encountered a bug, or at least an inconsistency, in how Spearman and
Kendall rank correlations are handled.  Specifically, cor() and
cor.test() do not produce the same answer when the data contain NAs.

cor() treats the NAs as data, while cor.test() eliminates them.  The
option use="complete.obs" has no effect on cor() with method="s" or "k".

An illustration follows.  I'm running R for Windows, version 1.8.1, on a
Pentium 4, Windows 2000.

Regards,

Joe Ritter

#===================================================================

   > x = c(1,2,NA,3,5,88,NA)
   > y = c(3,8,4,7,1,12,NA)
   > cor(x,y,method="s")
   [1] 0.4642857
   > cor(x,y,method="s",use="c")
   [1] 0.4642857
   > cor.test(x,y,method="s")

           Spearman's rank correlation rho

   data:  x and y
   S = 14, p-value = 0.6833
   alternative hypothesis: true rho is not equal to 0
   sample estimates:
   rho
   0.3

   > cor(na.omit(data.frame(x,y)),method="s",use="c")
       x   y
   x 1.0 0.3
   y 0.3 1.0
   > cor(x,y,method="k")
   [1] 0.3333333
   > cor(x,y,method="k",use="complete.obs")
   [1] 0.3333333
   > cor.test(x,y,method="k")

           Kendall's rank correlation tau

   data:  x and y
   T = 6, p-value = 0.8167
   alternative hypothesis: true tau is not equal to 0
   sample estimates:
   tau
   0.2
#
JRitter@hhh.umn.edu writes:
We know... PR#6448 is the same thing. (The problem is that rank()
follows sort(), which by default sorts NAs to the end of the sorted
vector. Thus NAs get high ranks, and if x and y have NAs in the same
positions, a high Spearman correlation is calculated.) It is fixed in
the patched R version and also in the development sources (soon to be
1.9.0).
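
The mechanism described in the reply can be checked by hand. The following is a minimal sketch, not the actual cor() internals: it assumes that ranking with na.last = TRUE (passed explicitly here, since the default of rank() may differ between R versions) mimics what the affected versions did. The names rx, ry, ok, rho_with_na and rho_complete are illustrative only.

```r
# Data from the report above
x <- c(1, 2, NA, 3, 5, 88, NA)
y <- c(3, 8, 4, 7, 1, 12, NA)

# With na.last = TRUE, NAs are ranked after all observed values
rx <- rank(x, na.last = TRUE)
ry <- rank(y, na.last = TRUE)

# Pearson correlation of these ranks: the NAs are treated as data,
# which is the behaviour the report attributes to cor(x, y, method = "s")
rho_with_na <- cor(rx, ry)

# Keeping only complete pairs before ranking corresponds to what
# cor.test() does in the transcript
ok <- complete.cases(x, y)
rho_complete <- cor(rank(x[ok]), rank(y[ok]))

print(c(with_na = rho_with_na, complete = rho_complete))
```

Under these assumptions the first value should agree with the 0.4642857 in the transcript and the second with the 0.3 reported by cor.test().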