You are right, but I was just trying to stick to the same example.
In reality it would be OK as long as it's an ordered factor. One could
restrict the conversion to columns of class "ordered".
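A minimal sketch of that restriction. The helper name `ordered_to_numeric` and the `Size` column are hypothetical illustrations, not from the thread. It keeps numeric columns, turns ordered factors into their integer codes (which follow the level order), and drops unordered factors such as Species before calling cor():

```r
# Hypothetical helper (not part of stats): keep numeric columns and
# ordered factors, replace each ordered factor by its integer codes
# (which follow the level order), and drop unordered factors.
ordered_to_numeric <- function(df) {
  keep <- vapply(df, function(col) is.numeric(col) || is.ordered(col),
                 logical(1))
  df <- df[keep]
  df[] <- lapply(df, function(col) if (is.ordered(col)) as.integer(col) else col)
  df
}

# Illustrative ordered factor built from iris (hypothetical, for the demo only).
iris_sz <- iris
iris_sz$Size <- cut(iris_sz$Petal.Length, breaks = c(0, 2, 5, Inf),
                    labels = c("small", "medium", "large"),
                    ordered_result = TRUE)

# Species (unordered) is dropped; Size enters as 1, 2, 3.
tau <- cor(ordered_to_numeric(iris_sz), method = "kendall")
```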
On Dec 3, 2007 1:58 PM, Liaw, Andy <andy_liaw at merck.com> wrote:
I'd call that another infelicity. Species is supposed to be nominal,
not ordinal, so rank correlation wouldn't make much sense. So what does
cor(, method="kendall") do? It looks like it simply uses the underlying
numeric code. (Convert Species to numeric and you'll see the same
answer.) However, reordering the levels changes the result:
R> iris2 <- iris
R> levels(iris2$Species) <- levels(iris2$Species)[c(2, 1, 3)]
R> cor(iris2, method = "kendall")
             Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
Sepal.Length   1.00000000 -0.07699679    0.7185159   0.6553086 0.1897778
Sepal.Width   -0.07699679  1.00000000   -0.1859944  -0.1571257 0.1439793
Petal.Length   0.71851593 -0.18599442    1.0000000   0.8068907 0.2677154
Petal.Width    0.65530856 -0.15712566    0.8068907   1.0000000 0.2724843
Species        0.18977778  0.14397927    0.2677154   0.2724843 1.0000000
To me, this is dangerous!
Andy
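The dependence on level order can be checked directly with a small sketch. It converts the factor to integer codes explicitly, and it reorders the levels with factor(..., levels = ) rather than `levels<-`, because `levels<-` as used in the transcript above relabels the groups instead of reordering them. The variable names are illustrative:

```r
# Integer codes of a factor follow its level order, so a rank
# correlation against those codes depends on how the levels are ordered.
sp_default <- iris$Species  # levels: setosa, versicolor, virginica
sp_swapped <- factor(iris$Species,
                     levels = c("versicolor", "setosa", "virginica"))

# Same observations, same group membership; only the level order differs.
stopifnot(identical(as.character(sp_default), as.character(sp_swapped)))

tau_default <- cor(iris$Petal.Length, as.integer(sp_default), method = "kendall")
tau_swapped <- cor(iris$Petal.Length, as.integer(sp_swapped), method = "kendall")
```

The two tau values differ even though nothing about the data has changed, which is the point Andy is making about nominal factors.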
From: Gabor Grothendieck
You can calculate the Kendall rank correlation with a data frame that
contains factors, so you would not want to exclude factors in that case:
cor(iris, method = "kendall")
             Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
Sepal.Length   1.00000000 -0.07699679    0.7185159   0.6553086  0.6704444
Sepal.Width   -0.07699679  1.00000000   -0.1859944  -0.1571257 -0.3376144
Petal.Length   0.71851593 -0.18599442    1.0000000   0.8068907  0.8229112
Petal.Width    0.65530856 -0.15712566    0.8068907   1.0000000  0.8396874
Species        0.67044444 -0.33761438    0.8229112   0.8396874  1.0000000
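In current versions of R, cor() stops with an error ('x' must be numeric) when given a data frame with a factor column, so a matrix like the one above would have to be requested explicitly. A minimal sketch, assuming data.matrix() is an acceptable way to do that: it replaces each factor by its integer codes in level order.

```r
# Current R rejects a data frame with a factor column in cor();
# capture the error message instead of stopping.
msg <- tryCatch({
  cor(iris, method = "kendall")
  NA_character_
}, error = conditionMessage)

# data.matrix() replaces each factor by its integer codes, making the
# conversion explicit; Species enters as 1, 2, 3 in level order.
tau_all <- cor(data.matrix(iris), method = "kendall")
```

Making the conversion explicit at least puts the choice of codes in the user's hands rather than hiding it inside cor().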
On Dec 3, 2007 9:27 AM, Michael Friendly <friendly at yorku.ca> wrote:
In using cor(data.frame), it is annoying that you have to explicitly
filter out non-numeric columns; when you don't, you get:
Error in cor(iris) : missing observations in cov/cor
In addition: Warning message:
In cor(iris) : NAs introduced by coercion
It would be nicer if stats:::cor() did the equivalent of the
following for a data.frame:
> cor(iris[,sapply(iris, is.numeric)])
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
A change could be implemented at this point in the code of cor():
    if (is.data.frame(x))
        x <- as.matrix(x)
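A minimal sketch of the proposed behaviour as a user-level wrapper rather than a patch to stats. The name `cor_numeric` is hypothetical. It drops non-numeric columns of a data frame before handing it to stats::cor(), and it warns about what was dropped so the filtering is never silent:

```r
# Hypothetical wrapper (not part of stats): keep only numeric columns
# of a data frame, warn about the rest, then call stats::cor().
cor_numeric <- function(x, ...) {
  if (is.data.frame(x)) {
    keep <- vapply(x, is.numeric, logical(1))
    if (!all(keep))
      warning("dropping non-numeric column(s): ",
              paste(names(x)[!keep], collapse = ", "))
    x <- as.matrix(x[keep])
  }
  stats::cor(x, ...)  # 'use' and 'method' are passed through unchanged
}

# Usage: cor_numeric(iris) warns about Species and returns a 4 x 4 matrix.
```

Warning instead of dropping columns quietly is a design choice that also speaks to the concern raised elsewhere in this thread about factors being handled behind the user's back.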
Second, the default, use="all.obs", throws an error if there are any
NAs. It would be nicer if the default were use="complete.obs",
dropping incomplete cases with a warning rather than an error. Most
other statistical software is more tolerant of missing data.
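A small sketch of how the documented `use` options treat missing values, on a made-up toy data frame with a single NA (the data are illustrative only). It also shows that use = "complete.obs" amounts to correlating the rows selected by complete.cases(). Note that current versions of R default to use = "everything", which returns NA for affected pairs rather than an error:

```r
# Toy data frame with one missing value (made up for illustration).
toy <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = c(2, 4, NA, 8, 11),
                  c = c(5, 3, 4, 1, 2))

# Current default, use = "everything": NA wherever column b is involved.
res_default <- cor(toy)

# use = "all.obs": any NA is an error (caught here so the script continues).
res_all <- tryCatch(cor(toy, use = "all.obs"),
                    error = function(e) conditionMessage(e))

# use = "complete.obs": drop every row containing an NA, then correlate.
res_complete <- cor(toy, use = "complete.obs")

# The same rows selected by hand with complete.cases().
res_manual <- cor(toy[complete.cases(toy), ])

# use = "pairwise.complete.obs": each pair uses the rows complete for that pair.
res_pairwise <- cor(toy, use = "pairwise.complete.obs")
```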
> library(corrgram)
> data(auto)
> cor(auto[,sapply(auto, is.numeric)])
Error in cor(auto[, sapply(auto, is.numeric)]) :
missing observations in cov/cor
> cor(auto[,sapply(auto, is.numeric)],use="complete")
# works; output elided
-Michael