Incorrect handling of NA's in cor() (PR#6750)
Marek Ancukiewicz wrote:
Dear Uwe, You are wrong.
Whoops. My apologies!!! In R-1.9.0 beta I get:

cor(x[!is.na(x)&!is.na(y)],y[!is.na(x)&!is.na(y)],method="s")
# [1] -0.4
cor(x,y,use="complete.obs", method="s")
# [1] -0.5291503

I'll take a look!

Uwe
First, I read the help file before submitting the report. For two variables, use="pairwise.complete.obs" and use="complete.obs" should be equivalent, shouldn't they? Of course, the results will differ when we have more than two variables. Second, with the call you proposed I also get an incorrect result:
cor(x, y, use="pairwise.complete.obs", method="s")
[1] -0.1428571

The correct result is -0.4, as correctly calculated by cor.test().

Regards,
Marek Ancukiewicz
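For reference, the value of -0.4 can be checked by hand. The following is a minimal sketch, not part of the original report: it drops the incomplete pairs first, ranks what is left, and applies the no-ties Spearman formula 1 - 6*sum(d^2)/(n*(n^2-1)). The names ok, rx, ry, n, and rho are introduced here for illustration only.

```r
x <- c(1, 2, 3, NA, 5, 6)
y <- c(4, NA, 2, 5, 1, 3)

# Keep only the pairs where both values are observed
ok <- !is.na(x) & !is.na(y)

# Rank the surviving values (there are no ties in this data)
rx <- rank(x[ok])
ry <- rank(y[ok])

# Spearman's rho via the no-ties formula
n <- sum(ok)
rho <- 1 - 6 * sum((rx - ry)^2) / (n * (n^2 - 1))
rho          # -0.4

# Equivalent: Pearson correlation of the ranks
cor(rx, ry)
```

With four complete pairs the squared rank differences sum to 14, so rho = 1 - 84/60 = -0.4.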
X-Original-To: msa@biostat.mgh.harvard.edu
Date: Fri, 09 Apr 2004 19:06:47 +0200
From: Uwe Ligges <ligges@statistik.uni-dortmund.de>
Organization: Fachbereich Statistik, Universitaet Dortmund
X-Accept-Language: en-us, en, de-de, de
Cc: R-bugs@biostat.ku.dk

msa@biostat.mgh.harvard.edu wrote:
Full_Name: Marek Ancukiewicz
Version: 1.8.1
OS: Linux
Submission from: (NULL) (132.183.12.87)

Function cor() incorrectly handles missing observations with method="spearman":
x <- c(1,2,3,NA,5,6)
y <- c(4,NA,2,5,1,3)
cor(x,y,use="complete.obs",method="s")
[1] -0.1428571
cor(x[!is.na(x)&!is.na(y)],y[!is.na(x)&!is.na(y)],method="s")
[1] -0.4

These two results should be the same.
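The three numbers that appear in this thread (-0.4, -0.5291503, and -0.1428571) can all be reproduced from the same data by changing when the ranking happens relative to the removal of NAs. The sketch below is an illustration only: the thread does not say what cor() did internally in either version, so it should not be read as a description of the actual implementation. The names ok, r_a, r_b, and r_c are hypothetical.

```r
x <- c(1, 2, 3, NA, 5, 6)
y <- c(4, NA, 2, 5, 1, 3)
ok <- !is.na(x) & !is.na(y)

# (a) Delete incomplete pairs first, then rank
r_a <- cor(rank(x[ok]), rank(y[ok]))

# (b) Rank each vector with NAs kept as NA, then delete incomplete pairs
r_b <- cor(rank(x, na.last = "keep"), rank(y, na.last = "keep"),
           use = "complete.obs")

# (c) Rank each vector with NAs ranked last, and delete nothing
r_c <- cor(rank(x, na.last = TRUE), rank(y, na.last = TRUE))

round(c(r_a, r_b, r_c), 7)  # -0.4000000 -0.5291503 -0.1428571
```

Only (a) ranks the same set of observations that the correlation is computed on, which is why it agrees with the manual subsetting in the report.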
No! Please read at least the help file, ?cor, before submitting a bug report:

"If use is "complete.obs" then missing values are handled by casewise deletion. Finally, if use has the value "pairwise.complete.obs" then the correlation between each pair of variables is computed using all complete pairs of observations on those variables."

Hence
cor(x, y, use="pairwise.complete.obs", method="s")
is what you expect ...

Uwe Ligges
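The distinction quoted from ?cor only matters once there are more than two variables. A small sketch of that point follows, using the default Pearson method so that it does not depend on the Spearman behaviour under discussion. The third variable z and the names m, two_complete, two_pairwise, three_complete, and three_pairwise are made up for illustration and are not part of the report.

```r
x <- c(1, 2, 3, NA, 5, 6)
y <- c(4, NA, 2, 5, 1, 3)
z <- c(NA, 1, 2, 3, 4, 5)   # hypothetical third variable

# Two variables: both settings use the same four complete pairs
two_complete <- cor(x, y, use = "complete.obs")
two_pairwise <- cor(x, y, use = "pairwise.complete.obs")
all.equal(two_complete, two_pairwise)

# Three variables: casewise deletion also drops row 1 (z is NA there),
# while pairwise deletion still uses row 1 for the x-y entry
m <- cbind(x, y, z)
three_complete <- cor(m, use = "complete.obs")["x", "y"]
three_pairwise <- cor(m, use = "pairwise.complete.obs")["x", "y"]
c(three_complete, three_pairwise)
```

In the two-variable case the two values agree; in the three-variable case the x-y entry changes because casewise deletion leaves only three rows.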