cor and missing values. Bug?
On 27 May 2004 00:20:17 +0200
Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
"Robert W. Baer, Ph.D." <rbaer at atsu.edu> writes:
Not to put too fine a point on it, but did you consider checking the NEWS file for the most recent version (1.9.0, http://cran.r-project.org/src/base/NEWS)?
    o The cor() function did not remove missing values in the non-Pearson case.
There is still something a little strange in version 1.9.0. What is the source of the discrepancy between cor() and cor.test()?
cor() ranks x and y before removing missing values; cor.test()
removes them first and then ranks. This is not really desirable, but a
better solution is nontrivial (esp. in the "pairwise.complete.obs"
case), and we did document it in ?cor:
Notice also that the ranking is (currently) done
removing only cases that are missing on the variable itself,
which may not be what you expect if you let 'use' be
'"complete.obs"' or '"pairwise.complete.obs"'.
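To make the order-of-operations point concrete, here is a minimal sketch of the two strategies described above. It is written in plain Python and is not R's actual implementation: the helper names (average_ranks, complete_pairs, spearman_rank_then_drop, spearman_drop_then_rank) and the toy data are assumptions made up for illustration, with None standing in for NA.

```python
import math

def average_ranks(values):
    """Rank the non-missing entries (ties get the average rank); None stays None."""
    present = sorted((v, i) for i, v in enumerate(values) if v is not None)
    ranks = [None] * len(values)
    start = 0
    while start < len(present):
        end = start
        # Extend the tie group while the next value equals the current one.
        while end + 1 < len(present) and present[end + 1][0] == present[start][0]:
            end += 1
        avg = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for _, i in present[start:end + 1]:
            ranks[i] = avg
        start = end + 1
    return ranks

def pearson(a, b):
    """Plain Pearson correlation of two equal-length lists without missing values."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def complete_pairs(a, b):
    """Keep only the positions where both entries are non-missing."""
    pairs = [(p, q) for p, q in zip(a, b) if p is not None and q is not None]
    return [p for p, _ in pairs], [q for _, q in pairs]

def spearman_rank_then_drop(x, y):
    # Rank each variable on its own non-missing values, then drop incomplete pairs.
    rx, ry = complete_pairs(average_ranks(x), average_ranks(y))
    return pearson(rx, ry)

def spearman_drop_then_rank(x, y):
    # Drop incomplete pairs first, then rank what is left.
    cx, cy = complete_pairs(x, y)
    return pearson(average_ranks(cx), average_ranks(cy))

# Hypothetical toy data; None marks a missing value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, None]
y = [1.0, None, None, 2.0, 4.0, 3.0]

rank_first = spearman_rank_then_drop(x, y)
drop_first = spearman_drop_then_rank(x, y)
print(rank_first, drop_first)
```

With this data the three surviving pairs are monotone, so ranking after removal gives a correlation of 1, while ranking before removal carries over ranks that were computed alongside the discarded observations and gives a smaller value.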
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
Some of you may want to look at the old rcorr function in the Hmisc
package, which uses the pairwise complete obs method, uses some C code for
Spearman correlation, and is fast for large matrices.
Frank
---
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University