Message-ID: <Pine.A41.4.58.0404091032330.61772@homer32.u.washington.edu>
Date: 2004-04-09T19:43:12Z
From: Thomas Lumley
Subject: Incorrect handling of NA's in cor() (PR#6750)
In-Reply-To: <20040409172243.B026310476@slim.kubism.ku.dk>

On Fri, 9 Apr 2004 msa@biostat.mgh.harvard.edu wrote:

>
> Dear Uwe,
>
> You are wrong. First, I've read the help file before
> submitting the report. For two variables,
> use="pairwise.complete.obs" and use="complete.obs" should be
> equivalent, shouldn't it? Of course, the results will be
> different when we have more than 2 variables. Second, with the
> call you proposed I am also getting an incorrect result:
>

I think it's more complicated than either of you is considering.

For the Pearson correlation everything is straightforward: with two
variables, use="pairwise.complete.obs" is the same as use="complete.obs",
which is the same as dropping the NAs manually.
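
A quick way to check the Pearson case, as a minimal sketch: the vectors
x and y below are made-up illustrative data, not the data from the
original report.

```r
# Illustrative data (an assumption, not from the bug report):
# y is missing for the observation where x is largest.
x <- c(1, 2, 3, 4, 10)
y <- c(2, 1, 4, NA, 3)

# Rows where both x and y are observed.
ok <- complete.cases(x, y)

r_manual   <- cor(x[ok], y[ok])                         # drop NAs by hand
r_complete <- cor(x, y, use = "complete.obs")           # Pearson is the default method
r_pairwise <- cor(x, y, use = "pairwise.complete.obs")

# With only two variables all three should agree.
print(all.equal(r_manual, r_complete))
print(all.equal(r_manual, r_pairwise))
```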

For the rank correlations the question is when the ranking should be done.
The cor() function ranks the observations and then drops missing values,
whereas the manual approach drops missing values and then ranks.
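
The two orderings can be compared by hand, as a rough sketch. The data
are made up for illustration, and rank(..., na.last = "keep") is only
assumed to stand in for the rank-first behaviour described above. The
sketch deliberately avoids calling cor(method = "spearman"), so it does
not depend on what any particular version of R does internally.

```r
# Illustrative data (an assumption, not from the bug report):
# y is missing for the observation where x is largest.
x <- c(1, 2, 3, 4, 10)
y <- c(2, 1, 4, NA, 3)
ok <- !is.na(x) & !is.na(y)

# Rank first, then drop: x = 10 keeps rank 5 even though the
# observation with rank 4 is removed afterwards.
rx <- rank(x, na.last = "keep")   # 1 2 3 4 5
ry <- rank(y, na.last = "keep")   # 2 1 4 NA 3
rank_then_drop <- cor(rx[ok], ry[ok])

# Drop first, then rank: the remaining x values get ranks 1 to 4.
drop_then_rank <- cor(rank(x[ok]), rank(y[ok]))

print(rank_then_drop)   # about 0.53
print(drop_then_rank)   # 0.6
```

The gap between the two numbers comes entirely from the ranks of x being
computed with or without the observation whose y is missing.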

I'm not convinced that it is obvious which of these is right, though
certainly the help page should document whichever is being done.


	-thomas