-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jose Quesada
Sent: December-19-07 7:54 AM
To: r-help at lists.r-project.org
Subject: [R] Correlation when one variable has zero variance (polychoric?)
Hi,

I'm running this for a simulation study, where many combinations of
parameters produce many predictions that I need to correlate with data.

The problem
----------------
I'm using rating data with 3 to 5 categories (e.g., too low, correct,
too high). The underlying continuous scales should be normal, so I chose
the polychoric correlation. I'm using library(polycor) in its latest
version, 0.7.4.
The problem is that sometimes the models always predict the same value
(i.e., the median). Example frequency table:
table(med$ADRI_LAN, rate$ADRI_LAN)
      2   3   4   5
  3  28 179 141  50
That is, there is no variability in one of the variables (the only
value is 3, the median).
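
The situation can be reproduced from the counts in the table alone. The following is a minimal sketch, assuming plain vectors named `med` and `rate` as stand-ins for med$ADRI_LAN and rate$ADRI_LAN (the original data frames are not shown here):

```r
# Rebuild the two variables from the frequency table shown above.
# 'med' and 'rate' are stand-in vectors, not the original data frames.
counts <- c("2" = 28, "3" = 179, "4" = 141, "5" = 50)
rate <- rep(2:5, times = counts)   # observed ratings
med  <- rep(3, length(rate))       # model always predicts the median

tab <- table(med, rate)            # one row (3), four columns (2..5)
print(tab)
sd(med)                            # zero: the prediction has no variability
```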
The Pearson product-moment correlation is the covariance divided by
the product of the standard deviations of the two variables. If the
standard deviation of one of the variables is zero, then the
denominator is zero and the correlation cannot be computed. R returns
NA and a warning.
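
A short sketch of why the Pearson correlation is undefined here, again assuming stand-in vectors rebuilt from the table counts rather than the original data:

```r
# Stand-in vectors rebuilt from the frequency table (assumed, not the original data).
rate <- rep(2:5, times = c(28, 179, 141, 50))
med  <- rep(3, length(rate))

# Pearson's r by hand: covariance over the product of the standard deviations.
num <- cov(med, rate)          # zero, because med never deviates from its mean
den <- sd(med) * sd(rate)      # zero, because sd(med) is zero
r_manual <- num / den          # 0 / 0, which is not a number

# cor() reports the same situation as NA; the warning is suppressed here
# only to keep the output quiet.
r_cor <- suppressWarnings(cor(med, rate))
print(c(manual = r_manual, cor = r_cor))
```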
If I add jitter to the variable with no variability, then I get a
virtually zero, but calculable, Pearson correlation.
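
The jitter step can be sketched as follows. The vectors are stand-ins rebuilt from the table counts, and the seed and the default `jitter()` settings are assumptions chosen only for illustration:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Stand-in vectors rebuilt from the frequency table (assumed, not the original data).
rate <- rep(2:5, times = c(28, 179, 141, 50))
med  <- rep(3, length(rate))

med_jit <- jitter(med)         # add a small amount of uniform noise around 3
r_jit <- cor(med_jit, rate)    # now computable, since sd(med_jit) > 0
print(r_jit)                   # expected to be close to zero (noise is unrelated to rate)
```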
However, when I use the polychoric correlation (using the default
settings), I get just the opposite: a very high correlation!
polychoric = polychor( med$ADRI_LAN, rate$ADRI_LAN ) #, ML=T, std.err=T
polychoric
[1] 0.999959
This is very counterintuitive.