Correlation when one variable has zero variance (polychoric?)
Dear Jose,
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jose
Sent: December-19-07 11:27 AM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] Correlation when one variable has zero variance (polychoric?)

Dear John,
I also ran the same analysis in 2005 (I don't know what has changed in the polycor package since then), and the results were different. I think that back then I compared them with SAS and they were the same.
John> I don't entirely follow this. Are you referring to the table above with one row, more generally to tables with zero marginals, or to tables in which there are interior zeroes?

I have plenty of those tables, but I think quite a few of them have zero marginals (the case I posted might be a bit extreme). I have 400 observations, so no matter how centered the distributions are, some observations will fall away from the center.
As I said, there's no basis for estimating the polychoric correlation and all of the thresholds when there are zero marginals. If more than one row and more than one column with nonzero marginals remain, then you could simply eliminate the rows/columns with zero marginals, but tables with only one nonzero row or column have no information about the correlation. I'll think about doing this (i.e., removing zero rows and columns) automatically and issuing a warning.
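As an illustration of the row/column elimination just described, here is a minimal R sketch. The helper name `drop_zero_margins()` is hypothetical (it is not part of polycor); it simply drops every row and column of a contingency table whose marginal total is zero and warns when it does so. Interior zeroes are left untouched.

```r
# Hypothetical helper (not part of polycor): drop rows and columns of a
# contingency table whose marginal totals are zero, warning when it does so.
drop_zero_margins <- function(tab) {
  tab <- as.matrix(tab)
  keep_rows <- rowSums(tab) > 0
  keep_cols <- colSums(tab) > 0
  if (!all(keep_rows) || !all(keep_cols)) {
    warning(sprintf("removed %d row(s) and %d column(s) with zero marginals",
                    sum(!keep_rows), sum(!keep_cols)))
  }
  tab[keep_rows, keep_cols, drop = FALSE]
}

# Example 3 x 4 table: row 3 and column 4 are empty; cell [1, 3] is an
# interior zero and is kept
tab <- matrix(c(10,  5, 0, 0,
                 4, 12, 3, 0,
                 0,  0, 0, 0), nrow = 3, byrow = TRUE)
reduced <- suppressWarnings(drop_zero_margins(tab))
dim(reduced)  # 2 3
```

The reduced table could then be passed to `polychor()`, but this only makes sense when at least two rows and two columns survive; a table reduced to a single row or column still carries no information about the correlation.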
The results I got in 2005 cannot be reproduced now, in 2007, with the same code; I guess this could be due to the bug you describe (maybe it was introduced later?). In 2007, I got many correlations as high as the one I described, and I was wondering what the problem was. I don't have SAS available anymore, so I cannot run the code I wrote in SAS to compare.
No program, not even SAS, can magically estimate a correlation from a table with one row or column. If polychor() did that in 2005, the answer it provided was erroneous.
Where can I get the new code for polychor?
I plan to upload a new version of the polycor package to CRAN as soon as I have a chance -- probably sometime this week. But you already have the code for polychor() and can modify it yourself: Just fix the test so that it checks for < 2 rather than < 1 row, and return NA (and issue a warning) in this case.
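The check described above can also be applied from outside the package, without editing `polychor()` itself. The sketch below is a hypothetical wrapper, `safe_polychor()`, and not polycor's actual code: it drops zero-marginal rows and columns, returns NA with a warning when fewer than two rows or two columns remain, and otherwise hands the reduced table to an estimator that defaults to `polycor::polychor()` (this assumes the polycor package is installed; the default is only evaluated when the table is estimable).

```r
# Hypothetical wrapper (not polycor's code): guard against tables with
# fewer than two nonzero rows or columns before estimating a polychoric
# correlation. The default estimator assumes polycor is installed.
safe_polychor <- function(tab, estimator = function(t) polycor::polychor(t)) {
  tab <- as.matrix(tab)
  # Keep only rows and columns with nonzero marginal totals
  tab <- tab[rowSums(tab) > 0, colSums(tab) > 0, drop = FALSE]
  if (nrow(tab) < 2 || ncol(tab) < 2) {
    warning("fewer than 2 nonzero rows or columns; correlation is not estimable")
    return(NA_real_)
  }
  estimator(tab)
}

# A table whose only nonzero row is row 2: no information about the correlation
one_row <- matrix(c(0,  0, 0,
                    3, 20, 7,
                    0,  0, 0), nrow = 3, byrow = TRUE)
safe_polychor(one_row)  # NA, with a warning
```

Keeping the estimator as an argument separates the guard from the estimation, so the same check works with, for example, `function(t) polycor::polychor(t, ML = TRUE)`.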
I'm in a predicament here; the data I'm analyzing are from a flight simulation and are extremely expensive to get, so running more experiments is out of the question. Any pointers as to how I could analyze this dataset (i.e., one where there might be zero marginals)?
I'm sorry, but as I said, there's no magic solution here. The data, however expensive, don't contain information relevant to estimating the correlation.

Regards,
John
Thanks -Jose
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.