Dear Michael,
I had some time this evening, so I programmed the two-step procedure
described in the article that I mentioned. This isn't the ML estimate of the
correlation, but apparently it performs reasonably well and it is quite
fast. I checked the function on some examples and it appears to work
correctly. I'd be curious to see how the this function compares with the one
that you received.
A more complete solution would take a data frame and, as appropriate,
calculate polychoric, polyserial, and product-moment correlations among its
columns. I don't think that writing a polyserial-correlation function would
be any more difficult. Perhaps I'll add this to the sem package; I hesitate
to do so because the resulting standard errors and likelihoods from sem()
won't be right.
I'm taking the liberty of copying this reply to the r-help list, since the
question was originally raised there. I hope that's OK.
Regards,
John
-------------- snip ------------
polychor <- function (x, y){
# x: a contingency table of counts or an ordered categorical variable
# y: if x is a variable, a second ordered categorical variable
# returns the polychoric correlation for the table
# or between x and y, using the two-step approximation
# to the ML estimate described in F. Drasgow, "Polychoric and
# polyserial correlations," in S. Kotz and N. Johnson, eds.
# The Encyclopedia of Statistics, Volume 7, New York: Wiley,
# 1986.
f <- function(rho) {
P <- matrix(0, r, c)
R <- matrix(c(1, rho, rho, 1), 2, 2)
for (i in 1:r){
for (j in 1:c){
P[i,j] <- pmvnorm(lower=c(row.cuts[i], col.cuts[j]),
upper=c(row.cuts[i+1], col.cuts[j+1]),
corr=R)
}
}
- sum(tab * log(P))
}
tab <- if (missing(y)) x else table(x, y)
r <- nrow(tab)
c <- ncol(tab)
n <- sum(tab)
row.cuts <- c(-Inf, qnorm(cumsum(rowSums(tab))/n))
col.cuts <- c(-Inf, qnorm(cumsum(colSums(tab))/n))
optimise(f, interval=c(0, 1))$minimum
}
-----Original Message----- From: Michael Dewey [mailto:m.dewey at iop.kcl.ac.uk] Sent: Monday, November 29, 2004 1:38 PM To: John Fox Subject: RE: [R] Tetrachoric and polychoric ceofficients (for sem) - any tips? At 18:07 28/11/04, you wrote:
Dear Michael, I'm not aware of pre-existing R code for tetrachoric or polychoric correlations. I may at some point incorporate such functions
into the
sem package but I don't have concrete plans for doing so. On
the other
hand, I don't think that it would be very hard to do so. (A
discussion,
references, and an example are in Kotz and Johnson, eds.,
Encyclopedia
of Statistics, Vol 7.)
Someone has kindly emailed me some code for the polychoric case (which he says is quite slow) I will try it out, it may be better to treat the 2 by 2 case separately. If it is OK I plan to write an equivalent to cor for matrices of tetrachoric/polychoric. It will not happen any time soon now, but if I do manage t would you be interested in it for sem? As you say there remains the problem that they are estimates and introduce further uncertainties.
Tetrachoric and polychoric correlations are estimates, and in subsequently estimating a SEM from these, one should take
that into account. Michael Dewey m.dewey at iop.kcl.ac.uk