Tetrachoric and polychoric ceofficients (for sem) - any tips? - R-help

John Fox
Mon, Nov 29, 2004 6:25 PM #
Dear Michael,

I had some time this evening, so I programmed the two-step procedure
described in the article that I mentioned. This isn't the ML estimate of the
correlation, but apparently it performs reasonably well and it is quite
fast. I checked the function on some examples and it appears to work
correctly. I'd be curious to see how the this function compares with the one
that you received.

A more complete solution would take a data frame and, as appropriate,
calculate polychoric, polyserial, and product-moment correlations among its
columns. I don't think that writing a polyserial-correlation function would
be any more difficult. Perhaps I'll add this to the sem package; I hesitate
to do so because the resulting standard errors and likelihoods from sem()
won't be right.

I'm taking the liberty of copying this reply to the r-help list, since the
question was originally raised there. I hope that's OK.

Regards,
 John 


-------------- snip ------------

polychor <- function (x, y){
# x: a contingency table of counts or an ordered categorical variable
# y: if x is a variable, a second ordered categorical variable
# returns the polychoric correlation for the table
#  or between x and y, using the two-step approximation
#  to the ML estimate described in F. Drasgow, "Polychoric and
#  polyserial correlations," in S. Kotz and N. Johnson, eds.
#  The Encyclopedia of Statistics, Volume 7, New York: Wiley,
#  1986.
  f <- function(rho) {
    P <- matrix(0, r, c)
    R <- matrix(c(1, rho, rho, 1), 2, 2)
    for (i in 1:r){
      for (j in 1:c){
        P[i,j] <- pmvnorm(lower=c(row.cuts[i], col.cuts[j]),
                          upper=c(row.cuts[i+1], col.cuts[j+1]),
                          corr=R)
        }
      }
     - sum(tab * log(P))
    }
  tab <- if (missing(y)) x else table(x, y)
  r <- nrow(tab)
  c <- ncol(tab)
  n <- sum(tab)
  row.cuts <- c(-Inf, qnorm(cumsum(rowSums(tab))/n))
  col.cuts <- c(-Inf, qnorm(cumsum(colSums(tab))/n))
  optimise(f, interval=c(0, 1))$minimum
  }