
association of multiple variables

Below is a somewhat more general version of David's function,
which lets you choose the association statistic returned by
vcd::assocstats().  Of course, only Cramer's V is normed to run
from 0 to 1 as an absolute measure of strength of association
(perfect association gives 1); phi and the contingency
coefficient are not, but they could be made comparable by
rescaling the matrix so that its diagonal equals 1.
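A minimal base-R sketch of that rescaling, assuming a symmetric
matrix with a positive diagonal: entry (i, j) is divided by
sqrt(res[i, i] * res[j, j]), the same operation stats::cov2cor()
applies to a covariance matrix. The helper name scale_diag1() is
hypothetical; the phi values are copied from the
catcor(dat, type="phi") output further down.

```r
# Hypothetical helper: rescale a symmetric association matrix so that
# its diagonal is 1, dividing entry (i, j) by sqrt(res[i, i] * res[j, j]).
scale_diag1 <- function(res) {
  d <- sqrt(diag(res))
  out <- res / outer(d, d)
  dimnames(out) <- dimnames(res)
  out
}

# phi values copied from the catcor(dat, type = "phi") output
phi <- matrix(c(2, 1.073675, 0.942809,
                1.073675, 2, 1.105542,
                0.942809, 1.105542, 2), 3, 3,
              dimnames = list(paste0("v", 1:3), paste0("v", 1:3)))
scale_diag1(phi)
```

Because every variable in that example has five levels, this
reproduces the Cramer's V matrix up to rounding
(e.g. 1.073675 / 2 = 0.5368375); with differing numbers of
levels the two will not coincide.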

The OP specified binary variables, so tetrachoric correlations
might be more appropriate here. John Fox's polycor package
provides a more general approach to this problem: polychoric and
polyserial correlations, plus a hetcor() function that computes
a matrix of correlation measures for mixtures of different
variable types. All of these provide standard errors, so
p-values can be computed as well.
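For the binary case, something along these lines should work,
assuming polycor is installed: hetcor() estimates polychoric
correlations between factors (tetrachoric when both are binary)
and returns their standard errors in the std.errors component.
The simulated data and the normal-approximation (Wald) p-values
are only an illustration, not something hetcor() reports itself.

```r
library(polycor)

# simulated binary variables, stored as ordered factors (illustration only)
set.seed(1)
bin <- data.frame(
  b1 = ordered(sample(0:1, 200, replace = TRUE)),
  b2 = ordered(sample(0:1, 200, replace = TRUE)),
  b3 = ordered(sample(0:1, 200, replace = TRUE)))

# two-step tetrachoric correlations with standard errors
h <- hetcor(bin, std.err = TRUE)
h$correlations

# Wald z test of rho = 0 using a normal approximation (an assumption here)
z <- h$correlations / h$std.errors
p <- 2 * pnorm(-abs(z))
diag(p) <- NA   # the diagonal is 1 by definition, not a test
p
```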

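catcor <- function(x, type=c("cramer", "phi", "contingency")) {
	# assocstats() comes from the vcd package
	if (!requireNamespace("vcd", quietly = TRUE))
		stop("package 'vcd' is required")
	type <- match.arg(type)
	nc <- ncol(x)
	# all (i, j) pairs of column indices, including i == j
	v <- expand.grid(1:nc, 1:nc)
	# cross-tabulate each pair and extract the chosen statistic;
	# matrix() fills column-major, matching expand.grid's order
	res <- matrix(mapply(function(i1, i2)
		vcd::assocstats(table(x[, i1], x[, i2]))[[type]],
		v[, 1], v[, 2]), nc, nc)
	rownames(res) <- colnames(res) <- colnames(x)
	res
}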

e.g.

dat <- data.frame(
  v1=sample(LETTERS[1:5], 15, replace=TRUE),
  v2=sample(LETTERS[1:5], 15, replace=TRUE),
  v3=sample(LETTERS[1:5], 15, replace=TRUE))

 > catcor(dat, type="phi")
          v1       v2       v3
v1 2.000000 1.073675 0.942809
v2 1.073675 2.000000 1.105542
v3 0.942809 1.105542 2.000000
 > catcor(dat, type="cramer")
           v1        v2        v3
v1 1.0000000 0.5368374 0.4714045
v2 0.5368374 1.0000000 0.5527708
v3 0.4714045 0.5527708 1.0000000
 > catcor(dat, type="contingency")
           v1        v2        v3
v1 0.8944272 0.7317676 0.6859943
v2 0.7317676 0.8944272 0.7416198
v3 0.6859943 0.7416198 0.8944272
 >
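The diagonals above are consistent with the three statistics
being built from the Pearson chi-square X^2 of each table:
phi = sqrt(X^2/n), the contingency coefficient
C = sqrt(phi^2/(1 + phi^2)), and Cramer's V = phi/sqrt(min(r, c) - 1).
A variable with k levels crossed with itself gives X^2 = n(k - 1),
so with k = 5 phi = 2, C = sqrt(4/5) = 0.894 and V = 1. A base-R
sketch of those formulas, using chisq.test() instead of vcd
(the helper name assoc_from_table() is hypothetical):

```r
# Hypothetical helper: phi, contingency coefficient and Cramer's V
# computed from the Pearson chi-square of a two-way table.
assoc_from_table <- function(tab) {
  n <- sum(tab)
  # Pearson X^2 without continuity correction; small expected
  # counts trigger an approximation warning, which is silenced here
  chisq <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  phi <- unname(sqrt(chisq / n))
  c(phi = phi,
    contingency = sqrt(phi^2 / (1 + phi^2)),
    cramer = phi / sqrt(min(dim(tab) - 1)))
}

# a 5-level variable crossed with itself (perfect association)
x <- factor(rep(LETTERS[1:5], 3))
assoc_from_table(table(x, x))
```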
On 2/18/2014 9:38 AM, David Carlson wrote: