[R-meta] Testing multicollinearity between categorical predictors

I don't have a reference, but one doesn't need one for this anyway. A factor with two levels is just a dummy variable. That is computationally indistinguishable from a "continuous" predictor that just happens to take on the values 0 and 1. So, the VIFs will be the same whether we regard this as a factor or as a continuous variable. We can also just examine this by example:

library(metafor)

dat <- dat.bcg
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)

dat$random <- ifelse(dat$alloc == "random", 1, 0)
dat$far    <- ifelse(dat$ablat >= 35, 1, 0)

res <- rma(yi, vi, mods = ~ random*far, data=dat)
res
vif(res)

res <- rma(yi, vi, mods = ~ factor(random)*factor(far), data=dat)
res
vif(res)
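
The equivalence can also be checked without fitting any model, by comparing the design matrices R builds for the two specifications. The following is a minimal base-R sketch on made-up 0/1 indicators (the values below are hypothetical, not taken from dat.bcg):

```r
# Hypothetical 0/1 indicators, for illustration only
random <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 1)
far    <- c(1, 1, 1, 0, 0, 1, 0, 0, 0, 1)

# Design matrices for the numeric and the factor specification
X_num <- model.matrix(~ random * far)
X_fac <- model.matrix(~ factor(random) * factor(far))

# Only the column names differ; the entries are identical
colnames(X_num)
colnames(X_fac)
same_design <- all(X_num == X_fac)
same_design
```

Since the two model matrices contain the same numbers, anything computed from them, including the VIFs, has to agree.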

Now the more interesting aspect here is that we don't actually have to use 0/1 coding for the factor. We could, for example, also use +-1 coding. This won't change the significance of the interaction term, although it does change the meaning of the "main effects":

dat$random <- ifelse(dat$alloc == "random", 1, -1)
dat$far    <- ifelse(dat$ablat >= 35, 1, -1)
res <- rma(yi, vi, mods = ~ random*far, data=dat)
res

However, this coding can reduce the VIFs quite a bit:

vif(res)

because the correlations between the two variables and their product term are much lower now:

with(dat, cor(cbind(random, far, random*far)))
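
To see how the coding drives the VIFs, here is a rough sketch that computes unweighted VIFs by hand as the diagonal of the inverse correlation matrix of the predictors. This is the ordinary regression analogue and an assumption on my part; it is not meant to reproduce what vif() does for a weighted rma() model. The data and the helper vif_by_hand() are hypothetical:

```r
# Hypothetical 0/1 indicators, for illustration only
random <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 1)
far    <- c(1, 1, 1, 0, 0, 1, 0, 0, 0, 1)

# Unweighted VIFs: diagonal of the inverse of the predictor correlation matrix
vif_by_hand <- function(a, b) {
  X <- cbind(a = a, b = b, ab = a * b)
  diag(solve(cor(X)))
}

vif_01 <- vif_by_hand(random, far)                  # 0/1 coding
vif_pm <- vif_by_hand(2 * random - 1, 2 * far - 1)  # +-1 coding

round(rbind("0/1" = vif_01, "+-1" = vif_pm), 2)
```

In this toy example the correlation between the two indicators themselves is the same under both codings; what changes is how strongly each indicator correlates with the product term, and the VIF of the product term drops accordingly.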

But this also shows that the usefulness of VIFs is questionable, especially for interaction terms. Again, the significance of the interaction term is the same under both codings, no matter how high its VIF is with 0/1 coding.
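
The invariance of the interaction test can be illustrated with a small sketch that uses ordinary lm() as a stand-in for rma(), again on made-up data (the outcome values y are hypothetical). Switching from 0/1 to +-1 coding is just a linear reparametrization of the same model, so the test statistic of the product term should not change:

```r
# Hypothetical data, for illustration only
random <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 1)
far    <- c(1, 1, 1, 0, 0, 1, 0, 0, 0, 1)
y      <- c(-0.9, -1.4, -1.1, -0.3, -0.6, -0.2, 0.1, -0.4, 0.3, -1.6)

# Same model under 0/1 coding and under +-1 coding
fit_01 <- lm(y ~ random * far)
random_pm <- 2 * random - 1
far_pm    <- 2 * far - 1
fit_pm <- lm(y ~ random_pm * far_pm)

# Test statistic of the interaction term under each coding
t_01 <- coef(summary(fit_01))["random:far", "t value"]
t_pm <- coef(summary(fit_pm))["random_pm:far_pm", "t value"]
c(t_01 = t_01, t_pm = t_pm)

# The "main effect" coefficients, in contrast, differ between the codings
coef(fit_01)
coef(fit_pm)
```

The interaction coefficient under +-1 coding is one quarter of the one under 0/1 coding, and its standard error shrinks by the same factor, which is why the test statistic is unaffected.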

Best,
Wolfgang