[R-meta] Testing multicollinearity between categorical predictors

Fri, Jun 19, 2020 8:52 AM

I don't have a reference, but one doesn't need one for this anyway. A factor with two levels is just a dummy variable. That is computationally indistinguishable from a "continuous" predictor that just happens to take on the values 0 and 1. So, the VIFs will be the same whether we regard this as a factor or as a continuous variable. We can also just examine this by example:

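library(metafor)

# BCG vaccine trials; compute log risk ratios and sampling variances
dat <- dat.bcg
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)

# two 0/1 dummy variables (random allocation; absolute latitude >= 35)
dat$random <- ifelse(dat$alloc == "random", 1, 0)
dat$far    <- ifelse(dat$ablat >= 35, 1, 0)

# dummy variables treated as numeric predictors
res <- rma(yi, vi, mods = ~ random*far, data=dat)
res
vif(res)

# the same dummy variables treated as factors
res <- rma(yi, vi, mods = ~ factor(random)*factor(far), data=dat)
res
vif(res)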
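
The reason the two fits agree can also be checked without fitting anything. The following is a minimal base R sketch on a small made-up data frame (the object 'toy' and its values are hypothetical, not taken from dat.bcg): the design matrix built from 0/1 numeric variables and the one built from two-level factors contain the same numbers, and only the column names differ.

```r
# Hypothetical toy data (not dat.bcg): two binary moderators.
toy <- data.frame(
  random = c(1, 1, 0, 0, 1, 0, 1, 0),
  far    = c(1, 0, 1, 0, 0, 1, 1, 0)
)

# Design matrix with the moderators treated as numeric 0/1 variables.
X_num <- model.matrix(~ random * far, data = toy)

# Design matrix with the same moderators treated as two-level factors
# (default treatment contrasts, reference level "0").
X_fac <- model.matrix(~ factor(random) * factor(far), data = toy)

# Only the column names differ; the entries are identical.
colnames(X_num)
colnames(X_fac)
same_design <- all(X_num == X_fac)
same_design
```

Since the design matrices are the same, anything computed from them, including the VIFs, has to be the same as well.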

Now the more interesting aspect here is that we don't actually have to use 0/1 coding for the factor. We could, for example, also use +1/-1 coding. This won't change the significance of the interaction term, although it does change the meaning of the "main effects":

dat$random <- ifelse(dat$alloc == "random", 1, -1)
dat$far    <- ifelse(dat$ablat >= 35, 1, -1)
res <- rma(yi, vi, mods = ~ random*far, data=dat)
res

However, this coding can reduce the VIFs quite a bit:

vif(res)

because the correlation between the variables is much lower now:

with(dat, cor(cbind(random, far, random*far)))

But this also shows that the usefulness of VIFs is questionable, especially for interaction terms. Again, the significance of the interaction term is the same, and it would be the same regardless of how high the VIF is, even with 0/1 coding.
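
The invariance of the interaction test can be illustrated outside of metafor as well. The sketch below is only an analogy: it uses lm() on simulated, hypothetical data as a stand-in for rma(), and the helper vif_ols() is made up for this example and computes unweighted VIFs from the correlation matrix of the predictors, which is not necessarily what vif() in metafor does internally.

```r
set.seed(1)  # arbitrary seed for the simulated, hypothetical data

n  <- 40
x1 <- rbinom(n, 1, 0.5)   # first binary moderator, 0/1 coding
x2 <- rbinom(n, 1, 0.5)   # second binary moderator, 0/1 coding
y  <- 0.3 * x1 - 0.2 * x2 + 0.5 * x1 * x2 + rnorm(n)

# The same moderators in +1/-1 coding.
z1 <- 2 * x1 - 1
z2 <- 2 * x2 - 1

fit01 <- lm(y ~ x1 * x2)
fitpm <- lm(y ~ z1 * z2)

# t statistic of the interaction term under each coding.
t01 <- coef(summary(fit01))["x1:x2", "t value"]
tpm <- coef(summary(fitpm))["z1:z2", "t value"]

# Unweighted VIFs: diagonal of the inverse correlation matrix of the
# predictors (hypothetical helper, ordinary regression only).
vif_ols <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]  # drop the intercept column
  diag(solve(cor(X)))
}

c(t01 = t01, tpm = tpm)   # the two t statistics
vif_ols(fit01)            # VIFs under 0/1 coding
vif_ols(fitpm)            # VIFs under +1/-1 coding
```

The interaction coefficient under +1/-1 coding is the 0/1 coefficient divided by 4, and its standard error is rescaled by the same factor, so the test statistic cannot change even when the VIFs do.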

Best,
Wolfgang

-----Original Message-----
From: Rafael Rios [mailto:biorafaelrm at gmail.com]
Sent: Friday, 19 June, 2020 17:33
To: Viechtbauer, Wolfgang (SP)
Cc: r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Testing multicollinearity between categorical
predictors

Thank you very much, Wolfgang. Do you have a reference supporting this
approach? It will be very helpful.

Best wishes,

Rafael.

On Fri, Jun 19, 2020 at 12:03, Viechtbauer, Wolfgang (SP)
<wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
In that case, you can just use vif(). The 'generalized VIF' is only relevant
when a factor variable has more than two levels and one wants to compute a
VIF that pertains to the whole factor, not just to each of the individual
dummy variables. But if the factor only has two levels, then there is only
one dummy variable, so its VIF is the same as the GVIF.
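
As a rough illustration of this point, here is a base R sketch using the determinant-based definition of the GVIF usually attributed to Fox and Monette (1992), GVIF = det(R11) * det(R22) / det(R), where R is the correlation matrix of the predictor columns, R11 the block for the factor's own dummy variables, and R22 the block for the remaining columns. The data frame 'toy' and the helper gvif() are hypothetical and unweighted, so this is not the computation done by vif() in metafor.

```r
# Hypothetical toy data: one three-level factor and one two-level factor.
toy <- data.frame(
  region = factor(c("a", "b", "c", "a", "b", "c",
                    "a", "b", "c", "a", "b", "c")),
  pooled = factor(c("no", "yes", "yes", "no", "no", "yes",
                    "yes", "yes", "no", "no", "yes", "no"))
)

# Dummy-coded predictors without the intercept column.
X <- model.matrix(~ region + pooled, data = toy)[, -1]
R <- cor(X)

# GVIF for a set of columns: det(R11) * det(R22) / det(R)
# (hypothetical helper; 'cols' must be a proper subset of the columns).
gvif <- function(R, cols) {
  rest <- setdiff(seq_len(ncol(R)), cols)
  det(R[cols, cols, drop = FALSE]) * det(R[rest, rest, drop = FALSE]) / det(R)
}

gvif_region <- gvif(R, 1:2)  # both dummy variables of the three-level factor
gvif_pooled <- gvif(R, 3)    # single dummy variable of the two-level factor

# Ordinary VIF of that single dummy variable.
vif_pooled <- unname(diag(solve(R))[3])

c(gvif_region = gvif_region, gvif_pooled = gvif_pooled, vif_pooled = vif_pooled)
```

For the two-level factor the block R11 is just the number 1, so the GVIF collapses to the ordinary VIF of the single dummy variable.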

Best,
Wolfgang

-----Original Message-----
From: Rafael Rios [mailto:biorafaelrm at gmail.com]
Sent: Friday, 19 June, 2020 16:35
To: Viechtbauer, Wolfgang (SP)
Cc: r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Testing multicollinearity between categorical
predictors

Dear Wolfgang,

Yes, that is right. Each moderator has the two levels yes and no. I am also
evaluating their interaction.

All the best,

Rafael.

On Fri, Jun 19, 2020 at 10:12, Viechtbauer, Wolfgang (SP)
<wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
I am not sure I fully understand. Are you saying that the two moderators
have two levels each?

Best,
Wolfgang

-----Original Message-----
From: Rafael Rios [mailto:biorafaelrm at gmail.com]
Sent: Friday, 19 June, 2020 15:02
To: Michael Dewey
Cc: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Testing multicollinearity between categorical
predictors

Dear Michael,

Thank you for the reply. I am evaluating the biases arising from pooling
samples from different populations and periods on the average effect size.
Therefore, I included both pooling practices, and their interaction as
moderators. Each practice has two levels (yes and no).

Best wishes,

Rafael.

On Fri, Jun 19, 2020 at 09:50, Michael Dewey
<lists at dewey.myzen.co.uk> wrote:
Dear Rafael

It is hard to answer here because we do not know what scientific problem
the referee thinks he or she has spotted which would be solved by such a
test. Being of a cynical world view I suspect neither does the referee
and this is a conditioned reflex like Pavlov's dog salivating at the bell.

Are the two moderators of scientific interest to you or are you
including them so you can say that there is still residual heterogeneity
even after you did your best to explain it? In the latter case I would
suggest collinearity is irrelevant.

Michael

On 19/06/2020 13:36, Rafael Rios wrote:

Dear Wolfgang,

Thank you for the reply. I also thought about using VIF to evaluate
multicollinearity, but there is a lot of criticism about the applicability
of VIF for categorical predictors. There is a variation called GVIF.
However, since the meta-analysis changes categorical predictors to dummy
variables, I could not use it in R. I am not sure whether this is the best
approach. Do you know of other methods to evaluate or avoid potential
multicollinearity among categorical moderators?

Best wishes,

Rafael.

On Fri, Jun 19, 2020 at 05:40, Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

Dear Rafael,

I don't know what "testing" for multicollinearity would entail. One could
examine the variance inflation factors with vif(). What VIF values are
considered "large" is debatable though.
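
For readers unfamiliar with the quantity itself, the sketch below shows the textbook definition for ordinary (unweighted) regression, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. It uses the built-in mtcars data purely as an illustration; the object names are made up for this example, and the computation in a weighted meta-regression model may differ.

```r
# Built-in mtcars data, used here only as an illustration.
X <- mtcars[, c("wt", "hp", "disp")]

# VIF of each predictor: 1 / (1 - R^2) from regressing it on the others.
vif_manual <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix.
vif_inverse <- diag(solve(cor(X)))

vif_manual
vif_inverse
```

A VIF of 1 means the predictor is uncorrelated with the others; larger values mean its coefficient variance is inflated by that factor relative to the uncorrelated case.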

Best,
Wolfgang
