[R-meta] Testing multicollinearity between categorical predictors

10 messages · Wolfgang Viechtbauer, Michael Dewey, Rafael Rios

#
Dear Wolfgang and All,

Is there a good method for testing multicollinearity between categorical
predictors in meta-regression? I ran a mixed-effects MLMA using two
categorical predictors and their interaction as moderators, but a Referee
requested a test of multicollinearity. I did not find a good approach to
solve this problem. Thank you in advance.

Best wishes,
_______________________________________________________

*Prof. Dr. Rafael Rios Moura*

*scientia amabilis*
Research Coordinator and Coordinator of NEPEE/CNPq
Laboratório de Ecologia e Zoologia (LEZ)
UEMG - Unidade Ituiutaba

ORCID: http://orcid.org/0000-0002-7911-4734
Currículo Lattes: http://lattes.cnpq.br/4264357546465157
Research Gate: https://www.researchgate.net/profile/Rafael_Rios_Moura2
Rios de Ciência: https://www.youtube.com/channel/UCu2186wIJKji22ai8tvlUfg
2 days later
#
Dear Rafael,

I don't know what "testing" for multicollinearity would entail. One could examine the variance inflation factors with vif(). What VIF values are considered "large" is debatable though.
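
For reference, in an ordinary regression model the variance inflation factor of the j-th predictor is usually defined from R_j^2, the coefficient of determination obtained when that predictor is regressed on all the other predictors. This is only the textbook form (the symbols are introduced here for illustration); how vif() arrives at its values for a weighted meta-regression model is not spelled out in this thread.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

A value of 1 means the predictor is uncorrelated with the others, and the value grows without bound as R_j^2 approaches 1.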

Best,
Wolfgang
#
Dear Wolfgang,

Thank you for the reply. I also thought about using VIF to evaluate
multicollinearity, but there is a lot of criticism about the applicability
of VIF for categorical predictors. There is a variation called GVIF.
However, since the meta-analysis changes categorical predictors to dummy
variables, I could not use it in R. I am not sure whether this is the best
approach. Do you know of other methods to evaluate or avoid potential
multicollinearity among categorical moderators?

Best wishes,

Rafael.

On Fri, 19 Jun 2020 at 05:40, Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

  
  
#
Dear Rafael

It is hard to answer here because we do not know what scientific problem 
the referee thinks he or she has spotted which would be solved by such a 
test. Being of a cynical world view I suspect neither does the referee 
and this is a conditioned reflex like Pavlov's dog salivating at the bell.

Are the two moderators of scientific interest to you or are you 
including them so you can say that there is still residual heterogeneity 
even after you did your best to explain it? In the latter case I would 
suggest collinearity is irrelevant.

Michael
On 19/06/2020 13:36, Rafael Rios wrote:

  
    
  
#
Dear Michael,

Thank you for the reply. I am evaluating the biases arising from pooling
samples from different populations and periods on the average effect size.
Therefore, I included both pooling practices, and their interaction as
moderators. Each practice has two levels (yes and no).

Best wishes,

Rafael.

On Fri, 19 Jun 2020 at 09:50, Michael Dewey <lists at dewey.myzen.co.uk>
wrote:

  
  
#
I am not sure I fully understand. Are you saying that the two moderators have two levels each?

Best,
Wolfgang
#
Dear Wolfgang,

Yes, that is right: yes and no for each moderator. I am also evaluating
their interaction.

All the best,

Rafael.

On Fri, 19 Jun 2020 at 10:12, Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

  
  
#
In that case, you can just use vif(). The 'generalized VIF' is only relevant when a factor variable has more than two levels and one wants to compute a VIF that pertains to the whole factor, not just to each of the individual dummy variables. But if the factor only has two levels, then there is only one dummy variable, so its VIF is the same as the GVIF.
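
The equivalence can be seen from the usual definition of the generalized VIF (commonly attributed to Fox and Monette). The sketch below assumes that R is the correlation matrix of all predictor columns, R_11 the block for the columns belonging to the factor of interest, and R_22 the block for the remaining columns. These symbols are introduced for illustration and do not appear elsewhere in the thread.

```latex
\mathrm{GVIF} = \frac{\det(R_{11}) \, \det(R_{22})}{\det(R)}

% Two-level factor: a single dummy column j, so R_{11} = 1 (a 1x1 matrix) and
\mathrm{GVIF} = \frac{\det(R_{22})}{\det(R)} = \left[ R^{-1} \right]_{jj} = \mathrm{VIF}_j
```

The last step is the cofactor formula for a diagonal element of the inverse, so with one dummy column the two quantities coincide.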

Best,
Wolfgang
#
Thank you very much, Wolfgang. Do you have a reference supporting this
approach? It will be very helpful.

Best wishes,

Rafael.

On Fri, 19 Jun 2020 at 12:03, Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

  
  
#
I don't have a reference, but one doesn't need one for this anyway. A factor with two levels is just a dummy variable. That is computationally indistinguishable from a "continuous" predictor that just happens to take on the values 0 and 1. So, the VIFs will be the same whether we regard this as a factor or as a continuous variable. We can also just examine this by example:

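library(metafor)

# BCG vaccine trials; compute log risk ratios (yi) and sampling variances (vi)
dat <- dat.bcg
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)

# two dichotomous moderators with 0/1 coding
dat$random <- ifelse(dat$alloc == "random", 1, 0)
dat$far    <- ifelse(dat$ablat >= 35, 1, 0)

# moderators entered as numeric 0/1 variables
res <- rma(yi, vi, mods = ~ random*far, data=dat)
res
vif(res)

# same moderators entered as factors, for comparing the VIFs
res <- rma(yi, vi, mods = ~ factor(random)*factor(far), data=dat)
res
vif(res)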

Now the more interesting aspect here is that we don't actually have to use 0/1 coding for the factor. We could, for example, also use +-1 coding. This won't change the significance of the interaction term, although it does change the meaning of the "main effects":

dat$random <- ifelse(dat$alloc == "random", 1, -1)
dat$far    <- ifelse(dat$ablat >= 35, 1, -1)
res <- rma(yi, vi, mods = ~ random*far, data=dat)
res

However, this coding can reduce the VIFs quite a bit:

vif(res)

because the correlation between the variables is much lower now:

with(dat, cor(cbind(random, far, random*far)))

But this also shows that the usefulness of VIFs is questionable, especially for interaction terms. Again, the significance of the interaction term is the same under both codings, and it would be the same no matter how high the VIF is under 0/1 coding.
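
The point about coding can also be checked without any meta-analytic machinery. The following is a minimal sketch using base R and simulated data. The variables a, b, and y, the helper vifs(), and the balanced 2x2 design are all assumptions made for illustration and are not part of the dat.bcg example. It uses ordinary least squares rather than rma(), so the numbers will not match the output above; it only illustrates the mechanism.

```r
# Simulated, balanced 2x2 design (hypothetical data, 10 observations per cell)
set.seed(1234)
a <- rep(c(0, 1), each  = 20)   # first two-level moderator, 0/1 coded
b <- rep(c(0, 1), times = 20)   # second two-level moderator, 0/1 coded
y <- 0.2 + 0.3*a - 0.1*b + 0.4*a*b + rnorm(40)

# Hypothetical helper: OLS VIFs as the diagonal of the inverse of the
# correlation matrix of the predictor columns (intercept excluded)
vifs <- function(X) diag(solve(cor(X)))

# 0/1 coding
fit01 <- lm(y ~ a*b)
X01   <- model.matrix(fit01)[, -1]

# +-1 coding of the same two moderators
a2 <- 2*a - 1
b2 <- 2*b - 1
fit11 <- lm(y ~ a2*b2)
X11   <- model.matrix(fit11)[, -1]

# VIFs differ between the two codings ...
print(round(vifs(X01), 3))
print(round(vifs(X11), 3))

# ... but the test statistic of the interaction term does not
t01 <- coef(summary(fit01))["a:b",   "t value"]
t11 <- coef(summary(fit11))["a2:b2", "t value"]
print(c(t01 = t01, t11 = t11))
```

In this balanced design the 0/1 coding gives VIFs of 2, 2, and 3, the +-1 coding gives a VIF of 1 for every term, and the t-statistic of the interaction is identical in both fits, since the two codings span the same model and differ only in how the coefficients are scaled and combined.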

Best,
Wolfgang