Message-ID: <8d693c3eb6e84095ba13a6129349c632@UM-MAIL3214.unimaas.nl>
Date: 2020-06-19T15:52:47Z
From: Wolfgang Viechtbauer
Subject: [R-meta]  Testing multicollinearity between categorical predictors
In-Reply-To: <CAAnf+jSuXpd4_qBrJhB9dvzmQX-TVvezSBLXkKAdEuV+8ULj2Q@mail.gmail.com>

I don't have a reference, but one doesn't need one for this anyway. A factor with two levels is just a dummy variable. That is computationally indistinguishable from a "continuous" predictor that just happens to take on the values 0 and 1. So, the VIFs will be the same whether we regard this as a factor or as a continuous variable. We can also just examine this by example:

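library(metafor)

# BCG vaccine data: compute log risk ratios (yi) and sampling variances (vi)
dat <- dat.bcg
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)

# two dichotomous moderators coded as 0/1 dummy variables
dat$random <- ifelse(dat$alloc == "random", 1, 0)
dat$far    <- ifelse(dat$ablat >= 35, 1, 0)

# dummies entered as numeric predictors
res <- rma(yi, vi, mods = ~ random*far, data=dat)
res
vif(res)

# the same dummies entered as factors: same model, same VIFs
res <- rma(yi, vi, mods = ~ factor(random)*factor(far), data=dat)
res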
vif(res)
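
The same point can be checked without metafor. Below is a minimal base-R sketch, not metafor's vif() implementation: it uses the textbook (unweighted) VIF, i.e., the diagonal of the inverse correlation matrix of the non-intercept columns of the model matrix, and gvif_term() is a hypothetical helper implementing the generalized VIF of Fox and Monette (1992), det(R11) * det(R22) / det(R). The moderator values are made up for illustration.

```r
# Base-R sketch (not metafor's vif()): unweighted textbook VIFs computed from
# the correlation matrix of the non-intercept model-matrix columns
vif_from_X <- function(X) {
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  diag(solve(cor(X)))
}

# Hypothetical helper: generalized VIF for the term spanning 'cols',
# GVIF = det(R11) * det(R22) / det(R)
gvif_term <- function(X, cols) {
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  R <- cor(X)
  idx <- match(cols, colnames(X))
  det(R[idx, idx, drop = FALSE]) * det(R[-idx, -idx, drop = FALSE]) / det(R)
}

# Made-up 0/1 moderators for 12 hypothetical studies (unbalanced cells)
random <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0)
far    <- c(1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0)

# The same information, once as numeric dummies and once via factor()
X_num <- model.matrix(~ random * far)
X_fac <- model.matrix(~ factor(random) * factor(far))

v_num <- vif_from_X(X_num)
v_fac <- vif_from_X(X_fac)
print(v_num)
print(v_fac)   # same values, only the column names differ

# For a single-column term the GVIF reduces to the ordinary VIF
g_random <- gvif_term(X_num, "random")
print(c(VIF = unname(v_num["random"]), GVIF = g_random))
```

For a term with a single column, det(R11) is 1 and det(R22) / det(R) is the corresponding diagonal element of the inverse correlation matrix, which is why the GVIF and the VIF coincide for a two-level factor.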

Now the more interesting aspect here is that we don't actually have to use 0/1 coding for the factors. We could, for example, also use -1/+1 coding. This won't change the significance of the interaction term, although it does change the meaning of the "main effects":

# recode the same two moderators as -1/+1 instead of 0/1
dat$random <- ifelse(dat$alloc == "random", 1, -1)
dat$far    <- ifelse(dat$ablat >= 35, 1, -1)
res <- rma(yi, vi, mods = ~ random*far, data=dat)
res

However, this coding can reduce the VIFs quite a bit:

vif(res)

because the correlation between the variables is much lower now:

with(dat, cor(cbind(random, far, random*far)))
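
To see why -1/+1 coding helps, here is a small base-R sketch using a hypothetical balanced 2 x 2 design (5 studies per cell, not the BCG data) and, again, the unweighted textbook VIF diag(solve(cor(X))). With 0/1 coding, the product column correlates about 0.58 with each dummy; with -1/+1 coding and equal cell sizes, all three columns are uncorrelated, so every VIF is exactly 1. In unbalanced data such as dat.bcg, the -1/+1 columns are not exactly centered, so the VIFs drop without necessarily reaching 1.

```r
# Textbook (unweighted) VIFs for two predictors and their product
vifs <- function(a, b) {
  X <- cbind(a = a, b = b, ab = a * b)
  diag(solve(cor(X)))
}

# Hypothetical balanced 2 x 2 design: 5 studies in each cell
d1 <- rep(c(0, 1, 0, 1), each = 5)
d2 <- rep(c(0, 0, 1, 1), each = 5)

# The same moderators recoded as -1/+1
x1 <- 2 * d1 - 1
x2 <- 2 * d2 - 1

v01 <- vifs(d1, d2)   # 0/1 coding: 2, 2, 3
vpm <- vifs(x1, x2)   # -1/+1 coding: 1, 1, 1

print(round(v01, 3))
print(round(vpm, 3))

# Correlations between the main-effect and product columns
print(round(cor(cbind(d1 = d1, d2 = d2, d1d2 = d1 * d2)), 3))
print(round(cor(cbind(x1 = x1, x2 = x2, x1x2 = x1 * x2)), 3))
```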

But this also shows that the usefulness of VIFs is questionable, especially for interaction terms. Again, the significance of the interaction term is the same under both codings, and it would be so no matter how high its VIF is under 0/1 coding.
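
The invariance of the interaction test can be verified directly. Since d = (x + 1)/2, the product d1*d2 equals (x1*x2 + x1 + x2 + 1)/4, so the interaction coefficient under -1/+1 coding is exactly one quarter of the 0/1 coefficient, and so is its standard error. The sketch below uses simulated effect sizes and weighted least squares via lm() as a stand-in for a meta-regression; this is an assumption made for illustration, not rma().

```r
# Simulated data (illustration only): 12 hypothetical studies
set.seed(1)
d1 <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0)         # 0/1 dummy
d2 <- c(1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0)         # 0/1 dummy
vi <- runif(12, 0.01, 0.10)                         # sampling variances
yi <- rnorm(12, mean = -0.5, sd = sqrt(vi + 0.05))  # effect sizes

# The same moderators recoded as -1/+1
x1 <- 2 * d1 - 1
x2 <- 2 * d2 - 1

# Weighted least squares as a stand-in for a meta-regression
fit01 <- lm(yi ~ d1 * d2, weights = 1 / vi)
fitpm <- lm(yi ~ x1 * x2, weights = 1 / vi)

# Test statistics for the interaction term under both codings
t01 <- coef(summary(fit01))["d1:d2", "t value"]
tpm <- coef(summary(fitpm))["x1:x2", "t value"]
print(c(t_01 = t01, t_pm = tpm))   # identical

# The interaction coefficient is rescaled by exactly 1/4
print(c(b_01 = unname(coef(fit01)["d1:d2"]),
        four_times_b_pm = 4 * unname(coef(fitpm)["x1:x2"])))
```

Both codings span the same column space, so the fitted values are identical; only the coefficients are reparameterized, which is why the "main effects" change meaning while the interaction test does not.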

Best,
Wolfgang

>-----Original Message-----
>From: Rafael Rios [mailto:biorafaelrm at gmail.com]
>Sent: Friday, 19 June, 2020 17:33
>To: Viechtbauer, Wolfgang (SP)
>Cc: r-sig-meta-analysis at r-project.org
>Subject: Re: [R-meta] Testing multicollinearity between categorical
>predictors
>
>Thank you very much, Wolfgang. Do you have a reference supporting this
>approach? It will be very helpful.
>
>Best wishes,
>
>Rafael.
>
>On Fri, 19 Jun 2020 at 12:03, Viechtbauer, Wolfgang (SP)
><wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
>In that case, you can just use vif(). The 'generalized VIF' is only relevant
>when a factor variable has more than two levels and one wants to compute a
>VIF that pertains to the whole factor, not just to each of the individual dummy
>variables. But if the factor only has two levels, then there is only one
>dummy variable, so the VIF is the same as the GVIF.
>
>Best,
>Wolfgang
>
>>-----Original Message-----
>>From: Rafael Rios [mailto:biorafaelrm at gmail.com]
>>Sent: Friday, 19 June, 2020 16:35
>>To: Viechtbauer, Wolfgang (SP)
>>Cc: r-sig-meta-analysis at r-project.org
>>Subject: Re: [R-meta] Testing multicollinearity between categorical
>>predictors
>>
>>Dear Wolfgang,
>>
>>Yes, it is. Yes and no for each moderator. I am also evaluating their
>>interaction.
>>
>>All the best,
>>
>>Rafael.
>>
>>On Fri, 19 Jun 2020 at 10:12, Viechtbauer, Wolfgang (SP)
>><wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
>>I am not sure I fully understand. Are you saying that the two moderators
>>have two levels each?
>>
>>Best,
>>Wolfgang
>>
>>>-----Original Message-----
>>>From: Rafael Rios [mailto:biorafaelrm at gmail.com]
>>>Sent: Friday, 19 June, 2020 15:02
>>>To: Michael Dewey
>>>Cc: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis at r-project.org
>>>Subject: Re: [R-meta] Testing multicollinearity between categorical
>>>predictors
>>>
>>>Dear Michael,
>>>
>>>Thank you for the reply. I am evaluating the bias that pooling samples
>>>from different populations and periods introduces into the average effect
>>>size. Therefore, I included both pooling practices and their interaction as
>>>moderators. Each practice has two levels (yes and no).
>>>
>>>Best wishes,
>>>
>>>Rafael.
>>>
>>>On Fri, 19 Jun 2020 at 09:50, Michael Dewey
>>><lists at dewey.myzen.co.uk> wrote:
>>>Dear Rafael
>>>
>>>It is hard to answer here because we do not know what scientific problem
>>>the referee thinks he or she has spotted which would be solved by such a
>>>test. Being of a cynical world view, I suspect the referee does not know
>>>either, and this is a conditioned reflex like Pavlov's dog salivating at the bell.
>>>
>>>Are the two moderators of scientific interest to you or are you
>>>including them so you can say that there is still residual heterogeneity
>>>even after you did your best to explain it? In the latter case I would
>>>suggest collinearity is irrelevant.
>>>
>>>Michael
>>>
>>>On 19/06/2020 13:36, Rafael Rios wrote:
>>>> Dear Wolfgang,
>>>>
>>>> Thank you for the reply. I also thought about using VIF to evaluate
>>>> multicollinearity, but there is a lot of criticism about the applicability
>>>> of VIF to categorical predictors. There is a variant called GVIF.
>>>> However, since the meta-analysis converts categorical predictors into dummy
>>>> variables, I could not use it in R. I am not sure whether this is the best
>>>> approach. Do you know of other methods to evaluate or avoid potential
>>>> multicollinearity among categorical moderators?
>>>>
>>>> Best wishes,
>>>>
>>>> Rafael.
>>>>
>>>> On Fri, 19 Jun 2020 at 05:40, Viechtbauer, Wolfgang (SP) <
>>>> wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
>>>>
>>>>> Dear Rafael,
>>>>>
>>>>> I don't know what "testing" for multicollinearity would entail. One could
>>>>> examine the variance inflation factors with vif(). What VIF values are
>>>>> considered "large" is debatable though.
>>>>>
>>>>> Best,
>>>>> Wolfgang