Collinearity diagnostics for (mixed) multinomial models

Fri, Feb 25, 2022 5:49 AM

I am indeed talking about collinearity of the predictors, not the response.
A multinomial model consists of C-1 binary submodels, so it arguably
doesn't make sense to measure collinearity in the entire dataset at once
but, rather, it should be measured separately in the C-1 subdatasets to
which the C-1 submodels are fit. My question is whether the way I propose
to do this (in the original post) is sensible.

Best,

Juho

On Fri, Feb 25, 2022 at 15:19, Sorkin, John (jsorkin at som.umaryland.edu)
wrote:

I would agree with Steven. Collinearity is a problem with the predictor
variables, not the outcome variable. Given a multinomial model y = f(x1,
x2, x3, ..., xn), one could run a simple linear regression x1 = f(x2, x3,
..., xn) and look at the VIF to determine if x1 is collinear with x2, ...,
xn, and perhaps an additional regression x2 = f(x1, x3, ..., xn) to
determine if x2 is collinear with x1, x3, ..., xn. If I am missing
something, I hope someone will correct me.

John (but not John Fox)
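
A minimal sketch of the auxiliary-regression idea described above, in base R. The data are simulated and the object names (X, vif_aux, vif_cor) are illustrative assumptions, not anything from the thread. Each predictor is regressed on the others and its VIF is taken as 1 / (1 - R^2); the response of the multinomial model never enters the calculation.

```r
# Simulated predictors (hypothetical data, not from the thread);
# x3 is built to be strongly related to x1 and x2.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.3)
X  <- data.frame(x1, x2, x3)

# VIF of each predictor from its own auxiliary regression:
# regress x_j on the remaining predictors, then VIF_j = 1 / (1 - R^2_j).
vif_aux <- sapply(names(X), function(v) {
  aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})

# For numeric predictors the same values are the diagonal of the
# inverse of the predictors' correlation matrix.
vif_cor <- diag(solve(cor(X)))

print(round(vif_aux, 2))
```

With only numeric predictors the two routes should agree; factor predictors with more than two levels would need the generalized VIF instead.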






From: stevedrd--- via R-sig-mixed-models <r-sig-mixed-models at r-project.org>
Sent: Friday, February 25, 2022 8:07 AM
To: John Fox <jfox at mcmaster.ca>; Juho Kristian Ruohonen <juho.kristian.ruohonen at gmail.com>
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Collinearity diagnostics for (mixed) multinomial models



This seems odd to me, but then I don't usually analyze multinomial
models. Is there an issue with collinearity in the response variable in a
multinomial model? I would think that the levels are collinear by
definition. So then the issue, it seems to me, is whether there is
collinearity in the fixed effects, and that should be independent of the
response variables. Could you use the vif() function with a standard
response (say = 1) to check collinearity in the fixed effects? I would
think that your method on the sub-datasets may not capture all of the
collinearity in the full model.
But I could be waaaaaaay off base on this.
SteveDenham
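
A small sketch of the point that the diagnostics should not depend on the response, assuming numeric predictors and a linear model with an intercept. The helper vif_from_lm() is hypothetical (it is not car's vif()), and the sketch uses random placeholder responses rather than a constant one, on the assumption that a zero-variance response could make the fit degenerate.

```r
set.seed(2)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$x3 <- d$x1 - d$x2 + rnorm(n, sd = 0.5)

# Hypothetical helper: VIFs from a fitted lm with an intercept and numeric
# predictors only, computed as (X'X)^{-1}_jj * sum((x_j - mean(x_j))^2).
vif_from_lm <- function(fit) {
  X   <- model.matrix(fit)[, -1, drop = FALSE]            # drop intercept column
  xtx <- summary(fit)$cov.unscaled[-1, -1, drop = FALSE]  # (X'X)^{-1}, no sigma^2
  diag(xtx) * colSums(scale(X, scale = FALSE)^2)
}

d$y_a <- rnorm(n)          # one made-up response
d$y_b <- runif(n) * 100    # a completely different made-up response

vif_a <- vif_from_lm(lm(y_a ~ x1 + x2 + x3, data = d))
vif_b <- vif_from_lm(lm(y_b ~ x1 + x2 + x3, data = d))

print(round(rbind(vif_a, vif_b), 2))
```

The two rows should be identical, because only the design matrix enters the calculation.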
On Friday, February 25, 2022, 03:24:15 AM EST, Juho Kristian Ruohonen
<juho.kristian.ruohonen at gmail.com> wrote:

 Dear John (and anyone else qualified to comment),

I fit lots of mixed-effects multinomial models in my research, and I would
like to see some (multi)collinearity diagnostics on the fixed effects, of
which there are over 30. My models are fit using the Bayesian *brms*
package because I know of no frequentist packages with multinomial GLMM
compatibility.

With continuous or dichotomous outcomes, my go-to function for calculating
multicollinearity diagnostics is of course *vif()* from the *car* package.
As expected, however, this function does not report sensible diagnostics
for multinomial models -- not even for standard ones fit by the *nnet*
package's *multinom()* function. The reason, I presume, is that a
multinomial model is not really one but C-1 regression models (where C is
the number of response categories), and the *vif()* function is not designed
to deal with this scenario.

Therefore, in order to obtain meaningful collinearity metrics, my present
plan is to write a simple helper function that uses *vif()* to calculate
and present (generalized) variance inflation metrics for the C-1
sub-datasets to which the C-1 component binomial models of the overall
multinomial model are fit. In other words, it will partition the data into
those C-1 subsets, and then apply *vif()* to as many linear regressions,
each using a made-up continuous response and the fixed effects of interest.

Does this seem like a sensible approach?

Best,

Juho
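
A rough sketch of the mechanics of such a helper, under several assumptions: the function subset_vifs() and the toy data are hypothetical, only numeric predictors are handled, and the VIFs come from the inverse correlation matrix rather than from *vif()* applied to a made-up-response regression (for numeric predictors the two should give the same numbers). Factor predictors would need the generalized VIF. Each sub-dataset is assumed to consist of the rows in the reference category plus the rows in one other category. This illustrates the partitioning only; it does not settle whether the approach is sensible.

```r
# Hypothetical helper (names are illustrative, not from car or brms).
# For each non-reference category, keep the rows belonging to the reference
# category or to that category, then compute VIFs of the numeric predictors.
subset_vifs <- function(data, response, predictors, ref = NULL) {
  y <- factor(data[[response]])
  if (is.null(ref)) ref <- levels(y)[1]
  others <- setdiff(levels(y), ref)
  out <- sapply(others, function(lev) {
    sub <- data[y %in% c(ref, lev), predictors, drop = FALSE]
    diag(solve(cor(sub)))   # VIF_j = j-th diagonal element of R^{-1}
  })
  t(out)  # one row per submodel, one column per predictor
}

# Toy data: four response categories, x3 strongly related to x1.
set.seed(3)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$x3 <- toy$x1 + rnorm(n, sd = 0.4)
toy$y  <- sample(c("a", "b", "c", "d"), n, replace = TRUE)

res <- subset_vifs(toy, "y", c("x1", "x2", "x3"))
print(round(res, 2))
```

With C = 4 categories the result has C-1 = 3 rows, one per submodel against the reference category "a".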




On Mon, Sep 27, 2021 at 19:26, John Fox (jfox at mcmaster.ca) wrote:

Dear Simon,

I believe that Russ's point is that the fact that the additive model
allows you to estimate nonsensical quantities like a mean for girls in
all-boys' schools implies a problem with the model. Why not do as I
suggested and define two dichotomous factors: sex of student
(male/female) and type of school (coed, same-sex)? The four combinations
of levels then make sense.

Best,
 John

On 2021-09-27 12:09 p.m., Simon Harmel wrote:

Thanks, Russ! There is one thing that I still don't understand. We
have two completely empty cells (boys in girl-only & girls in boy-only
schools). Then, how are the means of those empty cells computed (what
data is used in their place in the additive model)?

Let's simplify the model for clarity:

library(R2MLwiN)
library(emmeans)

Form3 <- normexam ~ schgend + sex ## + standlrt + (standlrt | school)
model3 <- lm(Form3, data = tutorial)

emmeans(model3, pairwise~sex+schgend)$emmeans

 sex  schgend   emmean     SE   df lower.CL upper.CL
 boy  mixedsch -0.2160 0.0297 4055  -0.2742 -0.15780
 girl mixedsch  0.0248 0.0304 4055  -0.0348  0.08437
 boy  boysch    0.0234 0.0437 4055  -0.0623  0.10897
 girl boysch    0.2641 0.0609 4055   0.1447  0.38360  <- how computed?
 boy  girlsch  -0.0948 0.0502 4055  -0.1931  0.00358  <- how computed?
 girl girlsch   0.1460 0.0267 4055   0.0938  0.19829
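
A minimal sketch of where the empty-cell means come from in an additive model, using simulated toy data with the same cell structure (the data and object names are assumptions, not the tutorial dataset). No data from the empty cell is used: the prediction is simply intercept + school-type effect + sex effect, each estimated from the cells that do exist. The table above is consistent with this: 0.0234 + (0.0248 - (-0.2160)) is about 0.264, which matches the 0.2641 reported for girls in boysch.

```r
# Toy data (hypothetical) with the same structure as the thread's example:
# no girls in boys' schools and no boys in girls' schools.
set.seed(4)
cells <- data.frame(
  schgend = c("mixedsch", "mixedsch", "boysch", "girlsch"),
  sex     = c("boy", "girl", "boy", "girl")
)
toy <- cells[rep(1:4, each = 25), ]
toy$schgend <- factor(toy$schgend, levels = c("mixedsch", "boysch", "girlsch"))
toy$sex     <- factor(toy$sex, levels = c("boy", "girl"))
toy$y       <- rnorm(nrow(toy))

fit <- lm(y ~ schgend + sex, data = toy)
b   <- coef(fit)

# Model-based prediction for the empty cell "girl in boysch".
empty <- data.frame(
  schgend = factor("boysch", levels = levels(toy$schgend)),
  sex     = factor("girl", levels = levels(toy$sex))
)
pred <- unname(predict(fit, newdata = empty))

# The same number, assembled by hand from the additive coefficients.
by_hand <- unname(b["(Intercept)"] + b["schgendboysch"] + b["sexgirl"])

print(c(pred = pred, by_hand = by_hand))
```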





On Sun, Sep 26, 2021 at 8:22 PM Lenth, Russell V
<russell-lenth at uiowa.edu> wrote:

By the way, returning to the topic of interpreting coefficients, you
ought to have fun with the ones from the model I just fitted:

Fixed effects:
               Estimate Std. Error t value
(Intercept)    -0.18882    0.05135  -3.677
standlrt        0.55442    0.01994  27.807
schgendboysch   0.17986    0.09915   1.814
schgendgirlsch  0.17482    0.07877   2.219
sexgirl         0.16826    0.03382   4.975

One curious thing you'll notice is that there are no coefficients for
the interaction terms. Why? Because those terms were "thrown out" of the
model, and so they are not shown. I think it is unwise to not show what was
thrown out (e.g., lm would have shown them as NAs), because in fact what we
see is but one of infinitely many possible solutions to the regression
equations. This is the solution where the last two coefficients are
constrained to zero. There is another equally reasonable one where the
coefficients for schgendboysch and schgendgirlsch are constrained to zero,
and the two interaction effects would then be non-zero. And infinitely more
where all 7 coefficients are non-zero, and there are two linear constraints
among them.

Of course, since the particular estimate shown consists of all the main
effects and interactions are constrained to zero, it does demonstrate that
the additive model *could* have been used to obtain the same estimates and
standard errors, and you can see that by comparing the results (and
ignoring the invalid ones from the additive model). But it is just a lucky
coincidence that it worked out this way, and the additive model did lead us
down a primrose path containing silly results among the correct ones.

Russ
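
A small sketch of the remark that lm would have shown the dropped terms as NAs, using simulated toy data with two empty cells (the data and object names are assumptions, not the tutorial dataset or the lmer fit). With this cell structure, one interaction column of the model matrix is all zeros and the other duplicates the girlsch column, so lm reports both interaction coefficients as NA and the remaining estimates coincide with those of the additive model.

```r
# Toy data (hypothetical): no girls in boysch, no boys in girlsch.
set.seed(5)
cells <- data.frame(
  schgend = c("mixedsch", "mixedsch", "boysch", "girlsch"),
  sex     = c("boy", "girl", "boy", "girl")
)
toy <- cells[rep(1:4, each = 25), ]
toy$schgend <- factor(toy$schgend, levels = c("mixedsch", "boysch", "girlsch"))
toy$sex     <- factor(toy$sex, levels = c("boy", "girl"))
toy$y       <- rnorm(nrow(toy))

fit_int <- lm(y ~ schgend * sex, data = toy)  # interaction model, rank deficient
fit_add <- lm(y ~ schgend + sex, data = toy)  # additive model, full rank

# The aliased interaction coefficients are reported as NA.
print(coef(fit_int))

# The estimable coefficients match the additive fit.
print(coef(fit_add))
```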

-----Original Message-----
From: Lenth, Russell V
Sent: Sunday, September 26, 2021 7:43 PM
To: Simon Harmel <sim.harmel at gmail.com>
Cc: r-sig-mixed-models at r-project.org
Subject: RE: [External] Re: [R-sig-ME] Help with interpreting one fixed-effect coefficient

I guess correctness is in the eyes of the beholder. But I think this
illustrates the folly of the additive model. Having additive effects
suggests a belief that you can vary one factor more or less independently
of the other. In his comments, John Fox makes a good point that escaped my
earlier cursory view of the original question, that you don't have data on
girls attending all-boys' schools, nor boys attending all-girls' schools;
yet the model that was fitted estimates a mean response for both those
situations. That's a pretty clear testament to the failure of that model
and also why the coefficients don't make sense. And finally why we have
estimates of 15 comparisons (some of which are aliased with one another),
when only 6 of them make sense.

If instead, a model with interaction were fitted, it would be a
rank-deficient model because two cells are empty. Perhaps there is some
sort of nesting structure that could be used to work around that. However,
it doesn't matter much because emmeans assesses estimability, and the two
combinations I mentioned above would be flagged as non-estimable. One could
then more judiciously use the contrast function to test meaningful
contrasts across this irregular array of cell means. Or even injudiciously
asking for all pairwise comparisons, you will see 6 estimable ones and 9
non-estimable ones. See output below.

Russ

----- Interactive model -----

Form <- normexam ~ 1 + standlrt + schgend * sex + (standlrt | school)
model <- lmer(Form, data = tutorial, REML = FALSE)

fixed-effect model matrix is rank deficient so dropping 2 columns / coefficients

emmeans(model, pairwise~schgend+sex)

... messages deleted ...

$emmeans
 schgend  sex    emmean     SE  df asymp.LCL asymp.UCL
 mixedsch boy  -0.18781 0.0514 Inf   -0.2885   -0.0871
 boysch   boy  -0.00795 0.0880 Inf   -0.1805    0.1646
 girlsch  boy    nonEst     NA  NA        NA        NA
 mixedsch girl -0.01955 0.0521 Inf   -0.1216    0.0825
 boysch   girl   nonEst     NA  NA        NA        NA
 girlsch  girl  0.15527 0.0632 Inf    0.0313    0.2792

Degrees-of-freedom method: asymptotic
Confidence level used: 0.95

$contrasts
 contrast                     estimate     SE  df z.ratio p.value
 mixedsch boy - boysch boy     -0.1799 0.0991 Inf  -1.814  0.4565
 mixedsch boy - girlsch boy     nonEst     NA  NA      NA      NA
 mixedsch boy - mixedsch girl  -0.1683 0.0338 Inf  -4.975  <.0001
 mixedsch boy - boysch girl     nonEst     NA  NA      NA      NA
 mixedsch boy - girlsch girl   -0.3431 0.0780 Inf  -4.396  0.0002
 boysch boy - girlsch boy       nonEst     NA  NA      NA      NA
 boysch boy - mixedsch girl     0.0116 0.0997 Inf   0.116  1.0000
 boysch boy - boysch girl       nonEst     NA  NA      NA      NA
 boysch boy - girlsch girl     -0.1632 0.1058 Inf  -1.543  0.6361
 girlsch boy - mixedsch girl    nonEst     NA  NA      NA      NA
 girlsch boy - boysch girl      nonEst     NA  NA      NA      NA
 girlsch boy - girlsch girl     nonEst     NA  NA      NA      NA
 mixedsch girl - boysch girl    nonEst     NA  NA      NA      NA
 mixedsch girl - girlsch girl  -0.1748 0.0788 Inf  -2.219  0.2287
 boysch girl - girlsch girl     nonEst     NA  NA      NA      NA

Degrees-of-freedom method: asymptotic
P value adjustment: tukey method for comparing a family of 6 estimates


---------------------------------------------------------
From: Simon Harmel <sim.harmel at gmail.com>
Sent: Sunday, September 26, 2021 3:08 PM
To: Lenth, Russell V <russell-lenth at uiowa.edu>
Cc: r-sig-mixed-models at r-project.org
Subject: [External] Re: [R-sig-ME] Help with interpreting one fixed-effect coefficient

Dear Russ and the List Members,

If we use Russ' great package (emmeans), we see that "schgendgirl-only",
although meaningless, can be interpreted using the logic I mentioned here:

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2021q3/029723.html

That is, "schgendgirl-only" can meaninglessly mean: ***diff. bet. boys
in girl-only vs. mixed schools*** just like it can meaningfully mean:
***diff. bet. girls in girl-only vs. mixed schools***

Russ, have I used emmeans correctly?

Simon

Here is reproducible code:

library(R2MLwiN) # For the dataset
library(lme4)
library(emmeans)

data("tutorial")

Form <- normexam ~ 1 + standlrt + schgend + sex + (standlrt | school)
model <- lmer(Form, data = tutorial, REML = FALSE)

emmeans(model, pairwise~schgend+sex)$contrast

contrast                     estimate     SE  df z.ratio p.value
mixedsch boy - boysch boy    -0.17986 0.0991 Inf  -1.814  0.4565
mixedsch boy - girlsch boy   -0.17482 0.0788 Inf  -2.219  0.2287  <-- This coef. equals
mixedsch boy - mixedsch girl -0.16826 0.0338 Inf  -4.975  <.0001
mixedsch boy - boysch girl   -0.34813 0.1096 Inf  -3.178  0.0186
mixedsch boy - girlsch girl  -0.34308 0.0780 Inf  -4.396  0.0002
boysch boy - girlsch boy      0.00505 0.1110 Inf   0.045  1.0000
boysch boy - mixedsch girl    0.01160 0.0997 Inf   0.116  1.0000
boysch boy - boysch girl     -0.16826 0.0338 Inf  -4.975  <.0001
boysch boy - girlsch girl    -0.16322 0.1058 Inf  -1.543  0.6361
girlsch boy - mixedsch girl   0.00656 0.0928 Inf   0.071  1.0000
girlsch boy - boysch girl    -0.17331 0.1255 Inf  -1.381  0.7388
girlsch boy - girlsch girl   -0.16826 0.0338 Inf  -4.975  <.0001
mixedsch girl - boysch girl  -0.17986 0.0991 Inf  -1.814  0.4565
mixedsch girl - girlsch girl -0.17482 0.0788 Inf  -2.219  0.2287  <-- This coef.
boysch girl - girlsch girl    0.00505 0.1110 Inf   0.045  1.0000

_______________________________________________
R-sig-mixed-models at r-project.org mailing list


https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
