
glmer conditional deviance DECREASING after removal of fixed effect??

8 messages · Juho Kristian Ruohonen, Douglas Bates, Phillip Alday, Ben Bolker

#
Hi,

I thought it was impossible for deviance to decrease when a term is
removed!?!? Yet I'm seeing it happen with this pair of relatively simple
Bernoulli GLMMs fit using lme4:glmer():
     full   reduced
 2808.671  2807.374

What on earth is going on? FYI, I am deliberately comparing conditional
deviances rather than marginal ones, because quite a few of the clusters
are of inherent interest and likely to recur in future data.

My anonymized datafile is attached.
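
[For concreteness, a hedged sketch of how such a comparison might be
computed in lme4. The formulas are inferred from the refit later in the
thread, `dat` stands for the attached data, and this is not the original
poster's actual code.]

library(lme4)

## Hypothetical refit of the two models (formulas and names assumed):
full    <- glmer(y ~ x1 + x2 + x3 + x4 + (1 | id), data = dat, family = binomial)
reduced <- glmer(y ~ x1 + x2 + x3      + (1 | id), data = dat, family = binomial)

## Marginal deviance: -2 * the (approximate) log-likelihood glmer() maximizes.
marginal_dev    <- function(m) -2 * as.numeric(logLik(m))
## "Conditional" deviance: sum of squared deviance residuals, i.e. the fit
## to the data with the conditional modes (BLUPs) plugged in.
conditional_dev <- function(m) sum(residuals(m, type = "deviance")^2)

marginal_dev(reduced) - marginal_dev(full)        # expected >= 0 for nested fits
conditional_dev(reduced) - conditional_dev(full)  # may be negative: not the criterion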

Best,

Juho

[Attachment scrubbed: anon.txt, available at
https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20230831/bca7eb87/attachment-0001.txt]
#
You say you are comparing conditional deviances rather than marginal
deviances.  Can you expand on that a bit further?  How are you evaluating
the conditional deviances?

The models are being fit according to what you describe as the marginal
deviance - what I would call the deviance.  It is not surprising that what
you are calling the conditional deviance is inconsistent with the nesting
of the models because they weren't fit according to that criterion.

julia> m01 = let f = @formula y ~ 1 + x1 + x2 + x3 + x4 + (1|id)
           fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
       end
Minimizing 141   Time: 0:00:00 ( 1.22 ms/it)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
  y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

   logLik    deviance     AIC       AICc        BIC
 -1450.8163  2900.8511  2915.6326  2915.6833  2955.5600

Variance components:
      Column   Variance Std.Dev.
id (Intercept)  0.270823 0.520406

 Number of obs: 2217; levels of grouping factors: 331

Fixed-effects parameters:
────────────────────────────────────────────────────
                  Coef.  Std. Error      z  Pr(>|z|)
────────────────────────────────────────────────────
(Intercept)   0.0969256   0.140211    0.69    0.4894
x1: 1        -0.0449824   0.0553361  -0.81    0.4163
x2            0.0744891   0.0133743   5.57    <1e-07
x3           -0.548392    0.0914109  -6.00    <1e-08
x4: B         0.390359    0.0803063   4.86    <1e-05
x4: C         0.299932    0.0991249   3.03    0.0025
────────────────────────────────────────────────────

julia> m02 = let f = @formula y ~ 1 + x1 + x2 + x3 + (1|id)
           fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
       end
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
  y ~ 1 + x1 + x2 + x3 + (1 | id)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

   logLik    deviance     AIC       AICc        BIC
 -1472.4551  2944.0654  2954.9102  2954.9373  2983.4297

Variance components:
      Column   Variance Std.Dev.
id (Intercept)  0.274331 0.523766

 Number of obs: 2217; levels of grouping factors: 331

Fixed-effects parameters:
────────────────────────────────────────────────────
                  Coef.  Std. Error      z  Pr(>|z|)
────────────────────────────────────────────────────
(Intercept)  -0.448496    0.100915   -4.44    <1e-05
x1: 1        -0.0527684   0.0547746  -0.96    0.3354
x2            0.0694393   0.0131541   5.28    <1e-06
x3           -0.556903    0.0904798  -6.15    <1e-09
────────────────────────────────────────────────────

julia> MixedModels.likelihoodratiotest(m02, m01)
Model Formulae
1: y ~ 1 + x1 + x2 + x3 + (1 | id)
2: y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
──────────────────────────────────────────────────
     model-dof   deviance       χ²  χ²-dof  P(>χ²)
──────────────────────────────────────────────────
[1]          5  2944.0654
[2]          7  2900.8511  43.2144       2  <1e-09
──────────────────────────────────────────────────

I would note that your data are so imbalanced with respect to id that it is
not surprising that you get unstable results.  (I changed your id column
from integers to a factor so that 33 becomes S033.)

331×2 DataFrame
 Row │ id      nrow
     │ String  Int64
─────┼───────────────
   1 │ S033     1227
   2 │ S134       46
   3 │ S295       45
   4 │ S127       41
   5 │ S125       33
   6 │ S228       31
   7 │ S193       23
   8 │ S064       18
   9 │ S281       16
  10 │ S055       13
  11 │ S035       13
  12 │ S091       12
  13 │ S175       11
  14 │ S284       10
  15 │ S159       10
   ⋮ │   ⋮       ⋮
 317 │ S324        1
 318 │ S115        1
 319 │ S192        1
 320 │ S201        1
 321 │ S156        1
 322 │ S202        1
 323 │ S067        1
 324 │ S264        1
 325 │ S023        1
 326 │ S090        1
 327 │ S195        1
 328 │ S170        1
 329 │ S241        1
 330 │ S189        1
 331 │ S213        1
     301 rows omitted
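
[A hedged R equivalent of that recoding and tabulation, with `dat` as the
anonymized data frame and column names assumed as above:]

dat$id <- factor(sprintf("S%03d", dat$id))        # integer 33 becomes "S033"
head(sort(table(dat$id), decreasing = TRUE), 5)   # e.g. S033 dominates with 1227 rows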



On Thu, Aug 31, 2023 at 3:02 AM Juho Kristian Ruohonen <
juho.kristian.ruohonen at gmail.com> wrote:

#
My (possibly mistaken) understanding of the conditional deviance is that it
is the deviance that results from comparing the observed responses to
fitted values calculated with the BLUPs factored in. Since the cluster
effects are of inherent interest and bound to recur, this is what I want.
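
[In lme4 terms, a minimal sketch of that definition for a fitted model `m`
with 0/1 response `y`; all names assumed:]

mu <- fitted(m)                               # conditional means, BLUPs included
d  <- binomial()$dev.resids(y, mu, rep(1, length(y)))  # unit deviances
sum(d)                                        # == sum(residuals(m, "deviance")^2)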

> It is not surprising that what you are calling the conditional deviance
> is inconsistent with the nesting of the models because they weren't fit
> according to that criterion.

Wow. I think that's the answer, simple as that. Now I'm kinda embarrassed
for having even had to ask.

Many thanks!

Best,

Juho

On Thu, 31 Aug 2023 at 15:56, Douglas Bates (dmbates at gmail.com) wrote:

#
I may have attributed the confusion about conditional and marginal
deviances to you, Juho, when in fact it is the fault of the authors of
lme4, of whom I am one.  If so, I apologize.

We (the authors of lme4) had a discussion about the definition of deviance
many years ago and I am not sure what the upshot was.  It may have been
that we went with this incorrect definition of the "conditional deviance".
Unfortunately my R skills are sufficiently atrophied that I haven't been
able to look up what is actually evaluated.

The area of generalized linear models and generalized linear mixed models
holds many traps for the unwary in terms of definitions, and some casual
definitions in the original glm function, going as far back as the S3
language, aid in the confusion.  For example, the dev.resid function in a
GLM family doesn't return the deviance residuals - it returns the square of
the deviance residuals, which should be named the "unit deviances".
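
[A quick check with toy numbers makes the naming point concrete:]

y  <- c(0, 1, 1, 0)
mu <- c(0.2, 0.7, 0.9, 0.6)
d  <- binomial()$dev.resids(y, mu, rep(1, 4))  # unit deviances (squared residuals)
r  <- sign(y - mu) * sqrt(d)                   # the actual deviance residuals
all.equal(sum(d), sum(r^2))                    # TRUE: GLM deviance = sum of d_i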

Given a response vector and the predicted mean vector and a distribution
family one can define the unit deviances, which play a role similar to the
squared residual in the Gaussian family.  The sum of these unit deviances
is the deviance of a generalized linear model, but it is not the deviance
of a generalized linear mixed model, because the likelihood for a GLMM
involves both a "fidelity to the data" term *and* a "complexity of the
model" term, which is the size of the random effects vector in a certain
metric.
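
[A hedged sketch of those two pieces using lme4 extractors, for a glmer()
fit `m`; the decomposition shown is the one from the Laplace
approximation:]

## Fidelity to the data: sum of unit deviances at the conditional fit.
fidelity   <- sum(residuals(m, type = "deviance")^2)
## Complexity: squared length of the spherical random-effects vector u.
complexity <- sum(getME(m, "u")^2)
## The objective glmer() minimizes (at nAGQ = 1) combines these two terms
## with a log-determinant term, so it is not the fidelity term alone.
fidelity + complexity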

So if indeed we used an inappropriate definition of deviance the mistake is
ours and we should correct it.
On Thu, Aug 31, 2023 at 7:56 AM Douglas Bates <dmbates at gmail.com> wrote:

#
I have written about how I now understand the evaluation of the likelihood
of a GLMM in an appendix of the in-progress book at
https://juliamixedmodels.github.io/EmbraceUncertainty

It is a difficult area to make sense of.  Even now, after more than 25
years of thinking about these models, I am still not sure exactly how to
define the objective in a GLMM for a family with a dispersion parameter.
On Thu, Aug 31, 2023 at 8:37 AM Douglas Bates <dmbates at gmail.com> wrote:

#
I think this highlights the intersection of two big stumbling blocks:

1. A lot of the identities and niceties of classical GLM theory don't
hold in the mixed-models case. See also the coefficient of
determination, clear notions of denominator degrees of freedom for F
tests, etc.

2. a lot of R and S functionality around GLMs is based on convenient 
computational quantities in classical GLM theory. This leads to some 
infelicity in naming, which is further confused by (1) for mixed models.

I think this points to a real challenge for many users, also because of
the way we teach statistics. (I'm teaching a course soon, can everyone
tell?) Many intro statistics courses state things like "the coefficient
of determination, R^2, is the squared Pearson correlation, r, and can be
interpreted as ....". But we don't highlight that that's an identity for
simple linear regression, not the formal definition of the coefficient
of determination. We also don't emphasize how many of these definitions
hinge on concepts that have no obvious extension to multi-level models,
or that none of the obvious extensions keeps all the properties of the
classical GLM form. (Most of this is discussed in the GLMM FAQ!)
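
[A toy R check of where the identity does hold, for illustration:]

set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
summary(lm(y ~ x))$r.squared  # coefficient of determination
cor(x, y)^2                   # squared Pearson r: identical here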

This is a criticism of myself as an instructor as much as anything, but 
it's something I like to emphasize in teaching nowadays and highlight as
a point of 're-education' for students who've had a more traditional
education.

Phillip
On 8/31/23 08:45, Douglas Bates wrote:
#
For what it's worth we did attempt to document what we knew about 
deviance here:

https://github.com/lme4/lme4/blob/master/man/merMod-class.Rd
(formatted version at  https://rdrr.io/cran/lme4/man/merMod-class.html)

in particular the note that says

\item If adaptive Gauss-Hermite quadrature is used, then
     \code{logLik(object)} is currently only proportional to the
     absolute-unconditional log-likelihood.

   To make things worse, this is about "absolute" vs "relative" and
"conditional" vs "unconditional"; "marginal" doesn't even come into it ...

   I will take a look at the example myself if I get a chance, as I find 
the result surprising too ...

   cheers
     Ben Bolker
On 2023-08-31 10:08 a.m., Phillip Alday wrote:
#
...and here I was thinking "unconditional likelihood" was just another
synonym for the "marginal" or "average" likelihood I see discussed in stats
books...

Oh well, Douglas has succinctly solved my main problem. Any further
insights are just icing on the cake.

Thanks again to all,

J

On Thu, 31 Aug 2023 at 20:43, Ben Bolker (bbolker at gmail.com) wrote: