
nAGQ = 0

14 messages · Jonathan Judge, Poe, John, Rolf Turner +3 more

#
Greetings mixed models gurus.

Some time ago I asked a question of this list concerning problems that I 
was having getting a glmer() model to fit to a data set.  (Using the 
binomial family with the cloglog link.)

It was suggested to me, in the first instance by Tony Ives (thanks 
again, Tony), that I should include the "nAGQ=0" option in my call to 
glmer().  I did so, and it worked like a charm.

I do not however really understand what "nAGQ=0" actually does.  I 
gather (vaguely) that it has something to do with the numerical 
integrals needed when evaluating the log-likelihood, and I (even more 
vaguely) gather that with nAGQ=0 this integration is somehow entirely 
dispensed with.  Perhaps I am mis-stating things here.

Be that as it may, it worries me slightly that I (apparently) have to 
use a somewhat less precise method than I otherwise might in order to
get any answer at all.

What risks am I running by setting nAGQ=0?  What perils and pitfalls 
lurk?  Surely there must be a downside to using this (???) short cut.
Although the upside, that I actually get an answer, pretty clearly 
trumps (apologies for the use of that word :-) ) the downside.

I would like some advice, pearls of wisdom, whatever from someone who 
understands what is going on in the underpinnings of fitting mixed models.

Thanks for any wise counsel that you can provide.

cheers,

Rolf Turner
#
Rolf:

I have not studied this extensively with smaller datasets, but with larger datasets --- five-figure and especially six-figure n --- I have found that it often makes no difference. 

When uncertain, I have used a likelihood ratio test to see if the differences are likely to be material. 

My overall suggestion would be that if the dataset is small enough for this choice to matter, it is probably also small enough to fit the model through MCMC, in which case I would recommend using that, because the incorporated uncertainty often gives you better parameter estimates than any increased level of quadrature. 

Best,
Jonathan

Sent from my iPhone
#
On 04/09/17 03:48, Jonathan Judge wrote:
Thanks Jonathan.

(a) How small is "small"?  I have 3-figure n's. I am currently mucking 
about with two data sets.  One has 952 observations (with 22 treatment 
groups, 3 random effect reps per group).  The other has 142 observations 
(with 6 treatment groups and again 3 reps per group).  Would you call 
the latter data set small?

(b) I've never had the courage to try the MCMC approaches to mixed 
models; have just used lme4.  I guess it's time that I bit the bullet.
Psigh.  This is going to take me a while.  As an old dog I *can* learn 
new tricks, but I learn them *slowly*. :-)

(c) In respect of the likelihood ratio test that you suggest --- sorry 
to be a thicko, but I don't get it.  It seems to me that one is fitting 
the *same model* in both instances, so the "degrees of freedom" for such 
a test would be zero.  What am I missing?

Thanks again.

cheers,

Rolf
#
I'm pretty sure that nAGQ=0 is generating conditional modes of group values
for the random effects without subsequently using a Laplace approximation.
This is really not something that you want to do unless it's a last resort.
It's kind of like estimating a linear probability model with a random
intercept and using the values for that in the cloglog model. My
understanding is that it's not even an option in most other software.
Someone please correct me if I'm wrong here because what I've found on it
has been kind of vague and I'm making some assumptions.

My guess is that the random intercepts/slopes are going to be too small and
their distributions could be distorted if you're not actually approximating
the integral with something like quadrature or MCMC. That's the case with
PQL and MQL according to simulation evidence at least and even though this
isn't the same as PQL I'd expect a similar problem at first blush.

As to whether this is a real practical problem or a theory problem, there's
been kind of a disagreement within the stats literature. Some people
argue, in binary outcome models, that biased random intercepts can bias
everything else and others have argued this fear is overblown. This might
well have been settled by actual statisticians by now, I'm not sure.  It's
gotten enough attention in the literature that people certainly worried
about it a lot.

Below are two articles on the topic with simulations but I've seen the
fixed effects results (and LR tests) change based on the random effects
approximation technique in my work so I'm always a bit paranoid about it.
Model misspecification and having oddly shaped random intercepts (as with
count models) can seem to make this problem worse.

You can try using the brms package if you aren't comfortable switching to
something totally unfamiliar. It's a wrapper for Stan designed to use lme4
syntax and a lot of good default settings. It's pretty easy to use if you
know lme4 syntax and can read up on mcmc diagnostics.

Litière, S., et al. (2008). "The impact of a misspecified random-effects
distribution on the estimation and the performance of inferential
procedures in generalized linear mixed models." Statistics in Medicine 27(16): 3125-3144.

McCulloch, C. E. and J. M. Neuhaus (2011). "Misspecifying the shape of a
random effects distribution: why getting it wrong may not matter."
Statistical Science: 388-402.
On Sep 3, 2017 5:49 PM, "Rolf Turner" <r.turner at auckland.ac.nz> wrote:
#
Oh and on the model comparison thing the only way i know how to directly
compare random effects distribution estimates is with gateaux derivatives a
la what Sophia Rabe-Hesketh did in her GLLAMM software for nonparametric
random effects estimation.

Usually i just kind of eyeball it and try to be overly conservative with
quadrature points or use MCMC.

I guess you could rig up some regular deviance test to do the same thing?
On Sep 3, 2017 6:51 PM, "Poe, John" <jdpo223 at g.uky.edu> wrote:
#
Doug Bates has posted some discussion on this as part of his newest book
project on mixed models (this time in Julia):

https://github.com/dmbates/MixedModelsinJulia/blob/master/nAGQ.ipynb

I'll try to summarize. The difference seems to be

- (nAGQ=0, fast=true) the fixed effects (beta) are estimated with the
conditional modes (b) of the random effects as part of the penalized
iteratively reweighted least squares (PIRLS) step. Only the covariance
parameters (theta) are estimated during the nonlinear optimization step.

or

- (nAGQ=1, fast=false) the fixed effect estimates are further optimized
along with the covariance parameters as part of the nonlinear
optimization step, i.e. the Laplace approximation to the deviance.

Some software supports nAGQ > 1, in which case we have a better but more
numerically intensive approximation to the integral; the Laplace
approximation is just the special case of adaptive Gauss-Hermite
quadrature with a single quadrature point. I think lme4/MixedModels.jl
supports this only for a single scalar (intercept-only) random effect.

As John pointed out, there doesn't seem to be other software that is
willing to omit the fixed effects from the quadrature procedure as
lme4/MixedModels.jl does for nAGQ=0. For lme4/MixedModels.jl, they are
used as the starting values for the Laplace approximation, so it's
'free' to expose them to the user.

(If I recall correctly, you get the conditional modes for "free" in the
LMM case, at least for the numerical procedure used in
lme4/MixedModels.jl. The lme4 paper has more details on this, if you're
up for the math.)

Having additional parameters in the nonlinear optimization step of
course increases the dimensionality of that step and slows it down a
bit. However, estimating the fixed effects with the full covariance
parameters will often yield a better fit and will affect error estimates
and thus t-/z-values. I'm not sure how much it will affect the
coefficient estimates for 'well-behaved' datasets -- my guess is that
computing the fixed effects with the covariance parameters will affect
the relative amount of shrinkage amongst the fixed effects, increasing
it for some, decreasing it for others.

It's interesting to note that the conditional mode estimates are the
same in both cases because they are estimated first.

Best,
Phillip
On 09/04/2017 12:59 AM, Poe, John wrote:
#
On 04/09/17 10:51, Poe, John wrote:
<SNIP>

I remember a quote from some Brit comedy series (it starred the bloke 
who played Terry on "Minder"):  "My mother always said that you should 
try anything once, except for incest and Morris dancing.  She was right 
about the Morris dancing."

On that basis I decided to try using brms.  I installed the package from 
CRAN with no real problems, although there were some slightly worrying 
and highly technical warnings from "rstan", I think.
Then I read the vignette brms_overview a bit, and plunged in with trying 
to fit a model.  Needless to say, the attempt didn't get much past 
square zero.  I tried it again with my artificial simulated data that I 
talked about in the post that started off this train of craziness (it 
elicited the suggestion from Tony Ives that I try using nAGQ=0).  The 
result was the same --- square zero + epsilon:
.
     .
     .
Since I'm flying completely blind here (no idea WTF I'm doing) I have 
come to a shuddering halt.

I have attached my function "artSim.R" to generate the artificial data,
and a script to source to effect the call to brm() that I used.

If some kind mixed models guru could take a look and point out to me 
just what bit of egregious stupidity I am committing, I'd be ever so 
humbly grateful.  I don't know from Bayesian stuff (priors, and like 
that) at all, so it's likely to be something pretty stupid and pretty 
simple in the first instance.

cheers,

Rolf Turner
#
This is where my being a political scientist on a listserv full of
definitely not political scientists is going to make me look dumb. I've
never actually seen a model specification like that before for a multilevel
model. Is the outcome supposed to be the proportion dead out of the total
population for each row? I'm missing something about this that is probably
very obvious.

After fiddling with it I was able to get it to converge for one chain but I
wouldn't trust it at all right now.
On Wed, Sep 6, 2017 at 5:48 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
#
On 07/09/17 10:57, Poe, John wrote:

Well, I hadn't seen a model specification quite like that either.  It's 
brm() syntax, which is a bit different from glm() or glmer() syntax.

The model being fitted is a binomial model, with success probability p 
modelled by

     g(p) = beta_0 + beta_1 * x + Z_0 + Z_1 * x

where the Z_k are the random effects, and where g() is the link 
function, e.g. cloglog(). (I *think* I've got that right.)

Of course beta_0 and beta_1 depend on treatment group and Z_0 and Z_1
depend on "Rep", which is nested within treatment group.

Note that we are talking *generalized* linear mixed models here; the 
responses are binomial success counts.  "Success" = Dead, since one is 
trying to kill the bugs.

For what it's worth the glmer() syntax is

    glmer(cbind(Dead,Alive) ~ (0 + Trt)/x + (x | Rep),
          family=binomial(link="cloglog"),data=X)
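For concreteness, here is a simulation sketch of data under this sort of model (Python rather than R, all parameter values hypothetical, and only loosely in the spirit of the attached artSim.R, which is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy simulation of the model described above (hypothetical numbers):
# g(p) = beta_0 + beta_1*x + Z_0 + Z_1*x with g = cloglog,
# so p = 1 - exp(-exp(eta)).
n_trt, n_rep, n_dose = 3, 3, 5          # treatments, reps per trt, doses
beta0 = np.array([-1.0, -0.5, 0.0])     # per-treatment intercepts
beta1 = np.array([0.8, 1.0, 1.2])       # per-treatment slopes in x
sd0, sd1 = 0.3, 0.1                     # random intercept / slope SDs
x = np.linspace(0.0, 2.0, n_dose)       # dose covariate
n_bugs = 20                             # binomial size per row

rows = []
for t in range(n_trt):
    for r in range(n_rep):
        z0 = rng.normal(0.0, sd0)       # rep-level random intercept
        z1 = rng.normal(0.0, sd1)       # rep-level random slope
        eta = beta0[t] + beta1[t] * x + z0 + z1 * x
        p = 1.0 - np.exp(-np.exp(eta))  # inverse cloglog link
        dead = rng.binomial(n_bugs, p)
        for d in range(n_dose):
            rows.append((t, r, x[d], int(dead[d]), n_bugs - int(dead[d])))

# rows now mimics the Trt, Rep, x, Dead, Alive columns of the data frame X
```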

I got the brm() syntax from the vignette; as I said, I'd never seen it 
before.

cheers,

Rolf
#
Hi Rolf, 

Just a quick point: running the brms model with a logit link works fine for me.

Best regards
Conor Goold
PhD Student
Phone:        +47 67 23 27 24



Norwegian University of Life Sciences
Campus Ås. www.nmbu.no
#
On 07/09/17 20:04, Conor Michael Goold wrote:
That is indeed interesting. I'll investigate.  Thanks for pointing this out.

cheers,

Rolf
#
As mentioned previously, the formula syntax for this one is a bit
confusing. The left-hand side was easy enough to figure out from the
brms docs, but the division operator on the RHS was a trick I had never
used before -- it took me looking at the output from the
glmer(..., nAGQ=0) model to get that it's equivalent to

~ 0 + Trt + Trt:x + (x | Rep)

Phillip
On 09/07/2017 06:12 AM, Rolf Turner wrote:
#
To further clear things up for other readers, this formula syntax is a nice
and very useful (if little known) trick for estimating separate intercepts
and x slopes for each level of Trt. It creates a design matrix like:

[1 0 x_1 0]
[1 0 x_2 0]
[0 1 0 x_3]
[0 1 0 x_4]

where the first two rows belong to Trt=1, the second two rows belong to
Trt=2; columns 1 and 2 are the Trt1 and Trt2 intercepts, respectively; and
columns 3 and 4 contain only the x values for Trt1 and Trt2, respectively,
so their associated coefficients give the group-specific slopes for each
level of Trt.
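A quick numeric check of that block structure (Python sketch, toy values):

```python
import numpy as np

# Build the design matrix for ~ 0 + Trt + Trt:x by hand, with two Trt
# levels and two observations each (all values hypothetical).
trt = np.array([0, 0, 1, 1])            # Trt as 0/1 level codes
x = np.array([0.1, 0.2, 0.3, 0.4])      # x_1 .. x_4

levels = np.unique(trt)
intercepts = (trt[:, None] == levels).astype(float)  # one indicator column per Trt level
slopes = intercepts * x[:, None]                     # x appears only in its own Trt column
X = np.hstack([intercepts, slopes])
print(X)
# Columns 1-2 are the Trt1/Trt2 intercepts; columns 3-4 carry the x values
# for Trt1 and Trt2 respectively, giving separate slopes per level.
```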

Jake
On Thu, Sep 7, 2017 at 2:41 PM, Phillip Alday <phillip.alday at mpi.nl> wrote:
#
...BTW, the syntax in question, for those not interested enough to locate
it in the previously attached R script, is:
 ~ (0+Trt)/x


On Thu, Sep 7, 2017 at 3:00 PM, Jake Westfall <jake.a.westfall at gmail.com> wrote: