-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Dear sig-mixed readers,
~ Some of my students and I are foolishly attempting to write a review
of GLMMs for an ecology/evolution audience. Realizing that this is a
huge, gnarly, and not-completely-understood subject (even for the
experts -- see fortune("mixed")), we're trying to provide as much
non-technical background and guidance as can be squeezed into a
reasonable sized journal article ...
~ We've run into quite a few questions we have been unable to answer
... perhaps because no-one knows the answers, or because we're looking
in the wrong places. We thought we might impose on the generosity of
the list: if this feels ridiculous, please just ignore this message.
Feedback ranging from "this is true, but I don't know of a published
source" to "this just isn't true" would be useful. We are aware of the
deeper problems of focusing on p-values and degrees of freedom -- we
will encourage readers to focus on estimating effect sizes and
confidence limits -- but we would also like to answer some of these
questions for them, if we can.
~ Ben Bolker
1. Is there a published justification somewhere for Lynn Eberly's
( http://www.biostat.umn.edu/~lynn/ph7430/class.html ) statement that df
adjustments are largely irrelevant if number of blocks>25 ?
2. What determines the asymptotic performance of the LRT (likelihood
ratio test) for comparison of fixed effects, which is known to be poor
for "small" sample sizes? Is it the number of random-effects levels
(as stated by Agresti 2002, p. 520), or is it the number of levels of
the fixed effect relative to the total number of data points (as
stated by Pinheiro and Bates 2000, pp. 87-89)? (The example given by
PB2000 from Littell et al. 1996 is a test of a treatment factor with
15 levels, in a design with 60 observations and 15 blocks. Agresti's
statement would imply that one would still be in trouble if the total
number of observations increased to 600 [because # blocks is still
small], where PB2000 would imply that the LRT would be OK in this
limit. (A small experiment with the simulate.lme() example given on
PB2000 p. 89 suggests that increasing the sample size 10-fold with the
same number of blocks DOES make the LRT OK ... but I would need to do
this a bit more carefully to be sure.) (Or is this a difference
between the linear and generalized linear case?)
3. For multi-level models (nested, certainly crossed), how would one
count the "number of random-effects levels" to quantify the 'sample
size' above? With a single random effect, we can just count the
number of levels (blocks). What would one do with e.g. a nested or
crossed design? (Perhaps the answer is "don't use a likelihood ratio
test to evaluate the significance of fixed effects".)
4. Does anyone know of any evidence (in either direction) that the
"boundary" problems that apply to the likelihood ratio test (e.g. Self
and Liang 1987) also apply to information criteria comparisons of
models with and without random effects? I would expect so, since the
derivations of the AIC involve Taylor expansions around the
null-hypothesis parameters ...
5. It's common sense that estimating the variance of a random effect
from a small number of levels (e.g. less than 5) should be dicey, and
that one might in this case want to treat the parameter as a fixed
effect (regardless of its philosophical/experimental design status).
For small numbers of levels I would expect (?) that the answers MIGHT
be similar -- among other things the difference between df=1 and
df=(n-1) would be small. But ... is there a good discussion of this
in print somewhere? (Crawley mentions this on p. 670 of "Statistical
Computing", but without justification.)
lme4-specific questions:
6. Behavior of glmer: Does glmer really use AGQ, or just Laplace?
Both? pp. 28-32 of the "Implementation" vignette in lme4 suggest that
a Laplace approximation is used, but I can't figure out whether this
is an additional approximation on top of the AGQ/Laplace approximation
of the integral over the random effects used in "ordinary" LMM. When
I fit a GLMM with the different methods, the fitted objects are not
identical but all the coefficients seem to be. (I have poked at the
code a bit but been unable to answer this question for myself
... sorry ...)
(The glmmML package claims to fit via Laplace or Gauss-Hermite
quadrature (with non-adaptive, but adjustable, number of quad points
- -- so it's at least theoretically possible?)
library(lme4)
set.seed(1001)
f = factor(rep(1:10,each=10))
zb = rnorm(1:10,sd=2) ## block effects
x = runif(100)
eta = 2*x+zb[f]+rnorm(100)
y = rpois(100,exp(eta))
g1 = glmer(y~x+(1|f),family="poisson",method="Laplace")
g2 = glmer(y~x+(1|f),family="poisson",method="AGQ")
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFIIc1Lc5UpGjwzenMRAr4uAJ90myt79pJZCa1a801FkxHRnAHYdgCfUYy+
P0ljXHs4lt8aTwpWKncRkBg=
=nd22
-----END PGP SIGNATURE-----
general GLMM questions
6 messages · Ben Bolker, Dimitris Rizopoulos, vito muggeo +2 more
----- Original Message ----- From: "Ben Bolker" <bolker at ufl.edu> To: "R Mixed Models" <r-sig-mixed-models at r-project.org> Sent: Wednesday, May 07, 2008 5:39 PM Subject: [R-sig-ME] general GLMM questions
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Dear sig-mixed readers,
~ Some of my students and I are foolishly attempting to write a
review
of GLMMs for an ecology/evolution audience. Realizing that this is
a
huge, gnarly, and not-completely-understood subject (even for the
experts -- see fortune("mixed")), we're trying to provide as much
non-technical background and guidance as can be squeezed into a
reasonable sized journal article ...
~ We've run into quite a few questions we have been unable to
answer
... perhaps because no-one knows the answers, or because we're
looking
in the wrong places. We thought we might impose on the generosity
of
the list: if this feels ridiculous, please just ignore this message.
Feedback ranging from "this is true, but I don't know of a published
source" to "this just isn't true" would be useful. We are aware of
the
deeper problems of focusing on p-values and degrees of freedom -- we
will encourage readers to focus on estimating effect sizes and
confidence limits -- but we would also like to answer some of these
questions for them, if we can.
~ Ben Bolker
1. Is there a published justification somewhere for Lynn Eberly's
( http://www.biostat.umn.edu/~lynn/ph7430/class.html ) statement
that df
adjustments are largely irrelevant if number of blocks>25 ?
2. What determines the asymptotic performance of the LRT (likelihood
ratio test) for comparison of fixed effects, which is known to be
poor
for "small" sample sizes? Is it the number of random-effects levels
(as stated by Agresti 2002, p. 520), or is it the number of levels
of
the fixed effect relative to the total number of data points (as
stated by Pinheiro and Bates 2000, pp. 87-89)? (The example given
by
PB2000 from Littell et al. 1996 is a test of a treatment factor with
15 levels, in a design with 60 observations and 15 blocks.
Agresti's
statement would imply that one would still be in trouble if the
total
number of observations increased to 600 [because # blocks is still
small], where PB2000 would imply that the LRT would be OK in this
limit. (A small experiment with the simulate.lme() example given on
PB2000 p. 89 suggests that increasing the sample size 10-fold with
the
same number of blocks DOES make the LRT OK ... but I would need to
do
this a bit more carefully to be sure.) (Or is this a difference
between the linear and generalized linear case?)
Intuitively, I think this will depend on the correlation between the measurements within each block. At the one extreme, assume that all the measurements within each block are the same, then actual sample size would be the number of block. At the other extreme, assume that all the measurements within each block are random, then the sample size would be the total number of observations. I know some people from Hasselt University in Belgium that have worked on the "Effective Sample Size" for mixed models; you can check at the following presentation given in ISCB last summer (http://www.iscb2007.gr/ppt/Wednesday/Orfeas/16.54-17.12/slides_ISCB2007.pdf)
3. For multi-level models (nested, certainly crossed), how would one count the "number of random-effects levels" to quantify the 'sample size' above? With a single random effect, we can just count the number of levels (blocks). What would one do with e.g. a nested or crossed design? (Perhaps the answer is "don't use a likelihood ratio test to evaluate the significance of fixed effects".) 4. Does anyone know of any evidence (in either direction) that the "boundary" problems that apply to the likelihood ratio test (e.g. Self and Liang 1987) also apply to information criteria comparisons of models with and without random effects? I would expect so, since the derivations of the AIC involve Taylor expansions around the null-hypothesis parameters ...
I think this is an interesting question. Let # AIC under the null AIC.0 = -2*logLik.0 + 2npar # AIC under the alternative AIC.1 = -2*logLik.1 + 2(npar + 1) then you reject when AIC.1 < AIC.0 => -2*logLik.1 + 2(npar + 1) + -2*logLik.0 + 2npar < 0 => LRT > 2. Now for boundary problems and according to Stram and Lee (1994, Biometrics), LRT ~ 0.5 * chisq(0) + 0.5 * chisq(1), for which the critical value is 1.923. Thus, it seems to work more or less ok in this case. However, if you wanted to test wether the variance is 10, then LRT ~ chisq(1), for which the critical value is 3.841!
5. It's common sense that estimating the variance of a random effect from a small number of levels (e.g. less than 5) should be dicey, and that one might in this case want to treat the parameter as a fixed effect (regardless of its philosophical/experimental design status). For small numbers of levels I would expect (?) that the answers MIGHT be similar -- among other things the difference between df=1 and df=(n-1) would be small. But ... is there a good discussion of this in print somewhere? (Crawley mentions this on p. 670 of "Statistical Computing", but without justification.) lme4-specific questions: 6. Behavior of glmer: Does glmer really use AGQ, or just Laplace? Both? pp. 28-32 of the "Implementation" vignette in lme4 suggest that a Laplace approximation is used, but I can't figure out whether this is an additional approximation on top of the AGQ/Laplace approximation of the integral over the random effects used in "ordinary" LMM. When I fit a GLMM with the different methods, the fitted objects are not identical but all the coefficients seem to be. (I have poked at the code a bit but been unable to answer this question for myself ... sorry ...)
Well, in Linear Mixed Models the integral over the random effects can be analytically evaluated and thus no approximation (i.e., AGQ or Laplace) is required. In GLMMs this is not the case and thus the log-likelihood needs to be calculated approximately. One method for approximating the integral is the AGQ, and in fact Laplace is AGQ with one quadrature point. AFAIK (but Doug can correct if I'm wrong), glmer() uses Laplace since AGQ is not yet implemented.
(The glmmML package claims to fit via Laplace or Gauss-Hermite quadrature (with non-adaptive, but adjustable, number of quad points - -- so it's at least theoretically possible?) library(lme4) set.seed(1001) f = factor(rep(1:10,each=10)) zb = rnorm(1:10,sd=2) ## block effects x = runif(100) eta = 2*x+zb[f]+rnorm(100) y = rpois(100,exp(eta)) g1 = glmer(y~x+(1|f),family="poisson",method="Laplace") g2 = glmer(y~x+(1|f),family="poisson",method="AGQ") -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIIc1Lc5UpGjwzenMRAr4uAJ90myt79pJZCa1a801FkxHRnAHYdgCfUYy+ P0ljXHs4lt8aTwpWKncRkBg= =nd22 -----END PGP SIGNATURE-----
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
Dear Ben, I am going to reply just to your question #4. Yes, the AIC suffers from the same drawbacks of the log-Likelihood; therefore if the LRT does not work (for testing for variance components in LMM and also for any non-regular models, say), the AIC is not expected to work too. (I don't remember the references where I read this,..sorry). For instance, in problems related to breakpoint estimation the logLik is just piecewise differentiable and if one is interested in testing for the existence of the breakpoint, the LRT and the AIC do not work. However simulation studies have shown that the BIC works (this makes sense because the BIC has a Bayesian justification and nondifferentiable logLik typically does not matter..) best, vito Ben Bolker ha scritto:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Dear sig-mixed readers,
~ Some of my students and I are foolishly attempting to write a review
of GLMMs for an ecology/evolution audience. Realizing that this is a
huge, gnarly, and not-completely-understood subject (even for the
experts -- see fortune("mixed")), we're trying to provide as much
non-technical background and guidance as can be squeezed into a
reasonable sized journal article ...
~ We've run into quite a few questions we have been unable to answer
... perhaps because no-one knows the answers, or because we're looking
in the wrong places. We thought we might impose on the generosity of
the list: if this feels ridiculous, please just ignore this message.
Feedback ranging from "this is true, but I don't know of a published
source" to "this just isn't true" would be useful. We are aware of the
deeper problems of focusing on p-values and degrees of freedom -- we
will encourage readers to focus on estimating effect sizes and
confidence limits -- but we would also like to answer some of these
questions for them, if we can.
~ Ben Bolker
1. Is there a published justification somewhere for Lynn Eberly's
( http://www.biostat.umn.edu/~lynn/ph7430/class.html ) statement that df
adjustments are largely irrelevant if number of blocks>25 ?
2. What determines the asymptotic performance of the LRT (likelihood
ratio test) for comparison of fixed effects, which is known to be poor
for "small" sample sizes? Is it the number of random-effects levels
(as stated by Agresti 2002, p. 520), or is it the number of levels of
the fixed effect relative to the total number of data points (as
stated by Pinheiro and Bates 2000, pp. 87-89)? (The example given by
PB2000 from Littell et al. 1996 is a test of a treatment factor with
15 levels, in a design with 60 observations and 15 blocks. Agresti's
statement would imply that one would still be in trouble if the total
number of observations increased to 600 [because # blocks is still
small], where PB2000 would imply that the LRT would be OK in this
limit. (A small experiment with the simulate.lme() example given on
PB2000 p. 89 suggests that increasing the sample size 10-fold with the
same number of blocks DOES make the LRT OK ... but I would need to do
this a bit more carefully to be sure.) (Or is this a difference
between the linear and generalized linear case?)
3. For multi-level models (nested, certainly crossed), how would one
count the "number of random-effects levels" to quantify the 'sample
size' above? With a single random effect, we can just count the
number of levels (blocks). What would one do with e.g. a nested or
crossed design? (Perhaps the answer is "don't use a likelihood ratio
test to evaluate the significance of fixed effects".)
4. Does anyone know of any evidence (in either direction) that the
"boundary" problems that apply to the likelihood ratio test (e.g. Self
and Liang 1987) also apply to information criteria comparisons of
models with and without random effects? I would expect so, since the
derivations of the AIC involve Taylor expansions around the
null-hypothesis parameters ...
5. It's common sense that estimating the variance of a random effect
from a small number of levels (e.g. less than 5) should be dicey, and
that one might in this case want to treat the parameter as a fixed
effect (regardless of its philosophical/experimental design status).
For small numbers of levels I would expect (?) that the answers MIGHT
be similar -- among other things the difference between df=1 and
df=(n-1) would be small. But ... is there a good discussion of this
in print somewhere? (Crawley mentions this on p. 670 of "Statistical
Computing", but without justification.)
lme4-specific questions:
6. Behavior of glmer: Does glmer really use AGQ, or just Laplace?
Both? pp. 28-32 of the "Implementation" vignette in lme4 suggest that
a Laplace approximation is used, but I can't figure out whether this
is an additional approximation on top of the AGQ/Laplace approximation
of the integral over the random effects used in "ordinary" LMM. When
I fit a GLMM with the different methods, the fitted objects are not
identical but all the coefficients seem to be. (I have poked at the
code a bit but been unable to answer this question for myself
... sorry ...)
(The glmmML package claims to fit via Laplace or Gauss-Hermite
quadrature (with non-adaptive, but adjustable, number of quad points
- -- so it's at least theoretically possible?)
library(lme4)
set.seed(1001)
f = factor(rep(1:10,each=10))
zb = rnorm(1:10,sd=2) ## block effects
x = runif(100)
eta = 2*x+zb[f]+rnorm(100)
y = rpois(100,exp(eta))
g1 = glmer(y~x+(1|f),family="poisson",method="Laplace")
g2 = glmer(y~x+(1|f),family="poisson",method="AGQ")
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFIIc1Lc5UpGjwzenMRAr4uAJ90myt79pJZCa1a801FkxHRnAHYdgCfUYy+
P0ljXHs4lt8aTwpWKncRkBg=
=nd22
-----END PGP SIGNATURE-----
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
==================================== Vito M.R. Muggeo Dip.to Sc Statist e Matem `Vianelli' Universit? di Palermo viale delle Scienze, edificio 13 90128 Palermo - ITALY tel: 091 6626240 fax: 091 485726/485612 http://dssm.unipa.it/vmuggeo
On 08/05/2008, at 1:39 AM, Ben Bolker wrote:
1. Is there a published justification somewhere for Lynn Eberly's ( http://www.biostat.umn.edu/~lynn/ph7430/class.html ) statement that df adjustments are largely irrelevant if number of blocks>25 ?
Actually denominator df > 25. This seems to derive from t distributions with df greater than 25 all being much the same, in fact close to a normal distribution. In reality variations in the data from normality are more important.
2. What determines the asymptotic performance of the LRT (likelihood ratio test) for comparison of fixed effects, which is known to be poor for "small" sample sizes? Is it the number of random-effects levels (as stated by Agresti 2002, p. 520), or is it the number of levels of the fixed effect relative to the total number of data points (as stated by Pinheiro and Bates 2000, pp. 87-89)? (The example given by PB2000 from Littell et al. 1996 is a test of a treatment factor with 15 levels, in a design with 60 observations and 15 blocks. Agresti's statement would imply that one would still be in trouble if the total number of observations increased to 600 [because # blocks is still small], where PB2000 would imply that the LRT would be OK in this limit. (A small experiment with the simulate.lme() example given on PB2000 p. 89 suggests that increasing the sample size 10-fold with the same number of blocks DOES make the LRT OK ... but I would need to do this a bit more carefully to be sure.) (Or is this a difference between the linear and generalized linear case?)
This is probably dependent on whether the comparisons are within- or between- block. The PBIB has lots of within- block comparisons so increasing block size will tend to make things asymptotic. Try blocks where all within a block receive the same treatment and see how much increasing block size helps.
3. For multi-level models (nested, certainly crossed), how would one count the "number of random-effects levels" to quantify the 'sample size' above? With a single random effect, we can just count the number of levels (blocks). What would one do with e.g. a nested or crossed design? (Perhaps the answer is "don't use a likelihood ratio test to evaluate the significance of fixed effects".) 4. Does anyone know of any evidence (in either direction) that the "boundary" problems that apply to the likelihood ratio test (e.g. Self and Liang 1987) also apply to information criteria comparisons of models with and without random effects? I would expect so, since the derivations of the AIC involve Taylor expansions around the null-hypothesis parameters ...
This is a good question. For choosing the number of classes for mixture models it has been shown that BIC fails theoretically but works well in practice (proven with simulations) and compares well to results from parametric bootstrapping of the LRT. Some simulations for random effects would be interesting. Ken
1 day later
On Wed, May 7, 2008 at 10:39 AM, Ben Bolker <bolker at ufl.edu> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Dear sig-mixed readers,
~ Some of my students and I are foolishly attempting to write a review
of GLMMs for an ecology/evolution audience. Realizing that this is a
huge, gnarly, and not-completely-understood subject (even for the
experts -- see fortune("mixed")), we're trying to provide as much
non-technical background and guidance as can be squeezed into a
reasonable sized journal article ...
~ We've run into quite a few questions we have been unable to answer
... perhaps because no-one knows the answers, or because we're looking
in the wrong places. We thought we might impose on the generosity of
the list: if this feels ridiculous, please just ignore this message.
Feedback ranging from "this is true, but I don't know of a published
source" to "this just isn't true" would be useful. We are aware of the
deeper problems of focusing on p-values and degrees of freedom -- we
will encourage readers to focus on estimating effect sizes and
confidence limits -- but we would also like to answer some of these
questions for them, if we can.
~ Ben Bolker
1. Is there a published justification somewhere for Lynn Eberly's
( http://www.biostat.umn.edu/~lynn/ph7430/class.html ) statement that df
adjustments are largely irrelevant if number of blocks>25 ?
2. What determines the asymptotic performance of the LRT (likelihood
ratio test) for comparison of fixed effects, which is known to be poor
for "small" sample sizes? Is it the number of random-effects levels
(as stated by Agresti 2002, p. 520), or is it the number of levels of
the fixed effect relative to the total number of data points (as
stated by Pinheiro and Bates 2000, pp. 87-89)? (The example given by
PB2000 from Littell et al. 1996 is a test of a treatment factor with
15 levels, in a design with 60 observations and 15 blocks. Agresti's
statement would imply that one would still be in trouble if the total
number of observations increased to 600 [because # blocks is still
small], where PB2000 would imply that the LRT would be OK in this
limit. (A small experiment with the simulate.lme() example given on
PB2000 p. 89 suggests that increasing the sample size 10-fold with the
same number of blocks DOES make the LRT OK ... but I would need to do
this a bit more carefully to be sure.) (Or is this a difference
between the linear and generalized linear case?)
3. For multi-level models (nested, certainly crossed), how would one
count the "number of random-effects levels" to quantify the 'sample
size' above? With a single random effect, we can just count the
number of levels (blocks). What would one do with e.g. a nested or
crossed design? (Perhaps the answer is "don't use a likelihood ratio
test to evaluate the significance of fixed effects".)
4. Does anyone know of any evidence (in either direction) that the
"boundary" problems that apply to the likelihood ratio test (e.g. Self
and Liang 1987) also apply to information criteria comparisons of
models with and without random effects? I would expect so, since the
derivations of the AIC involve Taylor expansions around the
null-hypothesis parameters ...
5. It's common sense that estimating the variance of a random effect
from a small number of levels (e.g. less than 5) should be dicey, and
that one might in this case want to treat the parameter as a fixed
effect (regardless of its philosophical/experimental design status).
For small numbers of levels I would expect (?) that the answers MIGHT
be similar -- among other things the difference between df=1 and
df=(n-1) would be small. But ... is there a good discussion of this
in print somewhere? (Crawley mentions this on p. 670 of "Statistical
Computing", but without justification.)
lme4-specific questions:
6. Behavior of glmer: Does glmer really use AGQ, or just Laplace?
Both? pp. 28-32 of the "Implementation" vignette in lme4 suggest that
a Laplace approximation is used, but I can't figure out whether this
is an additional approximation on top of the AGQ/Laplace approximation
of the integral over the random effects used in "ordinary" LMM. When
I fit a GLMM with the different methods, the fitted objects are not
identical but all the coefficients seem to be. (I have poked at the
code a bit but been unable to answer this question for myself
... sorry ...)
To answer this question I must again, I regret, distinguish between the CRAN version of the lme4 package and the R-forge development version of lme4. In the R-forge version the only method for generalized linear mixed models and for nonlinear mixed models is direct optimization of the Laplace approximation to the deviance. One of the Summer of Code projects that Google has funded for the R Foundation (see http://code.google.com/soc/2008/rf/about.html) is implementation of the Adaptive Gauss-Hermite Quadrature approximation to the deviance. That will be implemented in the development version (i.e. the R-forge version) of the lme4 package. AGQ will only be offered for models with a single grouping factor for the random effects. I realize that it is somewhat irritating and confusing to readers of this list to have descriptions of the R-forge version of the package contrasted with the CRAN version. It is natural to expect that the R-forge version should be the version on CRAN. The reason that I have not yet released the R-forge to CRAN is because of problems with the mcmcsamp function in the R-forge version. If I moved the R-forge version to CRAN then code from Harald Baayen's book and probably code from Gelman and Hill's book would no longer work. Software versions can be changed much more readily than can editions of a book. I think there is a way around the problem with mcmcsamp but I won't be able to say for sure until it is coded and tested, which will take time. I don't want to predict exactly how much time - I always manage to underestimate drastically. I do not plan to provide an implementation of penalized quasi-likelihood (PQL) in what is currently the development version and what will become lme4_1.0. PQL for GLMMs and the "Lindstrom-Bates" algorithm for nonlinear mixed models (the approaches are related) are examples of alternating conditional optimization (think of the Gibbs sampler approach with optimization instead of sampling). It is a dangerous practice in that it can result in oscillation between conditional optima. I now prefer direct optimization techniques where the fixed-effects parameters and the variance components are optimized simultaneously. There may be circumstances where PQL is advantageous but usually those are because of over-parameterized models.
(The glmmML package claims to fit via Laplace or Gauss-Hermite quadrature (with non-adaptive, but adjustable, number of quad points - -- so it's at least theoretically possible?)
Yes. I think the adaptive part is important (in fact, I know it is important) and probably more important than the distinction between the Laplace approximation and Gauss-Hermite quadrature. That is, you gain more from using the Laplace approximation at the conditional modes of the random effects (which is the "adaptive" part) than increasing the number of Gauss-Hermite quadrature points at the marginal modes. The tricky bit of AGQ or Laplace is determining the conditional modes and that part is already done.
library(lme4) set.seed(1001) f = factor(rep(1:10,each=10)) zb = rnorm(1:10,sd=2) ## block effects x = runif(100) eta = 2*x+zb[f]+rnorm(100) y = rpois(100,exp(eta)) g1 = glmer(y~x+(1|f),family="poisson",method="Laplace") g2 = glmer(y~x+(1|f),family="poisson",method="AGQ") -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIIc1Lc5UpGjwzenMRAr4uAJ90myt79pJZCa1a801FkxHRnAHYdgCfUYy+ P0ljXHs4lt8aTwpWKncRkBg= =nd22 -----END PGP SIGNATURE-----
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
3 days later
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 | lme4-specific questions: | | 6. Behavior of glmer: Does glmer really use AGQ, or just Laplace? | Both? pp. 28-32 of the "Implementation" vignette in lme4 suggest that | a Laplace approximation is used, but I can't figure out whether this | is an additional approximation on top of the AGQ/Laplace approximation | of the integral over the random effects used in "ordinary" LMM. When | I fit a GLMM with the different methods, the fitted objects are not | identical but all the coefficients seem to be. (I have poked at the | code a bit but been unable to answer this question for myself | ... sorry ...) | |> To answer this question I must again, I regret, distinguish between |> the CRAN version of the lme4 package and the R-forge development |> version of lme4. | |> In the R-forge version the only method for generalized linear mixed |> models and for nonlinear mixed models is direct optimization of the |> Laplace approximation to the deviance. One of the Summer of Code |> projects that Google has funded for the R Foundation (see |> http://code.google.com/soc/2008/rf/about.html) is implementation of |> the Adaptive Gauss-Hermite Quadrature approximation to the deviance. |> That will be implemented in the development version (i.e. the R-forge |> version) of the lme4 package. [snip] ~ Great! ~ A minor feature request: does it make sense to update the documentation and code of glmer to make it clear that it does *not* do AGQ at the moment? I guess that depends how soon you would expect the GSoC code to come online ... |> I realize that it is somewhat irritating and confusing to readers of |> this list to have descriptions of the R-forge version of the package |> contrasted with the CRAN version. It is natural to expect that the |> R-forge version should be the version on CRAN. The reason that I have |> not yet released the R-forge to CRAN is because of problems with the |> mcmcsamp function in the R-forge version. If I moved the R-forge |> version to CRAN then code from Harald Baayen's book and probably code |> from Gelman and Hill's book would no longer work. Software versions |> can be changed much more readily than can editions of a book. I think |> there is a way around the problem with mcmcsamp but I won't be able to |> say for sure until it is coded and tested, which will take time. I |> don't want to predict exactly how much time - I always manage to |> underestimate drastically. ~ If (just hypothetically speaking) I were writing a review that would be published in 6 months or so, do you have a recommendation? (Would you still trust GLMM/mcmcsamp results from the CRAN version?) | (The glmmML package claims to fit via Laplace or Gauss-Hermite | quadrature (with non-adaptive, but adjustable, number of quad points | -- so it's at least theoretically possible?) | |> Yes. I think the adaptive part is important (in fact, I know it is |> important) and probably more important than the distinction between |> the Laplace approximation and Gauss-Hermite quadrature. That is, you |> gain more from using the Laplace approximation at the conditional |> modes of the random effects (which is the "adaptive" part) than |> increasing the number of Gauss-Hermite quadrature points at the |> marginal modes. The tricky bit of AGQ or Laplace is determining the |> conditional modes and that part is already done. ~ OK, I was confused about the distinction in the meaning of "adaptive" (as you pointed out previously on r-help ...) I think I have it now. https://stat.ethz.ch/pipermail/r-help/2007-March/128043.html ~ thanks for your help! ~ Ben Bolker -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIKGYLc5UpGjwzenMRAoBnAJ0R7w6FHL8uTaz3OKmMvJWHByS0tACdEy4x ABU30kHrB/eNoXcHMvJJ4LE= =whoY -----END PGP SIGNATURE-----