Dear list, As I have read (Bolker et al. 2009 TREE), the Wald Z test is only appropriate for GLMMs in cases without overdispersion. Assuming we use family=poisson with lmer and tackle overdispersion by incorporating an individual random effect AND this adequately "reduces" the overdispersion, is it then OK to use the Wald z test as reported by lmer? In other words, are the p-values reported by lmer in those cases useful/"correct"? Or do they suffer from the usual problems with figuring out the number of parameters used by the random effects? Secondly, is it good practice to judge lmer's capability of "reducing" the overdispersion by summing the squared residuals (pearson) and compare this to a chi square distribution (with N-1 degrees of freedom)? Thankful for any comments or suggestions, Jens ?str?m
glmer Z-test with individual random effects
7 messages · Jens Åström, Ben Bolker, Jarrod Hadfield +2 more
On 11/11/2010 09:58 AM, Jens ?str?m wrote:
Dear list, As I have read (Bolker et al. 2009 TREE), the Wald Z test is only appropriate for GLMMs in cases without overdispersion. Assuming we use family=poisson with lmer and tackle overdispersion by incorporating an individual random effect AND this adequately "reduces" the overdispersion, is it then OK to use the Wald z test as reported by lmer? In other words, are the p-values reported by lmer in those cases useful/"correct"? Or do they suffer from the usual problems with figuring out the number of parameters used by the random effects?
They are equivalent to assuming an infinite/large 'denominator degrees of freedom'. If you have a large sample size (both a large number of total samples relative to the number of parameters, and a large number of random-effects levels/blocks) then this should be reasonable -- if not, then yes, the 'usual problems with figuring out the number of parameters' is relevant. On the other hand, if you're willing to assume that the sample size is large, then likelihood ratio rests (anova(model1,model2)) are probably better than the Wald tests anyway.
Secondly, is it good practice to judge lmer's capability of "reducing" the overdispersion by summing the squared residuals (pearson) and compare this to a chi square distribution (with N-1 degrees of freedom)?
I would say this is reasonable, although again it's a rough guide because the true degrees of freedom are a bit fuzzy -- it should probably be at most N-(fixed effect degrees of freedom)? Would be happy to hear any conflicting opinions. Ben Bolker
The Wald tests (as represented in glm and glmer output of z-statistics and p-values) are approximate for both glm models and glmm models with non-identity links. The approximation can fail badly if the link is highly non-linear over a region of the response that is relevant for a parameter of interest. The Hauck-Donner phenomenon, where the z-statistic decreases as the effect estimate increases, is an extreme example. (This happens, e.g., as one of the levels being compared gives a fitted value that moves close to a binomial proportion of 0 or 1, or close to a Poisson estimate of 0.) The additional complication for a glmm is that the SE may have two components -- e.g., a poisson or binomial error, and a random normal error that is added on the scale of the linear predictor. This random normal error somewhat alleviates variance change effects (including Hauck-Donner) that result from non-linearity in the link, while adding uncertainty to the SE estimate. If the contribution from the glm family error term is largish relative to contribution from the random normal error (> 3 or 4 times as large?), treating the z-statistic as normal may not in many circumstances be too unreasonable, even if the relevant degrees of freedom are as small as maybe 4 (e.g., where the test is for consistency across 5 locations). On data where I seem to have this situation, fairly consistently across a number of responses, I initially used MCMCglmm(). I was concerned about the contribution from the random normal error (including a contribution from an observation level random effect term). I found I hat I had to choose a somewhat informative prior (inverse Wishart with V=1, nu=0.002) to consistently get convergence. With this prior, the MCMCglmm credible intervals were remarkably close to glmer confidence intervals, treating the z-statistics as normal. No doubt the effect of the chosen prior is to insist that the true random normal error variance is fairly close to the variance as estimated by glmer(). A frustration (for me, at least) with using MCMCglmm is that I do not know just what MCMCglmm() is doing in this respect, short of doing some careful investigative exploration of the MCMC simulation results (which is an after-the-event check, where I'd like to know before the event). Comments, Jarrod? All this is to emphasise that we are not in the arena, unless the relevant degrees of freedom are large, of precise science. John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
On 12/11/2010, at 2:18 AM, Ben Bolker wrote:
On 11/11/2010 09:58 AM, Jens ?str?m wrote:
Dear list, As I have read (Bolker et al. 2009 TREE), the Wald Z test is only appropriate for GLMMs in cases without overdispersion. Assuming we use family=poisson with lmer and tackle overdispersion by incorporating an individual random effect AND this adequately "reduces" the overdispersion, is it then OK to use the Wald z test as reported by lmer? In other words, are the p-values reported by lmer in those cases useful/"correct"? Or do they suffer from the usual problems with figuring out the number of parameters used by the random effects?
They are equivalent to assuming an infinite/large 'denominator degrees of freedom'. If you have a large sample size (both a large number of total samples relative to the number of parameters, and a large number of random-effects levels/blocks) then this should be reasonable -- if not, then yes, the 'usual problems with figuring out the number of parameters' is relevant. On the other hand, if you're willing to assume that the sample size is large, then likelihood ratio rests (anova(model1,model2)) are probably better than the Wald tests anyway.
Secondly, is it good practice to judge lmer's capability of "reducing" the overdispersion by summing the squared residuals (pearson) and compare this to a chi square distribution (with N-1 degrees of freedom)?
I would say this is reasonable, although again it's a rough guide because the true degrees of freedom are a bit fuzzy -- it should probably be at most N-(fixed effect degrees of freedom)? Would be happy to hear any conflicting opinions. Ben Bolker
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Hi John,
I'm not sure which part of the MCMCglmm routine is unclear but in a
nut shell:
The data are distributed according to the specified distribution (e.g.
Poisson ) with the dsitribution parameter equal to the linear
predictor (nu) on the inverse link scale:
y ~ Pois(exp(nu))
A linear (mixed) model is proposed for the linear predictor:
nu ~ N(Xb+Zu, I*Ve)
In all models a "units" term is included which is the same as an
observational-level random effect. This is the residual term in the
linear part of the model.
Priors are passed as a list of the form:
prior=list(B=list(),
R=list(),
G=list(G1=list(),
G2=list()))
where B is the prior for the fixed effects, R is the prior for Ve (or
possibly a residual covariance matrix depending on the model) and the
elements of G are the priors for each of the random effect
(co)variances.
The prior for b is multivariate normal Pr(b) ~ N(B$mu, B$V)
The prior for Ve is inverse-Wishart Pr(Ve) ~ IW(R$nu, R$V*R$nu) where
the first term is the degree of belief and R$V*R$nu is the scale matrix
The prior for the random effects variances (Vg) are set in the same
way as Ve, although its possible to also fit parameter expanded priors.
When the variance term is scalar (rather than a covariance matrix)
then the inverse-Wishart and inverse-gamma are the same and IW(R$nu, R
$V*R$nu) = IG(R$nu/2, R$V*R$nu/2) where the two parameters are the
shape and scale. Your specification R=list(V=1, nu=0.002) is
therefore IG(0.001, 0.001) which used to be used a lot (and probably
still is) as a "weak" prior by users of WinBUGS. The REML estimator
for Ve should be equal to the posterior mode in MCMCglmm when the
improper prior R=list(V=0, nu=-2) is used. 0 will not be accepted by
MCMCglmm so you would have to use something small (e.g. 1e-6). The
prior is improper so you have to be careful when using it, and
although the point estimate may have some useful properties (the same
as the REML estimate) the full posterior distribution may not be ideal
with small sample sizes.
The sampling scheme is:
update nu using MH updates or slice sampling
Gibbs sample b and u together
Gibbs sample (co)variance components sequentially.
For other models such as ordinal regression or simultaneous/recursive
models there are other schemes also, but these are quite specialised.
Cheers,
Jarrod
On 11 Nov 2010, at 23:45, John Maindonald wrote:
The Wald tests (as represented in glm and glmer output of z-statistics and p-values) are approximate for both glm models and glmm models with non-identity links. The approximation can fail badly if the link is highly non-linear over a region of the response that is relevant for a parameter of interest. The Hauck-Donner phenomenon, where the z-statistic decreases as the effect estimate increases, is an extreme example. (This happens, e.g., as one of the levels being compared gives a fitted value that moves close to a binomial proportion of 0 or 1, or close to a Poisson estimate of 0.) The additional complication for a glmm is that the SE may have two components -- e.g., a poisson or binomial error, and a random normal error that is added on the scale of the linear predictor. This random normal error somewhat alleviates variance change effects (including Hauck-Donner) that result from non-linearity in the link, while adding uncertainty to the SE estimate. If the contribution from the glm family error term is largish relative to contribution from the random normal error (> 3 or 4 times as large?), treating the z- statistic as normal may not in many circumstances be too unreasonable, even if the relevant degrees of freedom are as small as maybe 4 (e.g., where the test is for consistency across 5 locations). On data where I seem to have this situation, fairly consistently across a number of responses, I initially used MCMCglmm(). I was concerned about the contribution from the random normal error (including a contribution from an observation level random effect term). I found I hat I had to choose a somewhat informative prior (inverse Wishart with V=1, nu=0.002) to consistently get convergence. With this prior, the MCMCglmm credible intervals were remarkably close to glmer confidence intervals, treating the z-statistics as normal. No doubt the effect of the chosen prior is to insist that the true random normal error variance is fairly close to the variance as estimated by glmer(). A frustration (for me, at least) with using MCMCglmm is that I do not know just what MCMCglmm() is doing in this respect, short of doing some careful investigative exploration of the MCMC simulation results (which is an after-the-event check, where I'd like to know before the event). Comments, Jarrod? All this is to emphasise that we are not in the arena, unless the relevant degrees of freedom are large, of precise science. John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 12/11/2010, at 2:18 AM, Ben Bolker wrote:
On 11/11/2010 09:58 AM, Jens ?str?m wrote:
Dear list, As I have read (Bolker et al. 2009 TREE), the Wald Z test is only appropriate for GLMMs in cases without overdispersion. Assuming we use family=poisson with lmer and tackle overdispersion by incorporating an individual random effect AND this adequately "reduces" the overdispersion, is it then OK to use the Wald z test as reported by lmer? In other words, are the p-values reported by lmer in those cases useful/"correct"? Or do they suffer from the usual problems with figuring out the number of parameters used by the random effects?
They are equivalent to assuming an infinite/large 'denominator degrees of freedom'. If you have a large sample size (both a large number of total samples relative to the number of parameters, and a large number of random-effects levels/blocks) then this should be reasonable -- if not, then yes, the 'usual problems with figuring out the number of parameters' is relevant. On the other hand, if you're willing to assume that the sample size is large, then likelihood ratio rests (anova(model1,model2)) are probably better than the Wald tests anyway.
Secondly, is it good practice to judge lmer's capability of "reducing" the overdispersion by summing the squared residuals (pearson) and compare this to a chi square distribution (with N-1 degrees of freedom)?
I would say this is reasonable, although again it's a rough guide because the true degrees of freedom are a bit fuzzy -- it should probably be at most N-(fixed effect degrees of freedom)? Would be happy to hear any conflicting opinions. Ben Bolker
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
HI Jarrod - The mechanics of what is happening are not my issue. What bothers me is that I do not have any intuitive sense of how the choice of prior may be constraining the variance estimates. Is it for example possible to say that the effect is comparable to adding some number of degrees of freedom to the variance estimate? What is the balance between what comes from the prior, and what derives from the data? John. John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
On 12/11/2010, at 10:51 PM, Jarrod Hadfield wrote:
Hi John,
I'm not sure which part of the MCMCglmm routine is unclear but in a nut shell:
The data are distributed according to the specified distribution (e.g. Poisson ) with the dsitribution parameter equal to the linear predictor (nu) on the inverse link scale:
y ~ Pois(exp(nu))
A linear (mixed) model is proposed for the linear predictor:
nu ~ N(Xb+Zu, I*Ve)
In all models a "units" term is included which is the same as an observational-level random effect. This is the residual term in the linear part of the model.
Priors are passed as a list of the form:
prior=list(B=list(),
R=list(),
G=list(G1=list(),
G2=list()))
where B is the prior for the fixed effects, R is the prior for Ve (or possibly a residual covariance matrix depending on the model) and the elements of G are the priors for each of the random effect (co)variances.
The prior for b is multivariate normal Pr(b) ~ N(B$mu, B$V)
The prior for Ve is inverse-Wishart Pr(Ve) ~ IW(R$nu, R$V*R$nu) where the first term is the degree of belief and R$V*R$nu is the scale matrix
The prior for the random effects variances (Vg) are set in the same way as Ve, although its possible to also fit parameter expanded priors.
When the variance term is scalar (rather than a covariance matrix) then the inverse-Wishart and inverse-gamma are the same and IW(R$nu, R$V*R$nu) = IG(R$nu/2, R$V*R$nu/2) where the two parameters are the shape and scale. Your specification R=list(V=1, nu=0.002) is therefore IG(0.001, 0.001) which used to be used a lot (and probably still is) as a "weak" prior by users of WinBUGS. The REML estimator for Ve should be equal to the posterior mode in MCMCglmm when the improper prior R=list(V=0, nu=-2) is used. 0 will not be accepted by MCMCglmm so you would have to use something small (e.g. 1e-6). The prior is improper so you have to be careful when using it, and although the point estimate may have some useful properties (the same as the REML estimate) the full posterior distribution may not be ideal with small sample sizes.
The sampling scheme is:
update nu using MH updates or slice sampling
Gibbs sample b and u together
Gibbs sample (co)variance components sequentially.
For other models such as ordinal regression or simultaneous/recursive models there are other schemes also, but these are quite specialised.
Cheers,
Jarrod
On 11 Nov 2010, at 23:45, John Maindonald wrote:
The Wald tests (as represented in glm and glmer output of z-statistics and p-values) are approximate for both glm models and glmm models with non-identity links. The approximation can fail badly if the link is highly non-linear over a region of the response that is relevant for a parameter of interest. The Hauck-Donner phenomenon, where the z-statistic decreases as the effect estimate increases, is an extreme example. (This happens, e.g., as one of the levels being compared gives a fitted value that moves close to a binomial proportion of 0 or 1, or close to a Poisson estimate of 0.) The additional complication for a glmm is that the SE may have two components -- e.g., a poisson or binomial error, and a random normal error that is added on the scale of the linear predictor. This random normal error somewhat alleviates variance change effects (including Hauck-Donner) that result from non-linearity in the link, while adding uncertainty to the SE estimate. If the contribution from the glm family error term is largish relative to contribution from the random normal error (> 3 or 4 times as large?), treating the z-statistic as normal may not in many circumstances be too unreasonable, even if the relevant degrees of freedom are as small as maybe 4 (e.g., where the test is for consistency across 5 locations). On data where I seem to have this situation, fairly consistently across a number of responses, I initially used MCMCglmm(). I was concerned about the contribution from the random normal error (including a contribution from an observation level random effect term). I found I hat I had to choose a somewhat informative prior (inverse Wishart with V=1, nu=0.002) to consistently get convergence. With this prior, the MCMCglmm credible intervals were remarkably close to glmer confidence intervals, treating the z-statistics as normal. No doubt the effect of the chosen prior is to insist that the true random normal error variance is fairly close to the variance as estimated by glmer(). A frustration (for me, at least) with using MCMCglmm is that I do not know just what MCMCglmm() is doing in this respect, short of doing some careful investigative exploration of the MCMC simulation results (which is an after-the-event check, where I'd like to know before the event). Comments, Jarrod? All this is to emphasise that we are not in the arena, unless the relevant degrees of freedom are large, of precise science. John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 12/11/2010, at 2:18 AM, Ben Bolker wrote:
On 11/11/2010 09:58 AM, Jens ?str?m wrote:
Dear list, As I have read (Bolker et al. 2009 TREE), the Wald Z test is only appropriate for GLMMs in cases without overdispersion. Assuming we use family=poisson with lmer and tackle overdispersion by incorporating an individual random effect AND this adequately "reduces" the overdispersion, is it then OK to use the Wald z test as reported by lmer? In other words, are the p-values reported by lmer in those cases useful/"correct"? Or do they suffer from the usual problems with figuring out the number of parameters used by the random effects?
They are equivalent to assuming an infinite/large 'denominator degrees of freedom'. If you have a large sample size (both a large number of total samples relative to the number of parameters, and a large number of random-effects levels/blocks) then this should be reasonable -- if not, then yes, the 'usual problems with figuring out the number of parameters' is relevant. On the other hand, if you're willing to assume that the sample size is large, then likelihood ratio rests (anova(model1,model2)) are probably better than the Wald tests anyway.
Secondly, is it good practice to judge lmer's capability of "reducing" the overdispersion by summing the squared residuals (pearson) and compare this to a chi square distribution (with N-1 degrees of freedom)?
I would say this is reasonable, although again it's a rough guide because the true degrees of freedom are a bit fuzzy -- it should probably be at most N-(fixed effect degrees of freedom)? Would be happy to hear any conflicting opinions. Ben Bolker
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Dear Ben, Can I just check, is it necessary to have both a large total sample size and a large number of levels of your random effect(s) for a likelihood ratio test to be robust? Or does the large number of levels requirement apply only to the Wald test? I'm referring to the part of your answer below. Thanks, Andy. "They are equivalent to assuming an infinite/large 'denominator degrees of freedom'. If you have a large sample size (both a large number of total samples relative to the number of parameters, and a large number of random-effects levels/blocks) then this should be reasonable -- if not, then yes, the 'usual problems with figuring out the number of parameters' is relevant. On the other hand, if you're willing to assume that the sample size is large, then likelihood ratio rests (anova(model1,model2)) are probably better than the Wald tests anyway."
Hi, I think this is a hard question to answer for complex models (otherwise MCMC would be redundant) but in simple situations the inverse-Wishart prior can be thought of as adding nu effects with known squared deviation V. For example, we could have 5 levels of some factor (rfac) and observe 10 data with pairs of observations associated with each level of rfac: rfac<-gl(5,2) e.rfac<-rnorm(5, 0, sqrt(0.1)) y<-rnorm(10)+e.rfac[rfac] dat<-data.frame(y=y, rfac=rfac) Here the intercept is 0.0, the residual variance is 1.0 and the variance of effects associated with rfac is 0.1. I'll call this last variance Vg. We can fix the posterior for all parameters at their true values except Vg which has an IW prior with V=1 and nu=1: prior1=list(B=list(mu=0, V=1e-6), R=list(V=1, fix=1), G=list(G1=list(V=1, nu=1))) m1<-MCMCglmm(y~1, random=~rfac, data=dat, prior=prior1) We could also observe an additional 1000 data points associated with a 6th level of rfac which has an effect of 1.0: dat2<-data.frame(y=c(y,rnorm(1000)+1), rfac=c(rfac,rep(6, 1000))) and analyse as before but with a flat prior (i.e. nu=0): prior2=list(B=list(mu=0, V=1e-6), R=list(V=1, fix=1), G=list(G1=list(V=1, nu=0))) m2<-MCMCglmm(y~1, random=~rfac, data=dat, prior=prior2) The posterior distribution for Vg should be (almost) the same in both cases plot(density(m1$VCV[,1])) lines(density(m2$VCV[,1]), col="red") With multi-parameter models, or quantities that are functions of two or more parameters, then this interpretation can be misleading. However, in many instances the idea does seem to serve as a good heuristic. The danger of course is thinking how can adding 1 prior effect (i.e. nu =1) make that much difference (compared to nu=0, say). It can when a) factor levels are poorly replicated b) replication within factor levels is poor and c) when prior V conflicts with the actual value of Vg. Cheers, Jarrod
On 13 Nov 2010, at 00:42, John Maindonald wrote:
HI Jarrod - The mechanics of what is happening are not my issue. What bothers me is that I do not have any intuitive sense of how the choice of prior may be constraining the variance estimates. Is it for example possible to say that the effect is comparable to adding some number of degrees of freedom to the variance estimate? What is the balance between what comes from the prior, and what derives from the data? John. John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 12/11/2010, at 10:51 PM, Jarrod Hadfield wrote:
Hi John,
I'm not sure which part of the MCMCglmm routine is unclear but in a
nut shell:
The data are distributed according to the specified distribution
(e.g. Poisson ) with the dsitribution parameter equal to the linear
predictor (nu) on the inverse link scale:
y ~ Pois(exp(nu))
A linear (mixed) model is proposed for the linear predictor:
nu ~ N(Xb+Zu, I*Ve)
In all models a "units" term is included which is the same as an
observational-level random effect. This is the residual term in
the linear part of the model.
Priors are passed as a list of the form:
prior=list(B=list(),
R=list(),
G=list(G1=list(),
G2=list()))
where B is the prior for the fixed effects, R is the prior for Ve
(or possibly a residual covariance matrix depending on the model)
and the elements of G are the priors for each of the random effect
(co)variances.
The prior for b is multivariate normal Pr(b) ~ N(B$mu, B$V)
The prior for Ve is inverse-Wishart Pr(Ve) ~ IW(R$nu, R$V*R$nu)
where the first term is the degree of belief and R$V*R$nu is the
scale matrix
The prior for the random effects variances (Vg) are set in the same
way as Ve, although its possible to also fit parameter expanded
priors.
When the variance term is scalar (rather than a covariance matrix)
then the inverse-Wishart and inverse-gamma are the same and IW(R
$nu, R$V*R$nu) = IG(R$nu/2, R$V*R$nu/2) where the two parameters
are the shape and scale. Your specification R=list(V=1, nu=0.002)
is therefore IG(0.001, 0.001) which used to be used a lot (and
probably still is) as a "weak" prior by users of WinBUGS. The REML
estimator for Ve should be equal to the posterior mode in MCMCglmm
when the improper prior R=list(V=0, nu=-2) is used. 0 will not be
accepted by MCMCglmm so you would have to use something small (e.g.
1e-6). The prior is improper so you have to be careful when using
it, and although the point estimate may have some useful properties
(the same as the REML estimate) the full posterior distribution may
not be ideal with small sample sizes.
The sampling scheme is:
update nu using MH updates or slice sampling
Gibbs sample b and u together
Gibbs sample (co)variance components sequentially.
For other models such as ordinal regression or simultaneous/
recursive models there are other schemes also, but these are quite
specialised.
Cheers,
Jarrod
On 11 Nov 2010, at 23:45, John Maindonald wrote:
The Wald tests (as represented in glm and glmer output of z- statistics and p-values) are approximate for both glm models and glmm models with non-identity links. The approximation can fail badly if the link is highly non-linear over a region of the response that is relevant for a parameter of interest. The Hauck-Donner phenomenon, where the z-statistic decreases as the effect estimate increases, is an extreme example. (This happens, e.g., as one of the levels being compared gives a fitted value that moves close to a binomial proportion of 0 or 1, or close to a Poisson estimate of 0.) The additional complication for a glmm is that the SE may have two components -- e.g., a poisson or binomial error, and a random normal error that is added on the scale of the linear predictor. This random normal error somewhat alleviates variance change effects (including Hauck-Donner) that result from non-linearity in the link, while adding uncertainty to the SE estimate. If the contribution from the glm family error term is largish relative to contribution from the random normal error (> 3 or 4 times as large?), treating the z- statistic as normal may not in many circumstances be too unreasonable, even if the relevant degrees of freedom are as small as maybe 4 (e.g., where the test is for consistency across 5 locations). On data where I seem to have this situation, fairly consistently across a number of responses, I initially used MCMCglmm(). I was concerned about the contribution from the random normal error (including a contribution from an observation level random effect term). I found I hat I had to choose a somewhat informative prior (inverse Wishart with V=1, nu=0.002) to consistently get convergence. With this prior, the MCMCglmm credible intervals were remarkably close to glmer confidence intervals, treating the z-statistics as normal. No doubt the effect of the chosen prior is to insist that the true random normal error variance is fairly close to the variance as estimated by glmer(). A frustration (for me, at least) with using MCMCglmm is that I do not know just what MCMCglmm() is doing in this respect, short of doing some careful investigative exploration of the MCMC simulation results (which is an after-the-event check, where I'd like to know before the event). Comments, Jarrod? All this is to emphasise that we are not in the arena, unless the relevant degrees of freedom are large, of precise science. John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 12/11/2010, at 2:18 AM, Ben Bolker wrote:
On 11/11/2010 09:58 AM, Jens ?str?m wrote:
Dear list, As I have read (Bolker et al. 2009 TREE), the Wald Z test is only appropriate for GLMMs in cases without overdispersion. Assuming we use family=poisson with lmer and tackle overdispersion by incorporating an individual random effect AND this adequately "reduces" the overdispersion, is it then OK to use the Wald z test as reported by lmer? In other words, are the p-values reported by lmer in those cases useful/"correct"? Or do they suffer from the usual problems with figuring out the number of parameters used by the random effects?
They are equivalent to assuming an infinite/large 'denominator degrees of freedom'. If you have a large sample size (both a large number of total samples relative to the number of parameters, and a large number of random-effects levels/blocks) then this should be reasonable -- if not, then yes, the 'usual problems with figuring out the number of parameters' is relevant. On the other hand, if you're willing to assume that the sample size is large, then likelihood ratio rests (anova(model1,model2)) are probably better than the Wald tests anyway.
Secondly, is it good practice to judge lmer's capability of "reducing" the overdispersion by summing the squared residuals (pearson) and compare this to a chi square distribution (with N-1 degrees of freedom)?
I would say this is reasonable, although again it's a rough guide because the true degrees of freedom are a bit fuzzy -- it should probably be at most N-(fixed effect degrees of freedom)? Would be happy to hear any conflicting opinions. Ben Bolker
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.