Why would using p-values of GLMM for distributions other than Gaussian be correct?
Thank you, Ben. That is exactly what I was trying to convey --- one culture is used to asymptotic results, and the other to the finite-size correction.
On Tue, Sep 24, 2013 at 7:39 AM, Ben Bolker <bbolker at gmail.com> wrote:
Pablo Inchausti <pablo.inchausti.f at ...> writes:
Hi Joshua,

Thanks for your response. I tend to agree (intuitively) with you that when one has 50,000 observations and 1,000 groups for the variable modelled as a random effect, assuming a normal distribution for the Wald statistic, coef/SE(coef), of the fixed effects without bothering about degrees of freedom is reasonable. However, the overwhelming majority of analyses deal with tens or at most a hundred observations, and with random effects defined by a factor with a small (but generally greater than 5) number of categories. It is in this (often encountered) context that the discussion of how to count the degrees of freedom for the random effects seems to be critical.

This tally of the degrees of freedom lies between two extremes: one (because only the variance of the normally distributed random effect is estimated), or the number of categories of the variable modelled as a random effect minus one. In many (most?) cases, the assumption regarding the counting of the degrees of freedom does make a difference for evaluating the significance of the fixed effects.

The significance tests of the fixed effects require the degrees of freedom of the model, which is why the lme4 package does not provide p-values when family = Gaussian but does provide them whenever family != Gaussian, which was the question I posed in my mail. Other programs (SAS, Statistica) take a position on the degrees of freedom of the random effects that is at the very least debatable. D. Bates and others recommend using Bayesian methods to estimate the p-values and the confidence intervals, but the commonly available R functions only work for GLMMs with family = Gaussian and with independent random slopes and intercepts.

I hope that this mail helps clarify the questions I posed.

Cheers,
Pablo

On 23 September 2013 18:50, Joshua Wiley <jwiley.psych at ...> wrote:
Hi Pablo,

I think it depends on the assumptions. In theory, with the right degrees of freedom, you could reasonably fit linear mixed effects models on a smaller sample. There are typically no degrees of freedom for GLMs, and GLMMs follow suit. Things like logistic regression rely on large-sample theory: if you have a big enough sample, the degrees of freedom are effectively infinite, the parameter estimates are approximately normally distributed, and a z test is fine. The same would hold for linear mixed models. If you had, say, 50,000 observations from 1,000 groups, p-values assuming z = b/se ~ Gaussian are pretty sensible.

Cheers,
Joshua
[snip snip snip] Just to amplify Joshua's answer: I really think that the reason that p values are shown for GLMMs and not LMMs is cultural. The classic mixed model ANOVA world is (perhaps appropriately) somewhat obsessed with degrees of freedom, which translates to wanting to know what the real units of replication are so that proper inference can be done; the LMM concern inherits from this. On the GLM(M) side, the *culture* is to rely on asymptotic theory. There is theory about finite-size corrections for GLMs (without random effects), under the rubric of "Bartlett corrections", but it's not very widely known or used. Thus, summary.lm (for example) reports t statistics (finite-size-corrected) while summary.glm reports Z statistics (asymptotic) ... There's more discussion of this at http://glmm.wikidot.com/faq#df : I might add a sentence or two explaining the cultural context. Ben
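To make the t-versus-Z contrast concrete, here is a minimal sketch that is not from the thread itself. It takes one Wald statistic (coef/SE) and computes the two-sided p-value under a standard normal reference (the asymptotic convention) and under a Student t reference with a chosen number of degrees of freedom (the finite-size convention). The function names `p_normal` and `p_student_t`, the statistic value 2.0, and the df values are all illustrative assumptions. The t tail is obtained by plain numerical integration so that only the Python standard library is needed; it does not reproduce what lme4, summary.lm or summary.glm do internally.

```python
import math


def p_normal(stat):
    """Two-sided p-value for a Wald statistic under a standard normal reference."""
    return math.erfc(abs(stat) / math.sqrt(2.0))


def t_density(x, df):
    """Density of Student's t with df degrees of freedom (log-gamma for stability)."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log1p(x * x / df))


def p_student_t(stat, df, n_intervals=2000):
    """Two-sided p-value under a t reference, via Simpson's rule on [0, |stat|].

    n_intervals must be even for Simpson's rule.
    """
    upper = abs(stat)
    if upper == 0.0:
        return 1.0
    h = upper / n_intervals
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, n_intervals):
        weight = 4.0 if i % 2 else 2.0
        total += weight * t_density(i * h, df)
    central_mass = 2.0 * total * h / 3.0  # P(-|stat| < T < |stat|)
    return 1.0 - central_mass


if __name__ == "__main__":
    stat = 2.0  # hypothetical coef / SE(coef)
    print("normal reference:", round(p_normal(stat), 4))
    for df in (1, 5, 20, 100, 1000):  # illustrative df choices
        print("t reference, df =", df, ":", round(p_student_t(stat, df), 4))
```

For small df the t-based p-value is noticeably larger than the normal one, and the two converge as df grows. That is why the choice of df matters with a handful of groups and hardly matters with 1,000.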
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com