Differences in degrees of freedom between a mixed-effects model and a gls model using nlme
I don't want to derail this thread entirely, but it does make me wonder: are people really concerned about calculating the "right" degrees of freedom in their applications anyway? I have pretty much stopped worrying about whether the software cleverly figures out what the right dfs are, as I hardly ever deal with situations where there is a clear and correct answer to that question. Even in the designed experiments I see, unbalancedness creeps in in various ways, the most obvious being missing data due to attrition (in Karl's example, there is of course a clear answer, but my question is more general).

I am sure that the type of applications one deals with has an influence on this matter. If you see nicely designed experiments with balanced data, getting the dfs right might seem like an important concern. Or if sample sizes are small (in the number of individuals and/or the number of repeated measurements), then it may matter whether the dfs are 10 or 100 for the conclusions you draw from a test (which, in the end, is then based, at least partly, on the p-value the software throws at you).

But as far as I am concerned, I constantly (and grudgingly, with a lot of wishful thinking) need to rely on the asymptotic behavior of the estimates, standard errors, and test statistics every which way I turn anyway. Whether the dfs are 10, 40.5682..., or 100 is one of my least pressing concerns. If the conclusion doesn't pass the interocular traumatization test, I don't have much faith in it anyway.

I know that this has come up before -- http://glmm.wikidot.com/faq discusses it as well, and the fact that lme4 doesn't provide p-values is, in essence, a statement in the same direction -- but I am just curious about other people's opinions on this.

Best,
Wolfgang

--
Wolfgang Viechtbauer, Ph.D., Statistician
Department of Psychiatry and Psychology
School for Mental Health and Neuroscience
Faculty of Health, Medicine, and Life Sciences
Maastricht University, P.O. Box 616 (VIJV1)
6200 MD Maastricht, The Netherlands
+31 (43) 388-4170 | http://www.wvbauer.com
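[Editor's note: a quick illustration of the point about 10 vs. 40.5682 vs. 100 df -- a sketch with an arbitrary t statistic, not a calculation from the thread's example. For a clearly significant t value, the two-sided p-value differs across those dfs but the conclusion does not.]

```r
# Two-sided p-values for the same t statistic under different dfs.
# t_val = 4 is an illustrative "clearly significant" value.
t_val <- 4
dfs <- c(10, 40.5682, 100)
p_vals <- 2 * pt(-abs(t_val), dfs)
round(p_vals, 5)  # all well below 0.05, so the substantive conclusion is the same
```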
-----Original Message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
Sent: Monday, February 09, 2015 05:36
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Differences in degrees of freedom between a mixed-effects model and a gls model using nlme

Ken Beath <ken.beath at ...> writes:
All 3 (paired t-test, mixed-effects, and gls with compound symmetry) are fitting the same model, and so should give the same result. That is what you see with the first example. The gls model is not getting it wrong, except for the df.

For the second example, the 3 model results should again be the same. I'm not certain why they differ, but it may be numerical. Even though the data come from a model that isn't the one being fitted, that should be irrelevant: it is the data that produce the model fit, not the model that produces the data. Possibly estimates of the correlation are poor when there is little correlation, and that flows through to the mixed-effects and gls results. The relationship to the unpaired t-test is probably irrelevant. Note also that the default for t.test is unequal variances, whereas for a mixed model it is equal variances.

The df for gls is, in a sense, obviously a bug. Getting the df for a mixed model isn't easy. Here we have a nice simple correlation structure and there is an obvious correct answer, but usually there isn't one. If the model assumed uncorrelated data then the gls df would be correct, so it is necessary for the software to work out what is going on. Using parametric bootstrapping to determine the underlying distribution seems a better method if accuracy is important.

Ken
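[Editor's note: a minimal sketch of the parametric-bootstrap idea Ken mentions, illustrated with plain lm() fits so it runs without extra packages; with nlme or lme4 models the same recipe applies, simulating from the null fit and refitting both models. The data and models here are invented for illustration.]

```r
# Parametric bootstrap of a likelihood-ratio test: simulate responses from the
# null model, refit both models to each simulated data set, and compare the
# observed LR statistic to the resulting reference distribution.
set.seed(1)
d <- data.frame(x = rnorm(40))
d$y <- 0.5 * d$x + rnorm(40)

m1 <- lm(y ~ x, data = d)   # alternative model
m0 <- lm(y ~ 1, data = d)   # null model (drop x)
obs <- as.numeric(2 * (logLik(m1) - logLik(m0)))  # observed LR statistic

stats <- replicate(500, {
  d2 <- d
  d2$y <- simulate(m0)[[1]]               # draw a response under the null
  fa <- lm(y ~ x, data = d2)
  f0 <- lm(y ~ 1, data = d2)
  as.numeric(2 * (logLik(fa) - logLik(f0)))
})
p_boot <- mean(stats >= obs)  # bootstrap p-value, no df assumption needed
p_boot
```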
For what it's worth, you can easily see what gls() is doing to get its df, and confirm that it's naive, by printing nlme:::summary.gls:

    tTable[, "p-value"] <- 2 * pt(-abs(tTable[, "t-value"]), dims$N - dims$p)
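[Editor's note: a concrete check of that line, with illustrative numbers of my own rather than the thread's data: for 10 subjects measured twice, gls() uses the naive residual df N - p, while the paired t-test uses n - 1.]

```r
# Naive residual df as plugged into pt() by nlme:::summary.gls (illustrative):
n_subjects <- 10
n_times    <- 2
N <- n_subjects * n_times    # 20 rows in the long-format data
p <- 2                       # intercept + one treatment coefficient
df_naive  <- N - p           # 18: what summary.gls uses
df_paired <- n_subjects - 1  # 9: the paired t-test df for the same comparison
c(gls = df_naive, paired = df_paired)
```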
For what it's worth, I've found that the df calculations used by lme() often fail quite badly for random-slopes models ... it's often really hard to guess the right df, even for simpler designs (i.e., where there really is a precise correspondence with an F distribution -- no correlation structures, no lack of balance, no crossed random effects).
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models