Differences in degrees of freedom between a mixed-effects model and a gls model using nlme

Dear list members,

I'm having some problems understanding the difference in degrees of freedom between a mixed-effect model and a gls model, both fitted using the nlme package. I'm used to it being difficult to figure out the 'correct' (if such a thing exists) number of degrees of freedom in mixed-effects models, but in the following very simple example the *mixed-effects model* seems to use the correct degrees of freedom, while the gls model uses too many degrees of freedom.

Here's the example. Basically, I want to test if a mixed-effects model is equivalent to a paired t-test.

# Generate some correlated data
set.seed(1)
n = 10
u = 5 + rnorm(n, sd=1)
x1 = u + rnorm(n)
x2 = u + rnorm(n) + 1

# A paired t-test is, as expected, more powerful than an unpaired one
t.test(x1, x2)              # p = .1407
t.test(x1, x2, paired=TRUE) # p = .0597

The relevant line of output for the paired test is:
t = -2.1532, df = 9, p-value = 0.05972

# Fit a linear mixed-effects model
library(nlme)
l = lme(values~ind, random=~1|id, data=df)
summary(l)

The relevant lines of output are:

               Value Std.Error DF   t-value p-value
indx2       0.617482 0.2867705  9  2.153226  0.0597

So we get the same t-value (to five decimal places - it differs in the last decimal), degrees of freedom and p-value as in the paired t-test. Let's now try to fit a gls model for correlated data. This should in theory be equivalent to doing a paired t-test. We get:

#  Fit a correlated gls model
library(nlme)
l2 = gls(values~ind, correlation=corSymm(form=~1|id), data=df)
summary(l2)

               Value Std.Error   t-value p-value
indx2       0.617482 0.2867703  2.153228  0.0451

The t-value is the same as in the t-test (here to *six* decimal place), but the p-value is too small. The reason is that the test uses twice the degrees of freedom that it should. It uses 18 degrees of freedom:

2*pt(2.153228, 18, lower.tail = FALSE) # 0.04510817

So the mixed-effects model gets it right, while the gls model gets it wrong. But it's actually the *gls* model that should be equivalent to the paired t-test, not the mixed-effects model. We can easily see this by changing the correlation to be negative:

set.seed(2)
n = 10
u = 5 + rnorm(n, sd=1)
x1 = u + rnorm(n)
x2 = 15 - 2*u + rnorm(n) + 1
cor(x1, x2) # -0.50

Unpaired t-test:
t = -0.5052, df = 15.753, p-value = 0.6204

Paired t-test (note that the p-value is *greater* than for the unpaired case):
t = -0.418, df = 9, p-value = 0.6857

Mixed-effects model:

               Value Std.Error DF  t-value p-value
indx2       0.481868 0.9537336  9 0.505244  0.6255

Of course, the mixed-effects model doesn't really fit the data. The random effect variance is estimated to be ~0, so basically this corresponds to an *unpaired* t-test (but one that assumes equal variance). Note that the t-statistic is equal to the t-statistic of the unpaired t-test (though the latter uses 18 degrees of freedom, while this model uses 9 degrees of freedom).

Now, if I've understood everything correctly, the gls model should *exactly* correspond to a paired t-test; it's the same underlying model. And indeed we get the same t-statistic (0.418):

               Value Std.Error  t-value p-value
indx2       0.481868 1.1526947 0.418036  0.6809

But the p-value differs somewhat, since the gls model assumes 18 degrees of freedom, while the t-test (correctly) assumes 9 degrees of freedom.

At least for positive correlations, a mixed-effect random intercept model and a gls model with compound symmetry should be *almost* equivalent, so I see no reason that the gls model can assume that the test statistics has twice as many degrees of freedom as in the mixed-effects model. So why the discrepancy? Shouldn't I trust the degrees of freedom calculations from gls()?