Including an autocorrelation term dramatically reduces random variance (lme)

Thu, May 7, 2015 1:08 AM

Dear David,

Please keep the mailing list in cc.

Correlation structures in nlme work on the residuals conditional on the
random effects. This implies that residuals from different levels of the
random effects are assumed to be independent (not correlated). Some of the
information can be described by both the random effect and the correlation
structure. In fact a random intercept is equivalent with a compound
symmetry correlation structure.

You have only 3 observations per random effect level. Then it is hard to
make the difference between the average of within the group (the random
effect) and the correlated residuals. In such cases the shrinkage kick in
very hard, reducing the random effect variance strongly.

IMHO you need choose a model than matches the design. The design dictates
that you need a random effect of fish. This leads to 3 per fish. 3
observations is not enough to estimate a AR1 correlation. Since you have
only 3 observations is doesn't make sense to look at 15 lags. You only have
2...

Best regards,


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-05-06 14:52 GMT+02:00 David Villegas R?os <chirleu at gmail.com>:

Dear Thierry, thank you for your kind response.

I understand that the residual variance must be different between the two
models (with and without autocorrelation term). However, as you can see in
my original post, the main change (3 orders of magnitude) is in the random
variance (the between-individuals variance). I don't understand this. It
seems that the autocorrelation term "eats" all the variance previously
explained by the individual ID...

In addition, it makes sense of course to leave always the random ID in the
model. But to me, the only way to test if the random variance (and
therefore the repeatability) is different from zero, is to test if the
model needs the random ID (AIC, anova,...between model with and model
without random ID). Of course I could calculate the CI for the
repeatability using parametric bootstrapping, but this seems a difficult
task for such complex models according to this previous post:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q2/023385.html

Best,

David



2015-05-06 14:21 GMT+02:00 Thierry Onkelinx <thierry.onkelinx at inbo.be>:

Dear David,

A model without correlation has $\epsilon_t \sim N(0, \sigma_1)$
AR1 correlation implies: $\epsilon_t = \phi \epsilon_{t-1} + a_t$  with
$\a_t \sim N(0, \sigma_2)$ (see Pinheiro and Bates, 2000). So the residual
variance in this model is not the same thing. It makes sense that
$\sigma_2$ is smaller than $\sigma_1$.

You should takes this into account when calculating the repeatability.

Don't bother testing the need of ID. The design of your experiment
dictates the need to include it in the model.

Weights affect the residual variance somewhat similar as a correlation
structure does. Have a look at the formulas in Pinheiro and Bates (2000).

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-05-06 10:07 GMT+02:00 David Villegas R?os <chirleu at gmail.com>:

Dear list,

My dataset includes 285 individuals for which I have measured one trait
over three months. The traits has been measured at two different time
scales: monthly (maximum 3 replicates per fish) and daily (maximum 90
replicates per fish). For one third of the fish, the trait was measured
in
2012, for another third in 2013 and for the last third in 2014. But I
measured always in the same three months: june, july and august.

My objective is to use a mixed modelling approach to estimate
repeatability
(Vrandom/(Vrandom+Vresidual)) of this trait. I want to account for three
fixed factors: area (two sites), total length (tl) and time. For
comparative purposes, I want to do it for the two temporal scales, i.e.,
one model using month and therefore three replicates per ID, and a
different model using (julian) day and 90 replicates per ID.

My attempts so far for the model with month:

MODEL 1: Mixed model without autocorrelation term


lme1=lme(loghr~area+tl+factor(month2),random=~1|fish,data=hrm3,method="REML")

summary(lme1)

Linear mixed-effects model fit by REML
 Data: hrm3
       AIC      BIC    logLik
  1498.771 1531.537 -742.3854

Random effects:
 Formula: ~1 | fish
                (Intercept) Residual
StdDev:   0.4649101 0.4790193

Fixed effects: loghr ~ area + tl + factor(month2)
                            Value         Std.Error       DF    t-value
   p-value
(Intercept)           -1.1688029 0.15189536 514  -7.694790    0.0000
areatvedestrand -0.9943160 0.06490892 283  -15.318635  0.0000
tl                         -0.0034115 0.00308169 283  -1.107031    0.2692
factor(month2)7 -0.2242556 0.04074810 514  -5.503462     0.0000
factor(month2)8 -0.1269165 0.04245662 514  -2.989322     0.0029

Correlation:
                            (Intr)   artvds   tl         fc(2)7
areatvedestrand -0.224
tl                         -0.943  0.019
factor(month2)7 -0.121  0.003 -0.011
factor(month2)8 -0.113 -0.006 -0.011  0.475

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max
-2.7792032 -0.5127146 -0.1031226  0.3935509  4.3119462

Number of Observations: 802
Number of Groups: 286

The estimation of Repeatability from this model would be
0.46/(0.46+0.47)=0.485
However if I extract the residuals from this model
res=resid(lme,type="normalized") and do acf(residuals) the plot shows a
high autocorrelation in the first 15 lags. So I tried the following:

MODEL 2: same model including an autocorrelation term

lme3=update(lme1,correlation=corAR1(form=~month2))

summary(lme)

Linear mixed-effects model fit by REML
 Data: hrm3
       AIC     BIC    logLik
  1461.013 1498.46 -722.5066

Random effects:
 Formula: ~1 | fish
              (Intercept)       Residual
StdDev: 0.0005272246 0.6819065

Correlation Structure: ARMA(1,0)
 Formula: ~month2 | fish
 Parameter estimate(s):
     Phi1
0.6097222
Fixed effects: loghr ~ area + tl + factor(month2)
                            Value         Std.Error      DF    t-value
  p-value
(Intercept)           -1.1584548 0.15740075 514  -7.359906  0.0000
areatvedestrand -1.0019165 0.06730508 283  -14.886194 0.0000
tl                         -0.0035497 0.00319449 283  -1.111192   0.2674
factor(month2)7 -0.2212453 0.03628563 514  -6.097327   0.0000
factor(month2)8 -0.1275377 0.04735955 514  -2.692968   0.0073
 Correlation:
                            (Intr) artvds tl     fc(2)7
areatvedestrand -0.227
tl                         -0.944  0.022
factor(month2)7 -0.104  0.003 -0.009
factor(month2)8 -0.124 -0.005 -0.013  0.611

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-2.62204450 -0.68663130 -0.08503995  0.58177543  3.79290975

Number of Observations: 802
Number of Groups: 286

If I plot acf(residual) now the plot looks much nicer (almost all the
autocorrelation has been corrected)

As you can see, the fact of including the autocorrelation term
dramatically
reduces the random variance from 0.46 to 0.0005, which in turn reduces
the
Repeatability from 0.485 to 5.98e-07.

NOW: if repeat this exercise working at a daily scale (90 replicates per
ID), the reduction in random variance and repeatability is much smaller
(from 0.46 to 0.16), but very noticeable still.

My questions are:
- How should I interpret this? Is the autocorrelation term in model 2
taking most of the random variance previously explained by the individual
ID in model 1?
- Model 2 yields a very low value of repeatability, however model
selection
is telling me than the random effect (individual ID) has to be included
in
the model. I interpret this as the repeatability being significantly
different from zero, even it is really low. Correct?
- With a different trait, I observe the same reduction after including a
"weights" argument. And with another trait, I observe a similar reduction
when I pass from an autocorrelation structure corAR1 to a correlation
structure corARMA (2,2). Any thought?
- Could I say that including explanatory variables (fixed part of the
model) will reduce the residual variance, and including terms in the
random
part of the model (random factors, autocorrelations terms, weights, ...)
will reduce the random variance?

Many thanks in advance,

David

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models