While I was out of contact with the list for a couple of days, I put the
following together. It is for the data that Mike gave initially, so
that
it assumes no replicates within POLE. The mean square for
MONTH:TIME:TRANSECT:POLE is the Residual.
The variance component for TRANSECT is effectively zero. For
purposes of exposition, it can be set to zero.
The terms formula MONTH/TIME then suffices. lme() gives the numbers of
groups as:
Number of Observations: 286
Number of Groups:
MONTH TIME %in% MONTH
4 16
Observe that the design is then very nearly a complete balanced design:
> with(temp, table(MONTH,TIME))
TIME
MONTH 1 2 3 4
4 18 17 18 18
5 18 18 18 17
6 18 18 18 18
7 18 18 18 18
Observe that
286/16=17.875
less than 18 because two of the TIME:MONTH combinations have
only 17 values. The harmonic mean, which is 17.87, seems however
preferable.
If the design were balanced, with replicates at
MONTH/TIME/POLE level equal to n1=4/n2=4/n3=17.87
then the mean squares would estimate, respectively
4 x 17.87 s1^2 + 17.87 s2^2 + s3^2 = 4 x 17.87 (s1^2 + s2^2 /4 +
s3^2 / (4*17.87)
17.87 s2^2 + s3^2 = 17.87 ( s2^2 + s3^2 / 17.87)
s3^2
In the first two cases, a second version of the formula is given
that makes (to me, at least) better intuitive sense.
We have
Variance Anova DF No. of repeats
Component mean square per 'parent'
MONTH 0.75287 74.10751 3 4
TIME 1.07395 20.52433 12 4
Residual 1.20354 1.20356 270 17.87 (harmonic mean)
Observe that the residual mean square estimates the residual variance.
The TIME variance component is calculated pretty much as
(20.52433 - 1.20356)/17.87 = 1.0813, which does not quite agree
with the ML estimate (nor should it; equating mean squares to expected
mean squares is not the same as REML!)).
The statistical error is affected both by the statistical error in
the TIME
anova mean square, and by the statistical error in the Residual mean
square. The variance formula that is given below for MONTH can be
readily adapted for this case also.
The MONTH variance component is calculated pretty much as: (74.10751 -
20.52433)/(4*17.87) = 0.7497 Again the agreement with the REML
estimate is, quite properly, not perfect.
The estimate for s1^2 (MONTH} has a statistical error that is a
compound of the errors in the ANOVA mean squares for both TIME and
MONTH. The variances of the two anova mean squares (both sample
values of chi-squared statistics, under the usual assumptions) add,
while the quantities themselves are subtracted. The SE (sqrt of
variance), for the estimate s1^2 of sig1^2, is
sqrt( (n2 * n3 * sig1^2 + n3 * sig2^2 + sig3^2)^2 / nu1
+ (n3 * sig2^2 + sig3^2)^2 / nu2 ) / (n2*n3)
[The n's are the numbers of repeats. The nu's are degrees of freedom.]
Variances are not, for quantities that are differences multiples of
chi-squared statistics, a good basis for inference. (Here I am tempted
to make rude comments about over-reliance on variances in much
sample survey work!). The variance calculations may however be useful
in giving an idea of the relative contributions of the different
sources of
statistical noise.
I'd expect, though I have not gone through the arithmetic in detail,
that the estimate for SE[s1^2] will increase, if we allow for the
possibility that the TRANSECT component of variance, even though
estimated as zero, may actually be greater than zero.
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
All,
I don't have the initial e-mails from this thread and when I searched the
mail archives for "explaining lme variance component results" it doesn't
come up. Do the SIGs need to be searched differently then r-help?
Thanks,
David
On 9/6/07 5:07 AM, "John Maindonald" <john.maindonald at anu.edu.au> wrote:
While I was out of contact with the list for a couple of days, I put the
following together. It is for the data that Mike gave initially, so
that
it assumes no replicates within POLE. The mean square for
MONTH:TIME:TRANSECT:POLE is the Residual.
The variance component for TRANSECT is effectively zero. For
purposes of exposition, it can be set to zero.
The terms formula MONTH/TIME then suffices. lme() gives the numbers of
groups as:
Number of Observations: 286
Number of Groups:
MONTH TIME %in% MONTH
4 16
Observe that the design is then very nearly a complete balanced design:
with(temp, table(MONTH,TIME))
TIME
MONTH 1 2 3 4
4 18 17 18 18
5 18 18 18 17
6 18 18 18 18
7 18 18 18 18
Observe that
286/16=17.875
less than 18 because two of the TIME:MONTH combinations have
only 17 values. The harmonic mean, which is 17.87, seems however
preferable.
If the design were balanced, with replicates at
MONTH/TIME/POLE level equal to n1=4/n2=4/n3=17.87
then the mean squares would estimate, respectively
4 x 17.87 s1^2 + 17.87 s2^2 + s3^2 = 4 x 17.87 (s1^2 + s2^2 /4 +
s3^2 / (4*17.87)
17.87 s2^2 + s3^2 = 17.87 ( s2^2 + s3^2 / 17.87)
s3^2
In the first two cases, a second version of the formula is given
that makes (to me, at least) better intuitive sense.
We have
Variance Anova DF No. of repeats
Component mean square per 'parent'
MONTH 0.75287 74.10751 3 4
TIME 1.07395 20.52433 12 4
Residual 1.20354 1.20356 270 17.87 (harmonic mean)
Observe that the residual mean square estimates the residual variance.
The TIME variance component is calculated pretty much as
(20.52433 - 1.20356)/17.87 = 1.0813, which does not quite agree
with the ML estimate (nor should it; equating mean squares to expected
mean squares is not the same as REML!)).
The statistical error is affected both by the statistical error in
the TIME
anova mean square, and by the statistical error in the Residual mean
square. The variance formula that is given below for MONTH can be
readily adapted for this case also.
The MONTH variance component is calculated pretty much as: (74.10751 -
20.52433)/(4*17.87) = 0.7497 Again the agreement with the REML
estimate is, quite properly, not perfect.
The estimate for s1^2 (MONTH} has a statistical error that is a
compound of the errors in the ANOVA mean squares for both TIME and
MONTH. The variances of the two anova mean squares (both sample
values of chi-squared statistics, under the usual assumptions) add,
while the quantities themselves are subtracted. The SE (sqrt of
variance), for the estimate s1^2 of sig1^2, is
sqrt( (n2 * n3 * sig1^2 + n3 * sig2^2 + sig3^2)^2 / nu1
+ (n3 * sig2^2 + sig3^2)^2 / nu2 ) / (n2*n3)
[The n's are the numbers of repeats. The nu's are degrees of freedom.]
Variances are not, for quantities that are differences multiples of
chi-squared statistics, a good basis for inference. (Here I am tempted
to make rude comments about over-reliance on variances in much
sample survey work!). The variance calculations may however be useful
in giving an idea of the relative contributions of the different
sources of
statistical noise.
I'd expect, though I have not gone through the arithmetic in detail,
that the estimate for SE[s1^2] will increase, if we allow for the
possibility that the TRANSECT component of variance, even though
estimated as zero, may actually be greater than zero.
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 5 Sep 2007, at 4:54 AM, Mike Dunbar wrote: