A concrete type I/III Sum of square problem

Gregor Gorjanc <gregor.gorjanc at gmail.com> writes:
Hmm, the Danish tradition relies heavily on lecture notes, so I don't
have a specific book for you. One possible starting point is

Tue Tjur (1984): Analysis of variance models in orthogonal designs.
Int. Statist. Review 52, 33-81.

The thing to notice in relation to that paper is that the
decomposition (p. 55) of the covariance matrix as sum_B lambda_B Q_B^0
depends critically on the design being orthogonal. Without
orthogonality it still defines a model, but typically one without a
sensible interpretation.
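For reference, the shape of that decomposition can be written out as below. This is a sketch under the assumption that the Q_B^0 are mutually orthogonal projections adding up to the identity, which is what an orthogonal design delivers; the paper's own notation and conditions may differ in detail.

```latex
V = \sum_B \lambda_B Q_B^0,
\qquad Q_B^0 Q_C^0 = 0 \ (B \neq C),
\qquad Q_B^0 Q_B^0 = Q_B^0 = (Q_B^0)^\top,
\qquad \sum_B Q_B^0 = I .
```

Under those assumptions V Q_B^0 = lambda_B Q_B^0, so the lambda_B are the eigenvalues of V, and V is a valid covariance matrix exactly when every lambda_B is positive. In the one-way example the two projections are P_X and I - P_X.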

Consider a simple one-way ANOVA with three groups of equal size. The
Q matrices will be the projections P_X and I - P_X, where X is the
design matrix for the grouping factor, e.g.
  (Intercept) factor(rep(1:3, each = 2))2 factor(rep(1:3, each = 2))3
1           1                           0                           0
2           1                           0                           0
3           1                           1                           0
4           1                           1                           0
5           1                           0                           1
6           1                           0                           1
...
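For readers without R at hand, the same treatment-contrast design matrix can be built directly. This is a minimal Python sketch; `design_matrix` is a hypothetical helper, and it assumes R's default convention of using the first (sorted) level as the baseline.

```python
def design_matrix(groups):
    """Hypothetical helper: intercept column plus one 0/1 dummy per
    non-baseline level, with the first sorted level as the baseline
    (mirroring R's default treatment contrasts)."""
    levels = sorted(set(groups))
    return [[1] + [1 if g == lev else 0 for lev in levels[1:]] for g in groups]


# Group labels equivalent to R's rep(1:3, each = 2).
g = [lev for lev in (1, 2, 3) for _ in range(2)]
X = design_matrix(g)
for i, row in enumerate(X, start=1):
    print(i, *row)
```

The printed rows match the 0/1 pattern of the R output above.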

P_X can be found in the following semi-secret way:
    1   2   3   4   5   6
1 0.5 0.5 0.0 0.0 0.0 0.0
2 0.5 0.5 0.0 0.0 0.0 0.0
3 0.0 0.0 0.5 0.5 0.0 0.0
4 0.0 0.0 0.5 0.5 0.0 0.0
5 0.0 0.0 0.0 0.0 0.5 0.5
6 0.0 0.0 0.0 0.0 0.5 0.5
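The projection can also be obtained from the textbook formula P_X = X (X'X)^{-1} X'. The post does not show which route it took, so the sketch below is simply that standard formula in plain Python, using exact fractions to avoid rounding noise; all helper names are hypothetical.

```python
from fractions import Fraction


def design_matrix(groups):
    """Intercept plus treatment-contrast dummies (first sorted level = baseline)."""
    levels = sorted(set(groups))
    return [[1] + [1 if g == lev else 0 for lev in levels[1:]] for g in groups]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def inverse(A):
    """Gauss-Jordan inverse in exact arithmetic (A assumed nonsingular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # find a pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def hat_matrix(X):
    """P_X = X (X'X)^{-1} X', the orthogonal projection onto the columns of X."""
    Xt = transpose(X)
    return matmul(matmul(X, inverse(matmul(Xt, X))), Xt)


X = design_matrix([1, 1, 2, 2, 3, 3])
P = hat_matrix(X)
for row in P:
    print(*(f"{float(x):.1f}" for x in row))
```

The output reproduces the block-diagonal 0.5 pattern shown above, and P is idempotent (P P = P), as a projection should be.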

Suppose we put a variance component of 10 on P_X and 1 on I - P_X,
i.e. take V = 10 P_X + 1 (I - P_X). We then get
    1   2   3   4   5   6
1 5.5 4.5 0.0 0.0 0.0 0.0
2 4.5 5.5 0.0 0.0 0.0 0.0
3 0.0 0.0 5.5 4.5 0.0 0.0
4 0.0 0.0 4.5 5.5 0.0 0.0
5 0.0 0.0 0.0 0.0 5.5 4.5
6 0.0 0.0 0.0 0.0 4.5 5.5

which is a perfectly sensible covariance matrix for data that are
correlated within groups.
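Because the column space of X is spanned by the group indicators, P_X just averages within groups: entry (i, j) is 1/n_g when i and j share group g and 0 otherwise. The sketch below (plain Python, hypothetical helper names) uses that shortcut to build V = lambda_1 P_X + lambda_0 (I - P_X) for any grouping, balanced or not.

```python
from collections import Counter
from fractions import Fraction


def group_projection(groups):
    """P_X for a one-way layout: 1/n_g where i and j are both in group g,
    else 0 (i.e. the matrix that replaces each value by its group mean)."""
    size = Counter(groups)
    return [[Fraction(1, size[gi]) if gi == gj else Fraction(0) for gj in groups]
            for gi in groups]


def covariance(groups, lam_between, lam_within):
    """V = lam_between * P_X + lam_within * (I - P_X)."""
    P = group_projection(groups)
    n = len(groups)
    return [[lam_between * P[i][j] + lam_within * (int(i == j) - P[i][j])
             for j in range(n)] for i in range(n)]


def show(V):
    for row in V:
        print(*(f"{float(x):4.1f}" for x in row))


balanced = covariance([1, 1, 2, 2, 3, 3], 10, 1)
unbalanced = covariance([1, 2, 2, 3, 3, 3], 10, 1)
show(balanced)
print()
show(unbalanced)
```

With three groups of size 2 it reproduces the 5.5/4.5 blocks above; relabelling the six observations as groups of sizes 1, 2 and 3 gives the 10, 5.5/4.5 and 4/3 blocks of the unbalanced case.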

Now try the same stunt with unbalanced data, here groups of sizes 1, 2 and 3:
   1   2   3 4 5 6
1 10 0.0 0.0 0 0 0
2  0 5.5 4.5 0 0 0
3  0 4.5 5.5 0 0 0
4  0 0.0 0.0 4 3 3
5  0 0.0 0.0 3 4 3
6  0 0.0 0.0 3 3 4

I.e. we are de facto assuming that observations in smaller groups
have larger variance than observations in larger groups.
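The pattern follows directly from the form of P_X for a grouping factor. Writing lambda_1 for the component on P_X and lambda_0 for the one on I - P_X (labels introduced here for illustration), for observations in a group of size n_g:

```latex
(P_X)_{ij} =
\begin{cases}
  1/n_g & \text{if } i \text{ and } j \text{ are both in group } g,\\
  0     & \text{otherwise,}
\end{cases}
\qquad
V = \lambda_1 P_X + \lambda_0 (I - P_X),

V_{ii} = \frac{\lambda_1}{n_g} + \lambda_0\Bigl(1 - \frac{1}{n_g}\Bigr)
       = \lambda_0 + \frac{\lambda_1 - \lambda_0}{n_g},
\qquad
V_{ij} = \frac{\lambda_1 - \lambda_0}{n_g} \quad (i \neq j \text{ in the same group}).
```

With lambda_1 = 10 and lambda_0 = 1 this gives variance 1 + 9/n_g, i.e. 10, 5.5 and 4 for group sizes 1, 2 and 3, and within-group covariance 9/n_g (4.5 and 3). The variance shrinks with group size whenever lambda_1 > lambda_0.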