Anova and unbalanced designs

Tal,

The reparameterized cell means model can help with understanding what the individual terms mean in an analysis with contrasts.  The cell means model is y = W*mu + e, where mu is the vector of cell means (the mean for each group) and W is just a stretched identity matrix (one indicator column per group); this model simply fits the mean of each cell without any comparisons/contrasts.  The reparameterized cell means model is y = W*Ai * A*mu + e, where Ai and A are inverses of each other and determine the set of contrasts (I tend to think of A as the contrast matrix and Ai as the dummy variable encoding matrix, but in some cases Ai is called the contrast matrix).  This gives x = W*Ai and beta = A*mu for the standard regression model y = x*beta + e, so R is creating x for you and we just need to find A (the inverse of Ai) to see what beta really means.
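
As a quick sanity check of x = W*Ai and beta = A*mu, here is a minimal sketch on made-up unbalanced data (the names g and y and the group sizes are just assumptions for illustration), using treatment contrasts since that is R's usual default for unordered factors:

```r
# Hypothetical toy data: 5 groups with unequal sizes
set.seed(1)
g <- factor(rep(1:5, times = c(3, 4, 2, 5, 3)))
y <- rnorm(length(g), mean = as.integer(g))

W  <- model.matrix(~ 0 + g)            # stretched identity: one indicator column per cell
Ai <- cbind(1, contr.treatment(5))     # dummy variable encoding matrix
A  <- solve(Ai)                        # contrast matrix

x <- W %*% Ai                          # design matrix implied by the reparameterization
X <- model.matrix(~ g, contrasts.arg = list(g = "contr.treatment"))  # what R builds
max_diff_x <- max(abs(x - X))          # should be numerically zero

mu_hat   <- as.vector(tapply(y, g, mean))   # observed cell means
beta_hat <- unname(coef(lm(y ~ g, contrasts = list(g = "contr.treatment"))))
beta_from_means <- as.vector(A %*% mu_hat)  # beta = A * mu

print(max_diff_x)
print(cbind(lm = beta_hat, A_mu = beta_from_means))
```

The two columns of the last table should agree even though the group sizes differ, because the one-way model is saturated and reproduces the cell means exactly.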

We can start by creating some labels (assuming 5 groups for example) and loading the MASS package for some prettiness later:

library(MASS)

betas <- paste('b',0:4, sep='' )
mus <- paste('mu',1:5, sep='' )

Now, let's look at what Helmert contrasts give us:

Ai1 <- cbind(1, contr.helmert(5))
A1 <- solve(Ai1)

A1txt <- as.character(fractions(A1))

paste( betas, '=', apply(A1txt, 1, paste, mus, sep='*', collapse=' + '))

So beta0 (the intercept) is just the mean of the 5 group means, and beta1 compares the first group to the second (actually half the difference, (mu2 - mu1)/2; the scaling matters for interpreting the beta or its confidence interval, but not for hypothesis tests).
Then beta2 compares the average of the first 2 groups to the 3rd group (with an extra factor of 1/3 in there; this scaling is what makes the original Ai matrix and x matrix prettier).  Beta3 and beta4 compare groups 4 and 5, respectively, to the mean of all the previous groups (scaled by 1/4 and 1/5).
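
To see these interpretations on actual numbers, the following sketch fits Helmert contrasts to made-up unbalanced data (group sizes and means are assumptions chosen only for illustration) and compares the lm coefficients with the hand computations:

```r
# Hypothetical unbalanced data with 5 groups
set.seed(2)
g <- factor(rep(1:5, times = c(4, 2, 5, 3, 6)))
y <- rnorm(length(g), mean = c(10, 12, 11, 15, 14)[g])

mu_hat <- as.vector(tapply(y, g, mean))          # observed cell means
fit    <- lm(y ~ g, contrasts = list(g = "contr.helmert"))
b      <- unname(coef(fit))

A1 <- solve(cbind(1, contr.helmert(5)))
beta_from_means <- as.vector(A1 %*% mu_hat)      # beta = A * mu

# The same quantities written out by hand
b0_hand <- mean(mu_hat)                          # unweighted mean of the cell means
b1_hand <- (mu_hat[2] - mu_hat[1]) / 2
b2_hand <- (mu_hat[3] - mean(mu_hat[1:2])) / 3

print(cbind(lm = b, A_mu = beta_from_means))
print(c(intercept = b[1], mean_of_cell_means = b0_hand, grand_mean_y = mean(y)))
```

Note that with unequal group sizes the intercept is the unweighted mean of the cell means, which in general differs from mean(y).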

Now look at summation contrasts (the contrasts sum to 0):

Ai2 <- cbind(1, contr.sum(5))
A2 <- solve(Ai2)
dimnames(A2) <- list(betas,mus)

fractions(A2)

The beta0 coefficient is still the overall mean of the cell means, and with a little algebra it can be seen that each of the other rows/betas measures the difference between one of the first four cell means and the overall mean (just replace 4/5 with 1-1/5).  The last group gets no beta of its own.
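
The little algebra can also be checked numerically.  This is only a sketch with made-up cell means (the vector mu is an assumption); it also shows that the deviation of the last group, which has no beta of its own, is minus the sum of the other four:

```r
# Hypothetical cell means for 5 groups
mu <- c(10, 12, 11, 15, 14)

A2   <- solve(cbind(1, contr.sum(5)))
beta <- as.vector(A2 %*% mu)        # beta = A * mu

dev <- mu - mean(mu)                # deviation of each cell mean from the overall mean

print(beta)
print(dev)
last_dev <- -sum(beta[-1])          # implied deviation for group 5
print(last_dev)
```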

Now for the non-orthogonal treatment contrasts:

Ai3 <- cbind(1, contr.treatment(5))
A3 <- solve(Ai3)
dimnames(A3) <- list(betas, mus)
fractions(A3)

Now beta0 is not a mean of all the groups, but the mean of the first (reference) group.  The other betas are then the differences between the other groups and the reference group.

Polynomial contrasts are a bit more difficult to interpret:

Ai4 <- cbind(1, contr.poly(5))
A4 <- solve(Ai4)
dimnames(A4) <- list(betas, mus)
zapsmall(A4)
matplot(1:5, t(A4), type='b')

The graph is probably the easiest to interpret: the intercept is still the overall mean, beta1 picks up a linear trend across the groups, beta2 a quadratic one, etc.  These are only meaningful if the groups are ordered and equally spaced.
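
One way to convince yourself of this is to feed in cell means that follow an exact trend.  In the sketch below the vectors mu_lin and mu_quad are made up for illustration; a purely linear set of means should load only on beta0 and beta1, and a symmetric quadratic one only on beta0 and beta2:

```r
A4 <- solve(cbind(1, contr.poly(5)))

# Hypothetical cell means: exactly linear, then exactly quadratic, in the group index
mu_lin  <- 2 + 3 * (1:5)
mu_quad <- (1:5 - 3)^2

beta_lin  <- as.vector(A4 %*% mu_lin)
beta_quad <- as.vector(A4 %*% mu_quad)

print(zapsmall(beta_lin))    # only the first two entries are non-zero
print(zapsmall(beta_quad))   # only the first and third entries are non-zero
```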

We can use the same idea in reverse to create our own contrasts.  Suppose we want to compare group 1 (control) to the mean of the rest, then compare groups 2 and 3, compare groups 4 and 5, and finally compare the mean of groups 2 and 3 to the mean of groups 4 and 5.  We can do either of the following (depending on what we want beta0 to mean: the overall mean in A6, or the control group mean in A7):

A6 <- rbind(1/5,
                c(-4,  1,  1,  1,  1)/4,
                c( 0, -1,  1,  0,  0),
                c( 0,  0,  0, -1,  1),
                c( 0, -1, -1,  1,  1)/2 )
fractions(A6)
zapsmall(solve(A6))

A7 <- rbind(c(1, 0,  0,  0,  0),
                c(-4,  1,  1,  1,  1)/4,
                c( 0, -1,  1,  0,  0),
                c( 0,  0,  0, -1,  1),
                c( 0, -1, -1,  1,  1)/2 )
fractions(A7)
zapsmall(solve(A7))

Just use the above Ai matrices (the output of solve, minus the first column, which corresponds to the intercept) to create x (use the C (note the capital) or contrasts functions) and the individual terms will have the desired interpretations.
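
For example, here is a minimal sketch of that last step for the A6 contrasts, using the contrasts replacement function on made-up unbalanced data (g, y and the group sizes are assumptions for illustration only):

```r
# Custom contrast matrix: rows are the comparisons we want the betas to estimate
A6 <- rbind(1/5,
            c(-4,  1,  1,  1,  1)/4,
            c( 0, -1,  1,  0,  0),
            c( 0,  0,  0, -1,  1),
            c( 0, -1, -1,  1,  1)/2)
Ai6 <- solve(A6)

# Hypothetical unbalanced data with 5 groups
set.seed(3)
g <- factor(rep(1:5, times = c(4, 2, 5, 3, 6)))
y <- rnorm(length(g), mean = c(10, 12, 11, 15, 14)[g])

contrasts(g) <- Ai6[, -1]          # drop the first (intercept) column
fit <- lm(y ~ g)
b   <- unname(coef(fit))

mu_hat <- as.vector(tapply(y, g, mean))
beta_from_means <- as.vector(A6 %*% mu_hat)   # what each beta should estimate

print(cbind(lm = b, A_mu = beta_from_means))
```

The second coefficient is then the mean of groups 2 to 5 minus the control mean, the third is group 3 minus group 2, and so on, matching the rows of A6.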

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
