why NA coefficients

4 messages · Dennis Murphy, array chip, William Dunlap

#
The cell mean mu_{12} is non-estimable because it has no data in the
cell. How can you estimate something that's not there (at least
without imputation :)? Every parametric function that involves mu_{12}
will also be non-estimable - in particular, the interaction term and
the population marginal means. That's why you get the NA estimates
and the warning. All this follows from the linear model theory
described in, for example, Milliken and Johnson (1992), Analysis of
Messy Data, vol. 1, ch. 13.

Here's an example from Milliken and Johnson (1992) to illustrate:
         B1        B2        B3
T1      2, 6       --       8, 6
T2       3         14      12, 9
T3       6          9        --

(-- marks a cell with no observations)

Assume a cell means model E(Y_{ijk}) = \mu_{ij}, where the cell means
are estimated by the cell averages.
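
A minimal sketch in R of the layout above, to make the empty cells
concrete. The column names Trt, Blk and y are made up for illustration
(they are not M & J's), and the data are simply typed in from the table:

```r
# M & J example data; cells (T1, B2) and (T3, B3) have no observations.
# Trt, Blk and y are hypothetical names chosen for this sketch.
mj <- data.frame(
  Trt = factor(c("T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2", "T3", "T3")),
  Blk = factor(c("B1", "B1", "B3", "B3", "B1", "B2", "B3", "B3", "B1", "B2")),
  y   = c(2, 6, 8, 6, 3, 14, 12, 9, 6, 9)
)

# Number of observations per cell; zeros mark the missing combinations.
cell_counts <- with(mj, table(Trt, Blk))

# Cell averages, i.e. the estimates of mu_{ij}; empty cells come back as NA.
cell_means <- with(mj, tapply(y, list(Trt, Blk), mean))

print(cell_counts)
print(cell_means)
```

The NA entries in cell_means are exactly the cell means that cannot be
estimated without further assumptions.
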
"Whenever treatment combinations are missing, certain
hypotheses cannot be tested without making some
additional assumptions about the parameters in the model.
Hypotheses involving parameters corresponding to the
missing cells generally cannot be tested. For example,
for the data [above] it is not possible to estimate any
linear combinations (or to test any hypotheses) that
involve parameters \mu_{12} and \mu_{33} unless one
is willing to make some assumptions about them."

They continue:
"One common assumption is that there is no
interactions between the levels of T and the levels of B.
In our opinion, this assumption should not be made
without some supporting experimental evidence."

In other words, removing the interaction term makes the
non-estimability problem disappear, but it's a copout unless there is
some tangible scientific justification for an additive rather than an
interaction model.
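
To see this on the M & J data, here is a rough sketch that fits both
models with lm(). The column names Trt, Blk and y are again made up for
illustration; the point is only the rank of each fit, not the estimates:

```r
# M & J example data (hypothetical column names), two empty cells.
mj <- data.frame(
  Trt = factor(c("T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2", "T3", "T3")),
  Blk = factor(c("B1", "B1", "B3", "B3", "B1", "B2", "B3", "B3", "B1", "B2")),
  y   = c(2, 6, 8, 6, 3, 14, 12, 9, 6, 9)
)

# Interaction model: 9 coefficients but only 7 non-empty cells,
# so the design matrix is rank deficient and some coefficients are NA.
fit_int <- lm(y ~ Trt * Blk, data = mj)

# Additive model: 5 coefficients, all estimable for this layout.
fit_add <- lm(y ~ Trt + Blk, data = mj)

n_na_int <- sum(is.na(coef(fit_int)))
n_na_add <- sum(is.na(coef(fit_add)))

print(coef(fit_int))
print(coef(fit_add))
```

The additive fit has no NA coefficients only because the no-interaction
assumption has been imposed, which is the assumption M & J caution
against making without evidence.
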

For the above data, M & J note that it is not possible to estimate all
of the population marginal means - in particular, one cannot estimate
$\bar{\mu}_{1.}$, $\bar{\mu}_{3.}$, $\bar{\mu}_{.2}$ or
$\bar{\mu}_{.3}$, since these functions of the parameters involve
terms associated with the means of the missing cells. OTOH,
$\bar{\mu}_{2.}$ and $\bar{\mu}_{.1}$ are estimable, because every
cell in row T2 and in column B1 has data. Moreover, any
hypotheses involving parametric functions that contain non-estimable
cell means are not testable. In this example, the hypothesis of equal
row population marginal means is not testable because $\bar{\mu}_{1.}$
and $\bar{\mu}_{3.}$ are not estimable.
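
A small sketch of the same bookkeeping in R, assuming the cell means
model and made-up column names Trt, Blk and y. Averaging the matrix of
cell averages lets NA propagate, so a marginal mean comes out NA
whenever its row or column contains an empty cell:

```r
# M & J example data (hypothetical column names), two empty cells.
mj <- data.frame(
  Trt = factor(c("T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2", "T3", "T3")),
  Blk = factor(c("B1", "B1", "B3", "B3", "B1", "B2", "B3", "B3", "B1", "B2")),
  y   = c(2, 6, 8, 6, 3, 14, 12, 9, 6, 9)
)

# Cell averages; empty cells are NA.
cell_means <- with(mj, tapply(y, list(Trt, Blk), mean))

# Unweighted averages of cell means across columns / rows.
# NA means the population marginal mean is not estimable.
pmm_rows <- rowMeans(cell_means)  # mu-bar_{i.}
pmm_cols <- colMeans(cell_means)  # mu-bar_{.j}

print(pmm_rows)  # only T2 is not NA
print(pmm_cols)  # only B1 is not NA
```
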

[Aside: if the term parametric function is not familiar, in this
context it refers to linear combinations of model parameters. In the
M & J example, $\bar{\mu}_{1.} = (\mu_{11} + \mu_{12} + \mu_{13})/3$
is a parametric function.]

Hopefully this sheds some light on the situation.

Dennis
On Mon, Nov 7, 2011 at 10:17 PM, array chip <arrayprofile at yahoo.com> wrote:
#
It might make the discussion easier to follow if you used
a smaller dataset that anyone can make and did some experiments
with contrasts. E.g.,
  X1 X2   Y
2  B  x   2
3  C  x   4
4  A  y   8
5  B  y  16
6  C  y  32
7  A  z  64
8  B  z 128
9  C  z 256

Call:
lm(formula = Y ~ X1 * X2, data = D)

Coefficients:
(Intercept)          X1B          X1C  
       -188          190          192  
        X2y          X2z      X1B:X2y  
        196          252         -182  
    X1C:X2y      X1B:X2z      X1C:X2z  
       -168         -126           NA  

Call:
lm(formula = Y ~ X1 * X2, data = D, contrasts = list(X2 = "contr.SAS"))

Coefficients:
(Intercept)          X1B          X1C  
         64           64          192  
        X2x          X2y      X1B:X2x  
       -252          -56          126  
    X1C:X2x      X1B:X2y      X1C:X2y  
         NA          -56         -168  
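
The commands that produced the output above did not survive in this
copy of the message, so the following is only a guess at an equivalent
setup: D is typed in from the printed rows (row names will differ), and
the two lm() calls are copied from the Call: lines. It shows that the
NA moves to a different coefficient when the contrasts change, while
the fitted values stay the same:

```r
# Reconstruction of D from the printed rows; the (A, x) cell is absent.
D <- data.frame(
  X1 = factor(c("B", "C", "A", "B", "C", "A", "B", "C")),
  X2 = factor(c("x", "x", "y", "y", "y", "z", "z", "z")),
  Y  = 2^(1:8)
)

# Default treatment contrasts (baseline levels A and x).
fit_trt <- lm(Y ~ X1 * X2, data = D)

# Same model, but X2 uses contr.SAS (baseline is the last level, z).
fit_sas <- lm(Y ~ X1 * X2, data = D, contrasts = list(X2 = "contr.SAS"))

# Which coefficient is reported as NA under each parametrization.
na_trt <- names(which(is.na(coef(fit_trt))))
na_sas <- names(which(is.na(coef(fit_sas))))
print(na_trt)
print(na_sas)

# Both fits reproduce the 8 observed cell values.
print(all.equal(unname(fitted(fit_trt)), unname(fitted(fit_sas))))
```

Which coefficient gets the NA depends on the parametrization, but the
estimable quantities (here, the observed cell means) do not.
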


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com