Skip to content
Prev 5862 / 20628 Next

No data for 1 interaction combination: problem in R not in Genstat

On Tue, Apr 12, 2011 at 8:47 AM, Ben Bolker <bbolker at gmail.com> wrote:
The way that lm and glm handle the missing cells in a model with
interactions is through pivoting on columns of the model matrix, but
in a way that numerical analysts didn't anticipate and hence do not
provide for in standard numerical linear algebra software.  Most
linear algebra operations in R are done using Lapack code except for
the most important one - solving least squares problems - which uses
modified Linpack code.  (This, by the way, is why accelerated BLAS
don't improve calculation speed for lm, glm, etc.  The Linpack code
only uses level-1 BLAS and there is not much you can do with clever
programming to make level-1 BLAS run faster.)

Most of the time rank deficiency in least squares problems is avoided
by using contrasts for terms involving factors.  However, missing
cells in a model with an interaction term are one case where rank
deficiency cannot (easily) be caught during the symbolic analysis.
The methods used in lm perform a QR decomposition with the "pivot on
apparent singularity" scheme that produces a set of columnst that are
full rank.  The methods used in lmer don't include a check for rank
deficiency.  Martin and I discussed the possibility of doing that
immediately after the model matrix for the fixed effects was generated
and performing the pivoting and rank reduction at that time but that
modification remains on the To Do list.

The fact that lm inserts NA for the estimates of the coefficients
doesn't really relate to the computational method.  That is done, by
reversing the column permutation, after the other parameter estimates
have been calculated.