
Bug in model.matrix.default for higher-order interaction encoding when specific model terms are missing

Hello Tyler,

Thank you for searching for, and finding, the basic description of the
behavior of R in this matter.

I think your example is in agreement with the book.

But let me first note the following. You write: "F_j refers to a
factor (variable) in a model and not a categorical factor". However,
in R "a factor is a vector object used to specify a discrete
classification" (start of chapter 4 of "An Introduction to R"). See
also the documentation of the R function factor().

You note that the book says about a factor F_j:
  "... F_j is coded by contrasts if T_{i(j)} has appeared in the
formula and by dummy variables if it has not"

You find:
   "However, the example I gave demonstrated that this dummy variable
encoding only occurs for the model where the missing term is the
numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2."

Here T_i = X1:X2:X3 and F_j = X3 (the only factor; X1 and X2 are
numeric). T_{i(j)} is T_i with F_j removed, so T_{i(j)} = X1:X2, which
has been dropped from the model. Hence the X3 in T_i must be encoded by
dummy variables, as indeed it is.
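
For what it is worth, here is a small sketch to check this reading of
the rule. The data frame d is made up for illustration (it is not your
data); I only assume X1 and X2 numeric and X3 a three-level factor. It
compares the "factors" attribute of terms() and the generated columns
for the full and the reduced formula. If the rule is applied as
described above, the reduced formula should give code 2 for X3 in
X1:X2:X3 and one interaction column per level of X3.

```r
# Toy data (assumed for illustration): X1, X2 numeric, X3 a 3-level factor.
d <- expand.grid(X1 = c(-1, 1), X2 = c(-1, 1), X3 = factor(c("A", "B", "C")))

full    <- ~ (X1 + X2 + X3)^3
dropped <- ~ (X1 + X2 + X3)^3 - X1:X2

# In the "factors" attribute of a terms object, 1 means "code by contrasts"
# and 2 means "code by dummy variables for all levels" (see ?terms.object).
code_full    <- attr(terms(full),    "factors")["X3", "X1:X2:X3"]
code_dropped <- attr(terms(dropped), "factors")["X3", "X1:X2:X3"]

# Columns that model.matrix() generates for the three-way interaction.
cols_full    <- grep("^X1:X2:X3", colnames(model.matrix(full, d)),    value = TRUE)
cols_dropped <- grep("^X1:X2:X3", colnames(model.matrix(dropped, d)), value = TRUE)

print(c(full = code_full, dropped = code_dropped))
print(cols_full)
print(cols_dropped)
```

Note that terms() works on the formula alone and does not know which
variables are factors, so the code it assigns only matters for
variables that turn out to be factors in the data.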

  Arie
On Tue, Oct 31, 2017 at 4:01 PM, Tyler <tylermw at gmail.com> wrote: