Matching columns of a model matrix to those in the original data.frame

Fri, Jul 26, 2013 7:23 PM

What is a reliable way to go from a column of a model matrix back to the column (or columns) of the original data source used to make the model matrix? I have a method that seems to work, but I see nothing in the documentation guaranteeing that it will.

In particular, does the order of term.labels match the order of the columns of the factors attribute of a terms object? The documentation says that the positive values in the assign attribute of a model matrix refer to terms in the order given by term.labels.
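One way to check both orderings for a particular model is to compare the column names of the factors attribute with term.labels, and the largest assign value with the number of terms. This is a minimal sketch that uses the warpbreaks data shipped with R as a stand-in for the real data; a passing check on one model is evidence, not a guarantee.

```r
# Check, for one example model, the orderings that reverse.map relies on.
form <- breaks ~ wool * tension          # stand-in formula; warpbreaks ships with R
tt   <- terms(form, data = warpbreaks)
mm   <- model.matrix(form, data = warpbreaks)

ttf  <- attr(tt, "factors")              # variables x terms incidence matrix
labs <- attr(tt, "term.labels")          # term labels, in term order
asg  <- attr(mm, "assign")               # term index for each model-matrix column

same.order <- identical(colnames(ttf), labs)   # factors columns labelled by term?
covers.all <- max(asg) == length(labs)         # assign indexes every term?
print(c(same.order = same.order, covers.all = covers.all))
```

Because the columns of factors carry the term labels as names, indexing factors by name (ttf[, labs[k]]) rather than by position sidesteps the ordering question.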

I would appreciate hearing whether this approach is reliable, or about one that is.

Ross Boylan

The proposed function and a small example follow.

# Return a vector v such that data[, v[i]] contributed to the i-th
# non-intercept column of mm.
#   mm   = model matrix produced by model.matrix(form, data)
#   form = the formula used to build mm
#   data = the data frame used to build mm
reverse.map <- function(mm, form, data){
    tt <- terms(form, data=data)
    ttf <- attr(tt, "factors")
    mmi <- attr(mm, "assign")
    # this depends on assign using the same order as the columns of factors
    # entries in mmi that are 0 (the intercept) are silently dropped;
    # drop=FALSE keeps ttf2 a matrix even when a single row or column remains
    ttf2 <- ttf[, mmi, drop=FALSE]
    # for each column, take the first variable (row of factors) that contributes
    r <- apply(ttf2, 2, function(is) rownames(ttf)[is > 0][1])
    match(r, colnames(data))
}
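A variant that avoids the positional assumption, by looking up columns of factors by term label, and that returns every contributing data column rather than only the first, might look like the sketch below. reverse.map.all is a hypothetical name, and the usage lines assume the warpbreaks data shipped with R.

```r
# reverse.map.all (hypothetical name): for each column of mm, return the
# indices of ALL data columns that contributed to it. Columns of the
# "factors" matrix are looked up by term label instead of by position.
reverse.map.all <- function(mm, form, data) {
    tt   <- terms(form, data = data)
    ttf  <- attr(tt, "factors")      # variables x terms
    labs <- attr(tt, "term.labels")  # assign values index into these labels
    mmi  <- attr(mm, "assign")       # 0 = intercept, k = term labs[k]
    out <- lapply(mmi, function(k) {
        if (k == 0) return(integer(0))             # intercept: no data column
        vars <- rownames(ttf)[ttf[, labs[k]] > 0]  # variables in term k
        match(vars, colnames(data))                # NA if not a data column
    })
    names(out) <- colnames(mm)
    out
}

# Usage with the warpbreaks data that ships with R
form <- breaks ~ wool * tension
mm   <- model.matrix(form, data = warpbreaks)
m    <- reverse.map.all(mm, form, warpbreaks)
str(m)
```

Like reverse.map, this gives NA for a variable that enters the formula through a transformation such as log(x), because the row names of factors are then expressions like "log(x)" rather than column names of data.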

A few rows of the data:

      pEthnic ethnic_sg rac_gay
1366 Afr Amer  Afr Amer    3.25
3052 Afr Amer  Afr Amer    1.75
3012   Latino  Afr Amer    2.00
369  Afr Amer  Asian/PI    2.00
529     White  Asian/PI    2.00
194  Asian/PI  Asian/PI    3.25
126     White  Asian/PI    2.25
2147   Latino    Latino    2.75

Column names of the model matrix:

 [1] "(Intercept)"               "pEthnicAsian/PI"
 [3] "pEthnicLatino"             "pEthnicOther"
 [5] "pEthnicWhite"              "ethnic_sgAsian/PI"
 [7] "ethnic_sgLatino"           "rac_gay"
 [9] "ethnic_sgAsian/PI:rac_gay" "ethnic_sgLatino:rac_gay"

The factors attribute of the terms object:

          pEthnic ethnic_sg rac_gay ethnic_sg:rac_gay
pEthnic         1         0       0                 0
ethnic_sg       0         1       0                 1
rac_gay         0         0       1                 1
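Every nonzero entry in the factors matrix above is 1, but the factors attribute can also contain 2, meaning the variable is coded by dummy variables for all its levels (R does this when a lower-order term is missing). That is why reverse.map tests is > 0 rather than is == 1. A small illustration, again assuming the warpbreaks data shipped with R, in which wool:tension appears without a tension main effect:

```r
# A formula whose interaction lacks one of its marginal terms (tension).
tt2  <- terms(breaks ~ wool + wool:tension, data = warpbreaks)
ttf2 <- attr(tt2, "factors")
print(ttf2)               # one variable in the wool:tension column is coded 2
has.two <- any(ttf2 == 2)
```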

The assign attribute of the model matrix:

 [1] 0 1 1 1 1 2 2 3 4 4

The result of reverse.map:

[1] 1 1 1 1 2 2 3 2 2
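The example can be approximately reconstructed from the eight rows shown. This sketch assumes the formula was ~ pEthnic + ethnic_sg * rac_gay (the post does not show it; it is inferred from the factors matrix, which has no response row) and sets the factor levels explicitly so that the level Other, absent from these rows, still gets a column. It recomputes the first-contributor mapping by term label rather than by position.

```r
# Rebuild the eight rows shown, with explicit levels so that the
# unobserved level "Other" of pEthnic still produces a column.
d <- data.frame(
    pEthnic = factor(c("Afr Amer", "Afr Amer", "Latino", "Afr Amer",
                       "White", "Asian/PI", "White", "Latino"),
                     levels = c("Afr Amer", "Asian/PI", "Latino", "Other", "White")),
    ethnic_sg = factor(c("Afr Amer", "Afr Amer", "Afr Amer", "Asian/PI",
                         "Asian/PI", "Asian/PI", "Asian/PI", "Latino"),
                       levels = c("Afr Amer", "Asian/PI", "Latino")),
    rac_gay = c(3.25, 1.75, 2.00, 2.00, 2.00, 3.25, 2.25, 2.75))

form <- ~ pEthnic + ethnic_sg * rac_gay   # ASSUMED formula, not shown in the post
mm   <- model.matrix(form, data = d)
tt   <- terms(form, data = d)
ttf  <- attr(tt, "factors")
labs <- attr(tt, "term.labels")
mmi  <- attr(mm, "assign")

# First contributing variable for each non-intercept column, looked up by name
first <- sapply(mmi[mmi > 0], function(k) rownames(ttf)[ttf[, labs[k]] > 0][1])
v <- match(first, colnames(d))
print(v)
```

Under these assumptions it prints 1 1 1 1 2 2 3 2 2, the same mapping that reverse.map produced.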