
less than full rank contrast methods

6 messages · Max Kuhn, Greg Snow, Frank E Harrell Jr +2 more

#
I'd like to make a less than full rank design using dummy variables
for factors. Here is some example data:

when <- data.frame(time = c("afternoon", "night", "afternoon",
                            "morning", "morning", "morning",
                            "morning", "afternoon", "afternoon"),
                   day = c("Monday", "Monday", "Monday",
                           "Wednesday", "Wednesday", "Friday",
                           "Saturday", "Saturday", "Friday"))

For a single factor, I can do this using
  timeafternoon timemorning timenight
1             1           0         0
2             0           0         1
3             1           0         0
4             0           1         0
5             0           1         0
6             0           1         0

but this breaks down for multi-variable formulas such as "time + day" or
"time + day + time:day".
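
The call that produced the output above did not survive in the archived message. As a hedged sketch, one call consistent with that output is `model.matrix(~ time - 1, ...)`; this is an assumption, not necessarily what was originally run. The second call illustrates the breakdown described: with two factors, only the first keeps all of its indicator columns. `stringsAsFactors = TRUE` is an addition so the example also runs on R 4.0 or later, where strings are no longer converted to factors by default.

```r
# Example data from the post; stringsAsFactors = TRUE is added so that the
# columns are factors on R >= 4.0 as well.
when <- data.frame(time = c("afternoon", "night", "afternoon",
                            "morning", "morning", "morning",
                            "morning", "afternoon", "afternoon"),
                   day = c("Monday", "Monday", "Monday",
                           "Wednesday", "Wednesday", "Friday",
                           "Saturday", "Saturday", "Friday"),
                   stringsAsFactors = TRUE)

# Assumed call (not in the archived post): dropping the intercept yields one
# indicator column per level of a single factor.
single <- model.matrix(~ time - 1, data = when)
colnames(single)

# With two factors and default contrasts, only the first factor is fully
# expanded; 'day' loses the column for its first level.
both <- model.matrix(~ time + day - 1, data = when)
colnames(both)
```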

I've looked for alternate contrast functions to do this and I haven't
figured out a way to coerce existing functions to get the desired
output. Hopefully I haven't missed anything obvious.

Thanks,

Max
R version 2.11.1 Patched (2010-09-11 r52910)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
#
Have you tried setting singular.ok=TRUE in the call to lm?  This will start with the full set of contrasts, but only fit those that it is able to.

Otherwise you can set specific contrasts or subsets using the C (note case) or contrasts functions.
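
A minimal sketch of both suggestions, under stated assumptions: the response `y` is hypothetical (the example data has no response), and the full indicator coding is supplied through identity contrast matrices from `contrasts(x, contrasts = FALSE)`. With an over-parameterised design, `lm` reports the aliased coefficients as `NA` rather than failing.

```r
when <- data.frame(time = c("afternoon", "night", "afternoon",
                            "morning", "morning", "morning",
                            "morning", "afternoon", "afternoon"),
                   day = c("Monday", "Monday", "Monday",
                           "Wednesday", "Wednesday", "Friday",
                           "Saturday", "Saturday", "Friday"),
                   stringsAsFactors = TRUE)

# Hypothetical response values, invented purely for illustration.
when$y <- c(2.1, 3.4, 1.9, 4.0, 4.2, 3.7, 5.1, 2.8, 3.0)

# Full indicator coding for each factor (identity contrast matrices).
full <- list(time = contrasts(when$time, contrasts = FALSE),
             day  = contrasts(when$day,  contrasts = FALSE))

# singular.ok = TRUE (the default) lets lm fit what it can; coefficients
# that cannot be estimated come back as NA.
fit <- lm(y ~ time + day, data = when, contrasts = full, singular.ok = TRUE)
coef(fit)
n_aliased <- sum(is.na(coef(fit)))

# C() (capital C) attaches a chosen contrast to a single factor.
time_sum <- C(when$time, contr.sum)
contrasts(time_sum)
```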
#
Given a non-singular fit, the contrast function in the rms package will allow
you to request multi-dimensional contrasts some of which are redundant. 
These singular contrasts are automatically ignored.  One use for this is to
test for differences in longitudinal trends between two of three treatment
groups, where the time trend is a multiple degree of freedom
parameterization such as cubic splines.  You don't have to stop and think
about how many time points to test; just test as many as you'd like and get
the right degrees of freedom according to the number of spline terms (main
effects + interactions).

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
#
Greg and Frank,

Thanks for the replies. I didn't express myself very well; I'm not interested in the model-fitting aspect. I'd just like to get the full set of dummy variables (optimally from model.matrix).

Max
On Dec 6, 2010, at 10:29 PM, Frank Harrell <f.harrell at vanderbilt.edu> wrote:

            
#
On Tue, Dec 7, 2010 at 5:19 AM, mxkuhn <mxkuhn at gmail.com> wrote:
Try this:

contr.dummy <- function(n, ...) diag(n)
mm <- suppressWarnings(model.matrix(~ day + time, when))
mm[is.na(mm)] <- 1
mm

You might also want to set the levels of your factors first so that they
include levels that are not in the data and so that the levels are
sorted in an order other than alphabetical:

levels(when$time) <- c("morning", "afternoon", "night")
levels(when$day) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

etc.
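
For the stated goal (the full set of dummy variables from model.matrix), one commonly used idiom is to pass unreduced contrast matrices through the `contrasts.arg` argument. This is offered as a sketch, not as a reconstruction of the code above; the names `full`, `mm`, and `mm_int` are illustrative, and `stringsAsFactors = TRUE` is added so the example runs on R 4.0 or later.

```r
when <- data.frame(time = c("afternoon", "night", "afternoon",
                            "morning", "morning", "morning",
                            "morning", "afternoon", "afternoon"),
                   day = c("Monday", "Monday", "Monday",
                           "Wednesday", "Wednesday", "Friday",
                           "Saturday", "Saturday", "Friday"),
                   stringsAsFactors = TRUE)

# contrasts(x, contrasts = FALSE) returns an identity matrix with one
# column per level, i.e. a full (less than full rank) dummy coding.
full <- lapply(when[c("time", "day")], contrasts, contrasts = FALSE)

# Main effects: intercept + 3 time columns + 4 day columns.
mm <- model.matrix(~ time + day, data = when, contrasts.arg = full)
colnames(mm)

# The same contrasts carry over to interaction terms.
mm_int <- model.matrix(~ time + day + time:day, data = when,
                       contrasts.arg = full)
dim(mm_int)
```

Because every level gets its own column, the indicator columns of each factor sum to the intercept column, so the resulting matrix is deliberately rank deficient.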
#
On Tue, Dec 7, 2010 at 7:54 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
The levels(when$day) lines should be:

levels(when$day) <- c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday")
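
One caveat worth checking before relying on the lines above: `levels(x) <- value` relabels the existing levels in their current (alphabetical) order; it does not reorder them. A sketch of the difference, using `factor()` with an explicit `levels` argument as an alternative that keeps each observation's value and still includes days absent from the data. The names `week`, `relabelled`, and `day_f` are illustrative.

```r
day <- c("Monday", "Monday", "Monday",
         "Wednesday", "Wednesday", "Friday",
         "Saturday", "Saturday", "Friday")
week <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

# Assigning to levels() renames level 1 (Friday) to "Monday",
# level 2 (Monday) to "Tuesday", and so on, changing the data.
relabelled <- factor(day)
levels(relabelled) <- week
head(as.character(relabelled), 3)

# factor(..., levels = ) sets the level order and adds unobserved levels
# without altering the observed values.
day_f <- factor(day, levels = week)
table(day_f)
```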