Dear R-devel list members,
I'd like to suggest a more flexible procedure for generating contrast
names. I apologise for a relatively long message -- I want my proposal to
be clear.
I've never liked the current approach. For example, the names generated by
contr.treatment paste factor to level names with no separation between the
two; contr.sum simply numbers contrasts (I recall an exchange on the list
about this question); none of contr.* explicitly indicates what kind of
contrasts are generated. There are ways around these problems -- e.g.,
beginning level names with an uppercase character -- but in my experience
it's common to see things like "regionmidwest".
I propose that the current behaviour remain the default, but that the
contr.* functions use optional "prefix" and "suffix" characters around
level names (or their equivalent) and an optional identifier string to
indicate what kind of contrasts are generated. As well, I propose that
contr.sum optionally identify contrasts by level names. All of these
behaviours could be controlled by options which, if unset, would produce
the current behaviour. There might be problems with my implementation of
these ideas, or with the ideas themselves -- but I expect that these will
arise in discussion. (Of course, there's nothing to prevent me from using
the functions that I show below, but I thought that these questions might
be of more general interest.)
Here, for example, are rewrites of contr.treatment and contr.sum (renamed
contr.Treatment and contr.Sum because of name-space issues) that implement
my suggestions, together with illustrations of their use:
------------------------------------------------------------------------------------------------------------------------------
contr.Treatment <- function (n, base = 1, contrasts = TRUE) {
if (is.numeric(n) && length(n) == 1)
levs <- 1:n
else {
levs <- n
n <- length(n)
}
lev.opt <- getOption("decorate.factor.levels")
pre <- if (is.null(lev.opt)) "" else lev.opt[1]
suf <- if (is.null(lev.opt)) "" else lev.opt[2]
dec <- getOption("decorate.contr.Treatment")
dec <- if (is.null(dec)) "" else dec
contr.names <- paste(pre, dec, levs, suf, sep="")
contr <- array(0, c(n, n), list(levs, contr.names))
diag(contr) <- 1
if (contrasts) {
if (n < 2)
stop(paste("Contrasts not defined for", n - 1, "degrees of
freedom"))
if (base < 1 | base > n)
stop("Baseline group number out of range")
contr <- contr[, -base, drop = FALSE]
}
contr
}
contr.Sum <- function (n, contrasts = TRUE)
{
if (length(n) <= 1) {
if (is.numeric(n) && length(n) == 1 && n > 1)
levels <- 1:n
else stop("Not enough degrees of freedom to define contrasts")
}
else levels <- n
lenglev <- length(levels)
lev.opt <- getOption("decorate.factor.levels")
pre <- if (is.null(lev.opt)) "" else lev.opt[1]
suf <- if (is.null(lev.opt)) "" else lev.opt[2]
dec <- getOption("decorate.contr.Sum")
dec <- if (is.null(dec)) "" else dec
show.lev <- getOption("contr.Sum.show.levels")
contr.names <- if ((!is.null(show.lev)) && show.lev) paste(pre,
dec, levels, suf, sep="")
if (contrasts) {
cont <- array(0, c(lenglev, lenglev - 1), list(levels,
contr.names[-lenglev]))
cont[col(cont) == row(cont)] <- 1
cont[lenglev, ] <- -1
}
else {
cont <- array(0, c(lenglev, lenglev), list(levels, levels))
cont[col(cont) == row(cont)] <- 1
}
cont
}
> library(car)
. . .
> data(Prestige)
> attach(Prestige)
> contrasts(type) <- "contr.Treatment"
>
> lm(prestige ~ (income + education)*type) # default behaviour
Call:
lm(formula = prestige ~ (income + education) * type)
Coefficients:
(Intercept) income education
typeprof typewc
2.275753 0.003522 1.713275
15.351896 -33.536652
income:typeprof income:typewc education:typeprof
education:typewc
-0.002903 -0.002072 1.387809
4.290875
>
> options(decorate.factor.levels=c("[", "]")) # using brackets
> lm(prestige ~ (income + education)*type)
Call:
lm(formula = prestige ~ (income + education) * type)
Coefficients:
(Intercept) income education
type[prof] type[wc]
2.275753 0.003522 1.713275
15.351896 -33.536652
income:type[prof] income:type[wc] education:type[prof]
education:type[wc]
-0.002903 -0.002072 1.387809
4.290875
>
> options(decorate.contr.Treatment="T.") # naming contrast type
> lm(prestige ~ (income + education)*type)
Call:
lm(formula = prestige ~ (income + education) * type)
Coefficients:
(Intercept) income education
type[T.prof]
2.275753 0.003522 1.713275
15.351896
type[T.wc] income:type[T.prof] income:type[T.wc]
education:type[T.prof]
-33.536652 -0.002903 -0.002072
1.387809
education:type[T.wc]
4.290875
>
> options(decorate.contr.Treatment=NULL)
> options(decorate.factor.levels=c("#", "")) # alternate style, using hash
> lm(prestige ~ (income + education)*type)
Call:
lm(formula = prestige ~ (income + education) * type)
Coefficients:
(Intercept) income education
type#prof type#wc
2.275753 0.003522 1.713275
15.351896 -33.536652
income:type#prof income:type#wc education:type#prof
education:type#wc
-0.002903 -0.002072 1.387809
4.290875
>
> lm(prestige ~ (income + education)*type,
contrasts=list(type="contr.Sum")) # default behaviour
Call:
lm(formula = prestige ~ (income + education) * type, contrasts =
list(type = "contr.Sum"))
Coefficients:
(Intercept) income education type1
type2 income:type1 income:type2
-3.785832 0.001864 3.606169 6.061585
21.413481 0.001658 -0.001244
education:type1 education:type2
-1.892895 -0.505086
>
> options(contr.Sum.show.levels=TRUE)
> options(decorate.factor.levels=c("[", "]"))
> options(decorate.contr.Sum="S.") # showing levels, brackets,
contrast type
> lm(prestige ~ (income + education)*type,
contrasts=list(type="contr.Sum"))
Call:
lm(formula = prestige ~ (income + education) * type, contrasts =
list(type = "contr.Sum"))
Coefficients:
(Intercept) income education
type[S.bc]
-3.785832 0.001864 3.606169
6.061585
type[S.prof] income:type[S.bc] income:type[S.prof]
education:type[S.bc]
21.413481 0.001658 -0.001244
-1.892895
education:type[S.prof]
-0.505086
------------------------------------------------------------------------------------------------------------------------------
I look forward to people's comments.
John
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox@mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
generating contrast names
2 messages · John Fox, Torsten Hothorn
Dear R-devel list members, I'd like to suggest a more flexible procedure for generating contrast names. I apologise for a relatively long message -- I want my proposal to be clear. I've never liked the current approach. For example, the names generated by contr.treatment paste factor to level names with no separation between the two; contr.sum simply numbers contrasts (I recall an exchange on the list about this question); none of contr.* explicitly indicates what kind of contrasts are generated. There are ways around these problems -- e.g., beginning level names with an uppercase character -- but in my experience it's common to see things like "regionmidwest". I propose that the current behaviour remain the default, but that the contr.* functions use optional "prefix" and "suffix" characters around level names (or their equivalent) and an optional identifier string to indicate what kind of contrasts are generated. As well, I propose that contr.sum optionally identify contrasts by level names. All of these behaviours could be controlled by options which, if unset, would produce the current behaviour. There might be problems with my implementation of these ideas, or with the ideas themselves -- but I expect that these will arise in discussion. (Of course, there's nothing to prevent me from using the functions that I show below, but I thought that these questions might be of more general interest.)
very interesting suggestion. We had the same problem within the multcomp package and we decided to paste the factor names, levels and kind of contrasts used into one single name where possible, for example Dunnett:
ci <- simint(minutes ~ blanket, data=recovery, conf.level=0.9,
+ alternative="less",eps=0.0001)
summary(ci)
Simultaneous 90% confidence intervals: Dunnett contrasts
Call:
simint.formula(formula = minutes ~ blanket, data = recovery,
conf.level = 0.9, alternative = "less", eps = 1e-04)
Dunnett contrasts for factor blanket
Contrast matrix:
blanketb0 blanketb1 blanketb2 blanketb3
blanketb1-blanketb0 0 -1 1 0 0
blanketb2-blanketb0 0 -1 0 1 0
blanketb3-blanketb0 0 -1 0 0 1
where the factor is `blanket' at levels b0,b1,b2,b3. Of course this is
restricted to simple contrasts (like the difference here) but breaks for
more complicated situations (tetrade constrasts for example).
best,
Torsten
Here, for example, are rewrites of contr.treatment and contr.sum (renamed
contr.Treatment and contr.Sum because of name-space issues) that implement
my suggestions, together with illustrations of their use:
------------------------------------------------------------------------------------------------------------------------------
contr.Treatment <- function (n, base = 1, contrasts = TRUE) {
if (is.numeric(n) && length(n) == 1)
levs <- 1:n
else {
levs <- n
n <- length(n)
}
lev.opt <- getOption("decorate.factor.levels")
pre <- if (is.null(lev.opt)) "" else lev.opt[1]
suf <- if (is.null(lev.opt)) "" else lev.opt[2]
dec <- getOption("decorate.contr.Treatment")
dec <- if (is.null(dec)) "" else dec
contr.names <- paste(pre, dec, levs, suf, sep="")
contr <- array(0, c(n, n), list(levs, contr.names))
diag(contr) <- 1
if (contrasts) {
if (n < 2)
stop(paste("Contrasts not defined for", n - 1, "degrees of
freedom"))
if (base < 1 | base > n)
stop("Baseline group number out of range")
contr <- contr[, -base, drop = FALSE]
}
contr
}
contr.Sum <- function (n, contrasts = TRUE)
{
if (length(n) <= 1) {
if (is.numeric(n) && length(n) == 1 && n > 1)
levels <- 1:n
else stop("Not enough degrees of freedom to define contrasts")
}
else levels <- n
lenglev <- length(levels)
lev.opt <- getOption("decorate.factor.levels")
pre <- if (is.null(lev.opt)) "" else lev.opt[1]
suf <- if (is.null(lev.opt)) "" else lev.opt[2]
dec <- getOption("decorate.contr.Sum")
dec <- if (is.null(dec)) "" else dec
show.lev <- getOption("contr.Sum.show.levels")
contr.names <- if ((!is.null(show.lev)) && show.lev) paste(pre,
dec, levels, suf, sep="")
if (contrasts) {
cont <- array(0, c(lenglev, lenglev - 1), list(levels,
contr.names[-lenglev]))
cont[col(cont) == row(cont)] <- 1
cont[lenglev, ] <- -1
}
else {
cont <- array(0, c(lenglev, lenglev), list(levels, levels))
cont[col(cont) == row(cont)] <- 1
}
cont
}
> library(car)
. . .
> data(Prestige)
> attach(Prestige)
> contrasts(type) <- "contr.Treatment"
>
> lm(prestige ~ (income + education)*type) # default behaviour
Call:
lm(formula = prestige ~ (income + education) * type)
Coefficients:
(Intercept) income education
typeprof typewc
2.275753 0.003522 1.713275
15.351896 -33.536652
income:typeprof income:typewc education:typeprof
education:typewc
-0.002903 -0.002072 1.387809
4.290875
>
> options(decorate.factor.levels=c("[", "]")) # using brackets
> lm(prestige ~ (income + education)*type)
Call:
lm(formula = prestige ~ (income + education) * type)
Coefficients:
(Intercept) income education
type[prof] type[wc]
2.275753 0.003522 1.713275
15.351896 -33.536652
income:type[prof] income:type[wc] education:type[prof]
education:type[wc]
-0.002903 -0.002072 1.387809
4.290875
>
> options(decorate.contr.Treatment="T.") # naming contrast type
> lm(prestige ~ (income + education)*type)
Call:
lm(formula = prestige ~ (income + education) * type)
Coefficients:
(Intercept) income education
type[T.prof]
2.275753 0.003522 1.713275
15.351896
type[T.wc] income:type[T.prof] income:type[T.wc]
education:type[T.prof]
-33.536652 -0.002903 -0.002072
1.387809
education:type[T.wc]
4.290875
>
> options(decorate.contr.Treatment=NULL)
> options(decorate.factor.levels=c("#", "")) # alternate style, using hash
> lm(prestige ~ (income + education)*type)
Call:
lm(formula = prestige ~ (income + education) * type)
Coefficients:
(Intercept) income education
type#prof type#wc
2.275753 0.003522 1.713275
15.351896 -33.536652
income:type#prof income:type#wc education:type#prof
education:type#wc
-0.002903 -0.002072 1.387809
4.290875
>
> lm(prestige ~ (income + education)*type,
contrasts=list(type="contr.Sum")) # default behaviour
Call:
lm(formula = prestige ~ (income + education) * type, contrasts =
list(type = "contr.Sum"))
Coefficients:
(Intercept) income education type1
type2 income:type1 income:type2
-3.785832 0.001864 3.606169 6.061585
21.413481 0.001658 -0.001244
education:type1 education:type2
-1.892895 -0.505086
>
> options(contr.Sum.show.levels=TRUE)
> options(decorate.factor.levels=c("[", "]"))
> options(decorate.contr.Sum="S.") # showing levels, brackets,
contrast type
> lm(prestige ~ (income + education)*type,
contrasts=list(type="contr.Sum"))
Call:
lm(formula = prestige ~ (income + education) * type, contrasts =
list(type = "contr.Sum"))
Coefficients:
(Intercept) income education
type[S.bc]
-3.785832 0.001864 3.606169
6.061585
type[S.prof] income:type[S.bc] income:type[S.prof]
education:type[S.bc]
21.413481 0.001658 -0.001244
-1.892895
education:type[S.prof]
-0.505086
------------------------------------------------------------------------------------------------------------------------------
I look forward to people's comments.
John
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox@mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
______________________________________________ R-devel@stat.math.ethz.ch mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-devel