Skip to content

Comparison of age categories using contrasts

10 messages · Mark Difford, Frank E Harrell Jr, Greg Snow +3 more

#
Dear listers,

I would like to compare the levels of a factor with 8 age categories 
(0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] (70,90] (however, 
the factor has not been ordered yet). The default in glm is 
cont.treatment (for unordered factors) and that leads to compare each 
level to the first one. I would rather prefer to compare the 2nd to the 
1st, the 3rd to the 2nd, the 4th to the 3rd, etc... My understanding is 
that cont.poly may make the trick, eg specified like this:

mod3<-glm(AE~agecat, family=binomial,data=qinghai2, 
contrasts=list(agecat="contr.poly"))

but I am not sure to be right.

Would be grateful if a true statistician can confirm or fire me... and 
before definitive fire tell me how to manage with this...
#
Hi Patrick,
The functions ?C and ?contrasts allow you to set up your own contrast
matrix. A function to carry out the type of comparisons you are interested
in has been written by Venables & Ripley. See contr.sdif in package MASS.

Regards, Mark.
Patrick Giraudoux wrote:

  
    
#
Mark Difford wrote:
My preference is to not reformulate the model to get desired contrasts 
but to think of the model as something that uses a convenient coding, 
then do after-fitting contrasts.  In the Design package you can fit a 
model (say, using glmD) then use contrast(fit, list(predictor settings 
1), list(predictor settings 2)) to get what you want.  ?contrast.Design 
provides the details.  I'll bet that one of John Fox's packages also 
provides a good way to do this.

Frank
#
One approach is to create your own contrasts matrix:
Now when you use agefactor, the intercept will be the first age group and the slopes will be the differences between the pairs of groups (make sure that the order of the levels of agefactor is correct).

The difference between this method and the contr.sdif function in MASS is how the intercept will end up being interpreted (and the dimnames).

Hope this helps,
#
On Mon, Feb 16, 2009 at 5:28 PM, Patrick Giraudoux
<patrick.giraudoux at univ-fcomte.fr> wrote:
Glad you were able to solve your problem. Frank replied earlier with
the suggestion to use contrast.Design() to perform the tests after the
initial model has been fit. Perhaps I am a little denser than normal,
but I cannot figure out how to apply contrast.Design() to a similar
model with several levels of a factor. Example data:

# need these
library(lattice)
library(Design)

# replicate an important experimental dataset
set.seed(10101010)
x <- rnorm(100)
y1 <- x[1:25] * 2 + rnorm(25, mean=1)
y2 <- x[26:50] * 2.6 + rnorm(25, mean=1.5)
y3 <- x[51:75] * 2.9 + rnorm(25, mean=5)
y4 <- x[76:100] * 3.5 + rnorm(25, mean=5.5)

d <- data.frame(x=x, y=c(y1,y2,y3,y4), f=factor(rep(letters[1:4], each=25)))

# plot
xyplot(y ~ x, groups=f, data=d, auto.key=list(columns=4, title='Beard
Type', lines=TRUE, points=FALSE, cex=0.75), type=c('p','r'),
ylab='Number of Pirates', xlab='Distance from Land')

# standard comparison to base level of f
summary(lm(y ~ x * f, data=d))

# compare to level 4 of f
summary(lm(y ~ x * C(f, base=4), data=d))


# now try with contrast.Design():
dd <- datadist(d)
options(datadist='dd')
l <- ols(y ~ x * f, data=d)

# probably the wrong approach, as the results do not look too familiar:
contrast(l, list(f=levels(d$f)))

 x          f Contrast  S.E.      Lower       Upper     t     Pr(>|t|)
 -0.3449623 a 0.3966682 0.1961059 0.007184856 0.7861515  2.02 0.0460
 -0.3449623 b 0.5587395 0.1889173 0.183533422 0.9339456  2.96 0.0039
 -0.3449623 c 4.1511677 0.1889170 3.775962254 4.5263730 21.97 0.0000
 -0.3449623 d 4.3510407 0.1888820 3.975904726 4.7261766 23.04 0.0000


This is adjusting the output to a given value of 'x'... Am I trying to
use contrast.Design() for something that it was not intended for? Any
tips Frank?

Cheers,
Dylan
#
Hi Dylan,
I think Prof. Harrell's main point had to do with how interactions are
handled. You can also get the kind of contrasts that Patrick was interested
in via multcomp. If we do this using your artificial data set we see that
the contrasts are the same as those got by fitting the model using
contr.sdif, but a warning is generated about interactions in the model &c.
[see example code]

Part of Prof. Harrell's "system" is that in generating contrasts via
contr.Design an appropriate value is automatically chosen for the
interacting variable (in this case the median value of x) so that a
reasonable default set of contrasts is calculated. This can of course be
changed.

Coming to your question [?] about how to generate the kind of contrasts that
Patrick wanted using contrast.Design. Well, it is not that straightforward,
though I may have missed something in the documentation to the function. In
the past I have cobbled them together using the kind of hack given below:

## Exampe code
x <- rnorm(100) 
y1 <- x[1:25] * 2 + rnorm(25, mean=1)
y2 <- x[26:50] * 2.6 + rnorm(25, mean=1.5)
y3 <- x[51:75] * 2.9 + rnorm(25, mean=5)
y4 <- x[76:100] * 3.5 + rnorm(25, mean=5.5)


d <- data.frame(x=x, y=c(y1,y2,y3,y4), f=factor(rep(letters[1:4], each=25))) 


# now try with contrast.Design(): 
library(multcomp)
dd <- datadist(d) 
options(datadist='dd') 
l <- ols(y ~ x * f, data=d)

## model fitted using successive difference contrasts
library(MASS)
T.lm <- lm(y ~ x * f, contrasts=list(f="contr.sdif"), data=d)
summary(T.lm)

## show the warning: model fitted using ols() and treatment contrasts
summary(glht(l, linfct=mcp(f="Seq")))

## "custom" successive difference contrasts using contrast.Design
TFin <- {}
for (i in 1:(length(levels(d$f))-1)) {
    tcont <- contrast(l, a=list(f=levels(d$f)[i]),
b=list(f=levels(d$f)[i+1]))
    TFin <- rbind(TFin, tcont)
    rownames(TFin)[i] <-  paste(levels(d$f)[i], levels(d$f)[i+1], sep="-")
}
TFin[,1:9]

Regards, Mark.
Dylan Beaudette-2 wrote:

  
    
#
On 2/16/2009 10:18 PM, Dylan Beaudette wrote:
Does this help?

# Compare with summary(lm(y ~ x * f, data=d))
contrast(l, a=list(f=levels(d$f)[2:4], x=0),
            b=list(f=levels(d$f)[1],   x=0))

 f Contrast  S.E.      Lower      Upper     t     Pr(>|t|)
 b 0.3673455 0.2724247 -0.1737135 0.9084046  1.35 0.1808
 c 4.1310015 0.2714011  3.5919754 4.6700275 15.22 0.0000
 d 4.4308653 0.2731223  3.8884207 4.9733098 16.22 0.0000

Error d.f.= 92

# Compare with summary(lm(y ~ x * C(f, base=4), data=d))
contrast(l, a=list(f=levels(d$f)[1:3], x=0),
            b=list(f=levels(d$f)[4],   x=0))

 f Contrast   S.E.      Lower      Upper      t      Pr(>|t|)
 a -4.4308653 0.2731223 -4.9733098 -3.8884207 -16.22 0.0000
 b -4.0635198 0.2782704 -4.6161888 -3.5108507 -14.60 0.0000
 c -0.2998638 0.2772684 -0.8505427  0.2508151  -1.08 0.2823

Error d.f.= 92

  
    
#
Hi Dylan, Chuck,
There is a subtlety here that needs to be emphasized. Setting the
interacting variable (x) to zero is reasonable in this case, because the
mean value of rnorm(n) is zero. However, in the real world of real data it
is somewhat unusual for zero to be a reasonable value for an interacting
variable. In fact, it is this setting-to-zero that causes multcomp's "bells
to ring" with a word of warning.

##
glht(l, linfct=mcp(f="Seq"))$linfct

Regards, Mark.
Chuck Cleland wrote:

  
    
#
Hi Dylan, Chuck,
Mark Difford wrote:
Here is a much simpler route, and a correction to my earlier posting (helped
by a little off-list prompting from Prof. Harrell). All that is required to
get successive difference (i.e. sdif) contrasts using contrast.Design is the
following:


contrast(l, a=list(f=c("a","b","c")), b=list(f=c("b","c","d")))

## Run a full example
set.seed(10101010) 
x <- rnorm(100) 
y1 <- x[1:25] * 2 + rnorm(25, mean=1) 
y2 <- x[26:50] * 2.6 + rnorm(25, mean=1.5) 
y3 <- x[51:75] * 2.9 + rnorm(25, mean=5) 
y4 <- x[76:100] * 3.5 + rnorm(25, mean=5.5) 

d <- data.frame(x=x, y=c(y1,y2,y3,y4), f=factor(rep(letters[1:4], each=25)))

dd <- datadist(d) 
options(datadist='dd') 
l <- ols(y ~ x * f, data=d)

## compare with multcomp by setting x=0
summary(glht(l, linfct=mcp(f="Seq")))
contrast(l, b=list(x=0, f=c("a","b","c")), a=list(x=0, f=c("b","c","d")))

Regards, and apologies for any confusion, Mark.
Mark Difford wrote: