Skip to content

Tukey HSD plot with lines indicating (non-)significance

8 messages · Karl Ove Hufthammer, Richard M. Heiberger

#
Dear list members,

I'm running some tests looking at differences between means for various 
levels of a factor, using Tukey's HSD method.

I would like to plot the data as boxplots or dotplots, with horizontal 
significance lines indicating which groups are statistically 
significantly different, according to Tukey HSD. Here's a nice image 
showing an example of such a graphical display:

   http://www.biomedcentral.com/1471-2148/11/99/figure/F2

Is there a R function for this? I have tried searching, and found 
several nice plotting fuctions for Tukey's HSD, but they all either hide 
the actual data, or display the results only with letters indicating 
(non-)signifiance.

Examples:

# Fit an one-way ANOVA.
library(DAAG)
l=aov(ShootDryMass ~ trt, data=rice)
summary(l)


# Calculate the Tukey HSD tests and CIs.
# The plot shows all 6*5/2 = 15 confidence intervals,
# but not the data or the actual group means.
l.hsd=TukeyHSD(l, "trt", ordered = TRUE)
l.hsd
plot(l.hsd)


# I rather like this one. It shows the group means in a nice way,
# but hides the data. And checking which groups are different
# isn't always easy (e.g., for NH4Cl - NH4NO3), as one
# has to compare differences to a non-aligned scale (bar).
onewayPlot(l)


# One function that gives a results closer to what I want,
# is the compact letter display (CLD) of the multcomp package.
# This shows boxplots (but it is easy to overlay the actual
# observations, using the points() function) and uses letters
# to indicate non-significance.
#
# (Note that (if I have understood everything correctly) this
# doesn't actually calculcate Tukey HSD values (the "Tukey"
# in the glht() call only selects Tukey *contrasts*), but
# the results should be *very* similar.)
library(multcomp)
l.glht=glht(l, linfct = mcp(trt = "Tukey"))
summary(l.glht)
l.cld=cld(l.glht)
old.par <- par( mai=c(1,1,2,1))
plot(l.cld)
par(old.par)


# Basically, I want a similar display as the one above (or
# preferably a ggplot2- or lattice-based one), but with lines
# instead of letters. Of course, for this, the levels need to be
# reordered by group means, and it is only guaranteed to
# work for balanced data. Example, with the lines missing:
library(ggplot2)
d=rice
d$trt=reorder(d$trt, -d$ShootDryMass)
ggplot(d, aes(x=trt, y=ShootDryMass)) + geom_boxplot(outlier.colour=NA) 
+
   geom_jitter(col="red", size=3, position=position_jitter(width=.1))

Is there such a function available in R?
#
m?. den 14. 01. 2013 klokka 13.58 (-0500) skreiv Richard M. Heiberger:
Thanks for the suggestion. That?s a very interesting and clever way of
displaying both means and differences. It?s not what I was looking for,
though, as it doesn?t display the actual data.

For the record, here?s the syntax for using MMC on the dataset mentioned
in my original posting:

    l.mmc=mmc(l, linfct = mcp(trt = "Tukey"))
    plot(l.mmc)

(It looks best on data where the group means aren?t very close;
otherwise the labels might overlap.)
#
Den 2013-01-14 19:58 skreiv Richard M. Heiberger:
I have now coded a very quick-and-dirty solution for my line graph.
Example follows. I use the results of the 'cld' function in multcomp
to calculcate which groups are significantly different from each other.

Important note: The LetterMatrix object contains the needed
data, but the rows are not in the same order as the original levels,
and unfortunately the row names do *not* match the level names
(spaces seems to be stripped -- I don't know why). That's the reason
I remove the spaces in the example below.

Note: The levels should preferably be ordered by the group means,
but the function works (modulo any bugs) even if they're not.
Dotted lines are then used to indicate significant differences.

Important note: There are bound to be bugs. Do check the results
before using the resulting graph.


# Calculate non-significance lines for Tukey HSD
tukeyHSDLines=function(l, removeEndLines=TRUE, 
removeSingleGroups=FALSE) {

   # Calculate Tukey HSD values
   library(multcomp)
   l.glht=glht(l, linfct = mcp(trt = "Tukey"))

   # Calculcate a compact letter display (CLD)
   l.cld=cld(l.glht)
   lmat=l.cld$mcletters$LetterMatrix[levels(d$trt),]

   # Calculate the non-significance lines that should be
   # drawn, based on the CLD data
   calc.lines=function(x) {
     r=rle(x)
     lend=cumsum(r$lengths)            # Line end index
     lstart=c(1,lend[-length(lend)]+1) # Line start index
     d2=data.frame(lstart=lstart-.4, lend=lend+.4, nodraw=!r$values)

     if( removeEndLines ) {
       if(d2$nodraw[1]) d2=d2[-1,]               # Remove 'leading' 
dotted lines
       if(d2$nodraw[nrow(d2)]) d2=d2[-nrow(d2),] # Remove 'trailing' 
dotted lines
     }
     d2
   }

   linlist=apply(lmat, 2, calc.lines)
   lindat=do.call(rbind, linlist)
   lindat$y=rep(seq_along(linlist), times=sapply(linlist, nrow))

   if (removeSingleGroups) {
     lindat=subset(lindat, !(y %in% which(tabulate(lindat$y)==1)))
     lindat$y=as.numeric(factor(lindat$y))
   }
lindat
}


# Example:
library(DAAG)
d=rice                                     # The dataset to use
levels(d$trt)=gsub(" ", "", levels(d$trt)) # Remove spaces from level 
names
# d$trt=reorder(d$trt, -d$ShootDryMass)    # Graph looks better with 
reordered factor
l=aov(ShootDryMass ~ trt, data=d)          # Fit a one-way ANOVA

lindat=tukeyHSDLines(l)
lindat$y=150-lindat$y*3

library(ggplot2)
ggplot(d, aes(x=trt, y=ShootDryMass)) + geom_boxplot(outlier.colour=NA) 
+
   geom_jitter(col="red", size=3, position=position_jitter(width=.1)) +
   geom_segment(aes(x=lstart, xend=lend, y=y, yend=y, linetype=nodraw), 
lindat)
#
Thanks for the suggested code! It?s a very nice way of displaying most
aspects of the data and the HSD tests/CIs. The graphical display is
probably to big for inclusion in journal articles, but works well for
displaying the results when working with the data.


Regards,
Karl Ove Hufthammer


ty. den 15. 01. 2013 klokka 06.43 (-0500) skreiv Richard M. Heiberger: