Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Tal Galili
> Sent: Wednesday, January 26, 2011 4:05 PM
> To: r-help at r-project.org
> Subject: [R] boxplot - code for labeling outliers - any suggestions for
> improvements?
>
> Hello all,
> I wrote a small function that adds labels to the outliers in a boxplot.
> It only works with a simple formula call (e.g., something like
> boxplot(y ~ x)).
>
> The code and an example follow below.
>
> I'd welcome any suggestions on how to improve this code, for
> example:
>
> - Handle boxplot.matrix (which shouldn't be too hard to do)
> - Handle more complex formulas (e.g., boxplot(y ~ a*b))
> - Handle cases where many outliers lead to cluttered text
>   (I have no idea how to solve this systematically)
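
One possible way to reduce the clutter, offered as a rough sketch rather than a tested solution, is to label only the k most extreme outliers in each box. The helper name top_k_outliers and the default k = 3 are made up for illustration. Outliers are defined as in the posted function, i.e. points outside the whisker range reported by boxplot.stats().

```r
# Hypothetical helper (not part of the posted code): return the indices of
# at most k outliers per group, preferring those farthest from the group median.
top_k_outliers <- function(y, g, k = 3) {
  g <- as.factor(g)
  keep <- integer(0)
  for (lev in levels(g)) {
    idx <- which(g == lev & !is.na(y))
    # whisker range of this group; anything outside it is drawn as an outlier
    rng <- range(boxplot.stats(y[idx])$stats)
    out <- idx[y[idx] < rng[1] | y[idx] > rng[2]]
    # distance of each outlier from the group median
    dist <- abs(y[out] - median(y[idx]))
    ranked <- out[order(dist, decreasing = TRUE)]
    keep <- c(keep, ranked[seq_len(min(k, length(out)))])
  }
  sort(keep)
}

# Toy data (made up): 40, 60 and 100 fall outside the whiskers
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40, 60, 100)
g <- rep("a", length(y))
top2 <- top_k_outliers(y, g, k = 2)
```

The returned indices can be used to subset the data before calling text(), so only the selected points get a label.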
>
>
> Best,
> Tal
> ------------------------------
>
>
> # the function
> boxplot.add.outlier.text <- function(DATA, x_name, y_name, label_name)
> {
>
>
> boxplot.outlier.data <- function(xx, y_name)
> {
> y <- xx[,y_name]
> boxplot_range <- range(boxplot.stats(y)$stats)
> ss <- (y < boxplot_range[1]) | (y > boxplot_range[2])
> return(xx[ss,])
> }
>
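> library(plyr)  # provides ddply(); unlike require(), stops if plyr is missing
> # ddply() accepts the grouping column name as a character vector,
> # so the call does not need to be built with eval(parse())
> ourlier_df <- ddply(DATA, x_name, boxplot.outlier.data, y_name = y_name)
> # head(ourlier_df)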
> # build the formula y_name ~ x_name directly from the column names
> formu <- as.formula(paste(y_name, "~", x_name))
> boxdata <- boxplot(formu, data = DATA, plot = FALSE)
> boxdata_group_name <- boxdata$names[boxdata$group]
> boxdata_outlier_df <- data.frame(group = boxdata_group_name,
>                                  y = boxdata$out, x = boxdata$group)
> for (i in seq_len(nrow(boxdata_outlier_df)))
> {
> ss <- (ourlier_df[,x_name] %in% boxdata_outlier_df[i,]$group) &
> (ourlier_df[,y_name] %in% boxdata_outlier_df[i,]$y)
> current_label <- ourlier_df[ss,label_name]
> temp_x <- boxdata_outlier_df[i,"x"]
> temp_y <- boxdata_outlier_df[i,"y"]
> text(temp_x, temp_y, current_label,pos=4)
> }
>
> list(boxdata_outlier_df = boxdata_outlier_df, ourlier_df=ourlier_df)
> }
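
The function above relies on the list that boxplot() returns when called with plot = FALSE. A small illustration of the components it uses ($names, $out, $group) follows; the toy data frame is made up for this example.

```r
# Toy data (made up for illustration): group "a" contains one outlier.
toy <- data.frame(
  y = c(1, 2, 3, 4, 100, 10, 11, 12, 13, 14),
  g = factor(rep(c("a", "b"), each = 5))
)

# plot = FALSE computes the box statistics without drawing anything
b <- boxplot(y ~ g, data = toy, plot = FALSE)

b$names            # box labels, in plotting order
b$out              # values that would be drawn as outlier points
b$group            # index of the box each outlier belongs to
b$names[b$group]   # group label for each outlier
```

Because $group is the box index, it doubles as the x coordinate when placing labels with text().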
>
> # example:
> boxplot(decrease ~ treatment, data = OrchardSprays, log = "y", col =
> "bisque")
> boxplot.add.outlier.text(OrchardSprays, "treatment", "decrease",
> "colpos")
>
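
For formulas such as y ~ a*b, one approach that might work is to rebuild the grouping with model.frame() and interaction(), which gives the same box order that boxplot() uses with its defaults, and skip plyr entirely. This is only a sketch under that assumption; outlier_rows and label_outliers are hypothetical names, not existing functions.

```r
# Hypothetical helper: find outliers for any boxplot formula.
# Returns the row in `data`, the box position (x) and the value (y).
outlier_rows <- function(formula, data) {
  # na.pass keeps every row, so row numbers refer to `data` itself
  mf <- model.frame(formula, data, na.action = na.pass)
  y <- mf[[1]]
  g <- interaction(mf[-1])   # one level per box, also for y ~ a*b
  is_out <- logical(length(y))
  for (lev in levels(g)) {
    idx <- which(g == lev)
    # whisker range of this box; all NA if the box is empty
    rng <- range(boxplot.stats(y[idx])$stats)
    is_out[idx] <- !is.na(y[idx]) & (y[idx] < rng[1] | y[idx] > rng[2])
  }
  data.frame(row = which(is_out), x = as.integer(g)[is_out], y = y[is_out])
}

# Hypothetical wrapper: add the labels to an existing boxplot.
label_outliers <- function(formula, data, label_name, ...) {
  out <- outlier_rows(formula, data)
  if (nrow(out) > 0) {
    text(out$x, out$y, as.character(data[out$row, label_name]), pos = 4, ...)
  }
  invisible(out)
}

# Toy data (made up): row 5 is the only outlier, in the first box
d <- data.frame(
  y = c(1, 2, 3, 4, 100, 10, 11, 12, 13, 14),
  a = factor(rep(c("p", "q"), each = 5)),
  b = factor(rep("u", 10))
)
res <- outlier_rows(y ~ a * b, d)
```

After drawing the plot from the example, a call like label_outliers(decrease ~ treatment, OrchardSprays, "colpos") should label the same points.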
>
>
>
> ----------------Contact Details:----------------
> Contact me: Tal.Galili at gmail.com | 972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew)
> | www.r-statistics.com (English)
> ------------------------------------------------