Adding 95% contours around scatterplot points with ggplot2
6 messages · Nathan Miller, Ista Zahn
Hi Nathan,

This only fits some of your criteria, but have you looked at ?stat_density2d?

Best,
Ista
On Mon, Jan 28, 2013 at 12:53 PM, Nathan Miller <natemiller77 at gmail.com> wrote:
Hi all,

I have been looking for a means of adding a contour around some points in a scatterplot to represent the center of density of the data. I'm imagining something like a 95% confidence estimate drawn around the data. So far I have found some code for drawing polygons around the data. These look nice, but in some cases the polygons are strongly influenced by outlying points. Does anyone have a thought on how to draw a contour which is more along the lines of a 95% confidence space?

I have provided a working example below to illustrate the drawing of the polygons. As I said, I would rather have three "ovals"/95% contours drawn around the points by "level" to capture the different density distributions without the visualization being heavily influenced by outliers.

I have looked into the code provided here from Hadley
https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8
using the mvtnorm package and the dmvnorm function, but haven't been able to get it to work for my data example. The calculated densities are always zero (at this step of Hadley's code: dgrid$dens <- dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma) ).

I appreciate any assistance.

Thanks,
Nate

library(plyr)     # ddply() and .()
library(ggplot2)

# three groups of 30 points each
x <- c(seq(0.15, 0.4, length.out=30), seq(0.2, 0.6, length.out=30),
       seq(0.4, 0.6, length.out=30))
y <- c(0.55, x[1:29] + 0.2*rnorm(29, 0.4, 0.3), x[31:60]*rnorm(30, 0.3, 0.1),
       x[61:90]*rnorm(30, 0.4, 0.25))
data <- data.frame(level=c(rep(1, 30), rep(2, 30), rep(3, 30)), x=x, y=y)

# convex hull of each group
find_hull <- function(data) data[chull(data$x, data$y), ]
hulls <- ddply(data, .(level), find_hull)

fig1 <- ggplot(data=data, aes(x, y, colour=(factor(level)), fill=level)) + geom_point()
fig1 <- fig1 + geom_polygon(data=hulls, alpha=.2)
fig1
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
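One way to get the "ovals" Nate describes, sketched here under the assumption that each group is roughly bivariate normal, is to draw the ellipse on which the squared Mahalanobis distance from the group mean equals the 95% quantile of a chi-squared distribution with 2 degrees of freedom. This is not Hadley's code from the linked thread. The helper name normal_ellipse, the object ellipses, and the seed are made up for illustration. The plotting step only runs if ggplot2 is installed.

```r
# Nate's example data; the seed is an addition so the sketch is reproducible
set.seed(1)
x <- c(seq(0.15, 0.4, length.out = 30), seq(0.2, 0.6, length.out = 30),
       seq(0.4, 0.6, length.out = 30))
y <- c(0.55, x[1:29] + 0.2 * rnorm(29, 0.4, 0.3),
       x[31:60] * rnorm(30, 0.3, 0.1), x[61:90] * rnorm(30, 0.4, 0.25))
data <- data.frame(level = rep(1:3, each = 30), x = x, y = y)

# Hypothetical helper: points on the ellipse
#   (p - mu)' S^-1 (p - mu) = qchisq(prob, 2)
# where mu and S are the sample mean and covariance of one group.
normal_ellipse <- function(d, prob = 0.95, n = 100) {
  mu <- colMeans(d[, c("x", "y")])
  S <- cov(d[, c("x", "y")])
  r <- sqrt(qchisq(prob, df = 2))
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- cbind(cos(theta), sin(theta))
  # chol(S) returns upper-triangular R with t(R) %*% R == S, so the rows of
  # circle %*% R lie at squared Mahalanobis distance 1 from the origin
  pts <- sweep(r * circle %*% chol(S), 2, mu, "+")
  data.frame(level = d$level[1], x = pts[, 1], y = pts[, 2])
}

# one ellipse per level, stacked into a single data frame
ellipses <- do.call(rbind, lapply(split(data, data$level), normal_ellipse))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(x, y, colour = factor(level))) +
    geom_point() +
    geom_path(data = ellipses)
  print(p)
}
```

The sample mean and covariance are not robust, so outlying points still pull on the ellipse, though usually less than they pull on a convex hull. Newer ggplot2 releases also provide stat_ellipse(type = "norm", level = 0.95), which draws a comparable ellipse without the helper.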
Hi Nate,

You can make it less busy using the bins argument. This is not documented, except in the examples to stat_contour, but try

ggplot(data=data, aes(x, y, colour=(factor(level)), fill=level))+
  geom_point()+
  stat_density2d(bins=2)

HTH,
Ista
On Mon, Jan 28, 2013 at 2:43 PM, Nathan Miller <natemiller77 at gmail.com> wrote:
Thanks Ista,

I have played a bit with stat_density2d as well. It doesn't completely capture what I am looking for and ends up being quite busy at the same time. I'm looking for a way of helping those looking at the figure to see the broad patterns of where in the x/y space the data from different groups are distributed. Using the 95% CI type idea is so that I don't end up arbitrarily drawing circles around each set of points. I appreciate your direction though.

Nate
Hi Nate,

I infer from the stat_density2d documentation that the calculation is carried out by the kde2d function in the MASS package. Refer to ?kde2d for details.

Best,
Ista
On Mon, Jan 28, 2013 at 3:56 PM, Nathan Miller <natemiller77 at gmail.com> wrote:
Hi Ista,

Thanks. That does look pretty nice and I hadn't realized that was possible. Do you know how to extract information regarding those curves? I'd like to be able to report something about what portion of the data they encompass, or really any other feature of them, in a figure legend. I'll look into stat_density2d and see if I can determine how they are set.

Thanks for your help,
Nate
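Following Ista's pointer to kde2d, one possible way to say what portion of the data a density contour encompasses is to call kde2d directly and pick the density level whose contour encloses about 95% of the estimated density. The sketch below does this for one group built like Nate's third level. The helper name hdr_level, the seed, and the choice to extend the grid limits past the data range are assumptions made for illustration, and the result need not match exactly what stat_density2d draws.

```r
library(MASS)  # kde2d(); MASS is a recommended package shipped with R

# One group, built like level 3 of Nate's example (seed added for reproducibility)
set.seed(1)
x3 <- seq(0.4, 0.6, length.out = 30)
y3 <- x3 * rnorm(30, 0.4, 0.25)

# Hypothetical helper: density level whose contour encloses roughly `prob`
# of the kernel density estimate evaluated on an n x n grid.
hdr_level <- function(x, y, prob = 0.95, n = 100) {
  # widen the grid beyond the data range so the tails are not cut off
  lims <- c(extendrange(x, f = 0.5), extendrange(y, f = 0.5))
  dens <- kde2d(x, y, n = n, lims = lims)
  z <- sort(as.vector(dens$z), decreasing = TRUE)
  cum <- cumsum(z) / sum(z)        # share of grid mass, highest cells first
  list(dens = dens, level = z[which(cum >= prob)[1]])
}

h <- hdr_level(x3, y3)

# Coordinates of the contour line(s) at that level, e.g. for geom_path()
curves <- contourLines(h$dens$x, h$dens$y, h$dens$z, levels = h$level)

# Rough check: share of observed points whose grid cell has density at or
# above the level (uses the lower-left grid node of each cell, so approximate)
ix <- findInterval(x3, h$dens$x, all.inside = TRUE)
iy <- findInterval(y3, h$dens$y, all.inside = TRUE)
inside <- mean(h$dens$z[cbind(ix, iy)] >= h$level)
inside
```

The value of inside is the kind of number that could go in a figure legend. With only 30 points per group it depends noticeably on the bandwidth kde2d chooses, so it is best reported as approximate.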